toutane commented on code in PR #2298:
URL: https://github.com/apache/iceberg-rust/pull/2298#discussion_r3413787482


##########
crates/iceberg/src/scan/mod.rs:
##########
@@ -433,6 +433,21 @@ impl TableScan {
 
     /// Returns an [`ArrowRecordBatchStream`].
     pub async fn to_arrow(&self) -> Result<ArrowRecordBatchStream> {
+        self.to_arrow_from_tasks(self.plan_files().await?)
+    }
+
+    /// Like [`TableScan::to_arrow`], but accepts a caller-supplied
+    /// [`FileScanTask`] stream instead of running [`TableScan::plan_files`]
+    /// internally.
+    ///
+    /// # Correctness
+    ///
+    /// Tasks must come from a [`TableScan`] with the same projection and
+    /// filter as `self`: predicates are baked into each task at planning
+    /// time and are not re-applied here. Reader-side configuration
+    /// (concurrency, batch size, row-group filtering, row selection) is
+    /// taken from `self` and may differ from the planning scan.
+    pub fn to_arrow_from_tasks(&self, tasks: FileScanTaskStream) -> 
Result<ArrowRecordBatchStream> {

Review Comment:
   I think your comment makes it clear that we don't need this function at all. 
As you pointed out, we can read the tasks directly from `ArrowReaderBuilder` 
without going through a `TableScan`. That means the only reason for 
`IcebergTableScan` to carry a `Table` (other than to obtain a `FileIO` 
instance) would be the "planning at execution" path.
   
   You mentioned that the datafusion-comet implementation has already chosen to 
drop the `Table` from the scanning node, since the tasks are pre-planned. With 
that in mind, I'm wondering whether the same approach might be desirable for 
iceberg-datafusion, which would let us remove its "planning at execution" path. 
What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to