Re: [PR] feat: support file-level parquet row selections [datafusion]

via GitHub Wed, 17 Jun 2026 20:27:38 -0700


haohuaijin commented on code in PR #22940:
URL: https://github.com/apache/datafusion/pull/22940#discussion_r3432780300



##########
datafusion/datasource-parquet/src/access_plan.rs:
##########
@@ -169,6 +204,110 @@ impl ParquetAccessPlan {
         }
     }
 
+    /// Create a new `ParquetAccessPlan` from a file-level [`RowSelection`].
+    ///
+    /// The selection is interpreted across all rows in the file, in row group
+    /// order, and is split into row-group level access using `row_group_meta_data`.
+    /// Fully skipped row groups become [`RowGroupAccess::Skip`], fully selected
+    /// row groups become [`RowGroupAccess::Scan`], and partially selected row
+    /// groups become [`RowGroupAccess::Selection`].
+    ///
+    /// # Errors
+    ///
+    /// Returns an error if the selection does not specify exactly the same
+    /// number of rows as the file metadata.
+    pub fn try_new_from_overall_row_selection(
+        selection: RowSelection,
+        row_group_meta_data: &[RowGroupMetaData],
+    ) -> Result<Self> {
+        let selectors: Vec<RowSelector> = selection.into();

Review Comment:
   Thanks for your detailed suggestion. The main reason I wrote it this way is 
that using `split_off` allocates new memory and also traverses the row 
groups two more times (once for `split_off`, once for `row_count`, and once 
for `skipped_row_count`). I can write a benchmark later to see how much 
impact this has.
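
For illustration only, here is a minimal sketch of how a file-level selection could be split per row group in a single pass, without calling `split_off`. It is not the code from this PR: `Selector`, `Access`, and `split_by_row_group` are hypothetical stand-ins for parquet's `RowSelector`, DataFusion's `RowGroupAccess`, and the new constructor, and row group sizes are passed as a plain slice of row counts rather than `RowGroupMetaData`.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical stand-in for `RowGroupAccess`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<Selector>),
}

/// Walk the selectors once, cutting them at row group boundaries.
fn split_by_row_group(
    selectors: &[Selector],
    row_group_rows: &[usize],
) -> Result<Vec<Access>, String> {
    // Ignore zero-length selectors; they carry no rows.
    let mut iter = selectors.iter().copied().filter(|s| s.row_count > 0);
    // Remainder of a selector that straddles a row group boundary.
    let mut current: Option<Selector> = None;
    let mut out = Vec::with_capacity(row_group_rows.len());

    for &rows in row_group_rows {
        let mut remaining = rows;
        let mut parts: Vec<Selector> = Vec::new();
        while remaining > 0 {
            let sel = match current.take().or_else(|| iter.next()) {
                Some(s) => s,
                None => return Err("selection covers fewer rows than the file".to_string()),
            };
            let take = sel.row_count.min(remaining);
            // Merge adjacent selectors of the same kind so that a fully
            // skipped or fully selected row group collapses to one part.
            let merge = parts.last().map_or(false, |last| last.skip == sel.skip);
            if merge {
                parts.last_mut().unwrap().row_count += take;
            } else {
                parts.push(Selector { row_count: take, skip: sel.skip });
            }
            remaining -= take;
            if sel.row_count > take {
                current = Some(Selector { row_count: sel.row_count - take, skip: sel.skip });
            }
        }
        let access = if parts.is_empty() {
            Access::Skip // empty row group: nothing to read
        } else if parts.len() == 1 {
            if parts[0].skip { Access::Skip } else { Access::Scan }
        } else {
            Access::Selection(parts)
        };
        out.push(access);
    }

    if current.is_some() || iter.next().is_some() {
        return Err("selection covers more rows than the file".to_string());
    }
    Ok(out)
}
```

In this sketch each selector is visited once and the only allocations are the per-row-group `parts` vectors. Whether that is measurably faster than a `split_off`-based version is exactly what the proposed benchmark would need to show.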



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


