adriangb commented on code in PR #20913:
URL: https://github.com/apache/datafusion/pull/20913#discussion_r2940654587


##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -191,22 +191,22 @@ pub(crate) struct FilterCandidate {
     /// the filter and to order the filters when `reorder_predicates` is true.
     /// This is generated by summing the compressed size of all columns that the filter references.
     required_bytes: usize,
-    /// Column indices into the parquet file schema required to evaluate this filter.
-    projection: LeafProjection,
-    /// The Arrow schema containing only the columns required by this filter,
-    /// projected from the file's Arrow schema.
-    filter_schema: SchemaRef,
+    /// The resolved Parquet read plan (leaf indices + projected schema).
+    read_plan: ParquetReadPlan,
 }
 
-/// Projection specification for nested columns using Parquet leaf column indices.
+/// The result of resolving which Parquet leaf columns and Arrow schema fields
+/// are needed to evaluate an expression against a Parquet file
 ///
-/// For nested types like List and Struct, Parquet stores data in leaf columns
-/// (the primitive fields). This struct tracks which leaf columns are needed
-/// to evaluate a filter expression.
+/// This is the shared output of the column resolution pipeline used by both
+/// the row filter to build `ArrowPredicate`s and the opener to build `ProjectionMask`s
 #[derive(Debug, Clone)]
-struct LeafProjection {
-    /// Leaf column indices in the Parquet schema descriptor.
-    leaf_indices: Vec<usize>,
+pub(crate) struct ParquetReadPlan {
+    /// Leaf column indices in the Parquet schema descriptor to decode
+    pub leaf_indices: Vec<usize>,

Review Comment:
   If we already have the leaf indices here, should we just go ahead and generate the
`ProjectionMask` at this point? There have been bugs before where we used root
indices instead of leaf indices, and that sort of thing.
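
To illustrate the roots-versus-leaves concern, here is a minimal, self-contained sketch. It is a toy model only: `ToySchema`, `ToyReadPlan`, `mask_from_leaves`, and `mask_from_roots` are hypothetical names invented for this example and are not the `parquet` crate API or the code in this PR. The mask is modeled as a `Vec<bool>` with one entry per leaf column. The point is that the same integer selects different columns depending on whether it is read as a root index or a leaf index, so building the mask once, where the indices are known to be leaf indices, removes that ambiguity for callers.

```rust
/// Toy stand-in for a Parquet schema: entry `i` is the number of leaf
/// (primitive) columns stored under root field `i`.
struct ToySchema {
    leaves_per_root: Vec<usize>,
}

impl ToySchema {
    fn num_leaves(&self) -> usize {
        self.leaves_per_root.iter().sum()
    }

    /// Select exactly the given leaf columns.
    fn mask_from_leaves(&self, leaf_indices: &[usize]) -> Vec<bool> {
        let mut mask = vec![false; self.num_leaves()];
        for &i in leaf_indices {
            mask[i] = true;
        }
        mask
    }

    /// Select every leaf column that sits under the given root fields.
    fn mask_from_roots(&self, root_indices: &[usize]) -> Vec<bool> {
        let mut mask = vec![false; self.num_leaves()];
        let mut start = 0;
        for (root, &n) in self.leaves_per_root.iter().enumerate() {
            if root_indices.contains(&root) {
                for slot in &mut mask[start..start + n] {
                    *slot = true;
                }
            }
            start += n;
        }
        mask
    }
}

/// Hypothetical read plan that stores the finished mask instead of raw
/// indices, so consumers cannot reinterpret the indices as roots.
struct ToyReadPlan {
    mask: Vec<bool>,
}

impl ToyReadPlan {
    fn new(schema: &ToySchema, leaf_indices: &[usize]) -> Self {
        Self {
            mask: schema.mask_from_leaves(leaf_indices),
        }
    }
}

fn main() {
    // Roots: `id` (1 leaf), `s: struct { a, b }` (2 leaves), `tag` (1 leaf).
    // Leaf order: id = 0, s.a = 1, s.b = 2, tag = 3.
    let schema = ToySchema {
        leaves_per_root: vec![1, 2, 1],
    };

    // The filter needs `s.b`, which is leaf index 2.
    let plan = ToyReadPlan::new(&schema, &[2]);
    assert_eq!(plan.mask, vec![false, false, true, false]);

    // Reading the same `2` as a root index selects `tag` instead.
    assert_eq!(schema.mask_from_roots(&[2]), vec![false, false, false, true]);
}
```

In the real code the equivalent step would presumably go through the `parquet` crate's `ProjectionMask` constructors; whether `ParquetReadPlan` should own the mask is the open question here.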



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

