friendlymatthew commented on code in PR #20854:
URL: https://github.com/apache/datafusion/pull/20854#discussion_r2914333467


##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -251,15 +251,26 @@ impl FilterCandidateBuilder {
             return Ok(None);
         };
 
+        let schema_descr = metadata.file_metadata().schema_descr();
         let root_indices: Vec<_> =
             required_columns.required_columns.into_iter().collect();
 
-        let leaf_indices = leaf_indices_for_roots(
-            &root_indices,
-            metadata.file_metadata().schema_descr(),
+        let mut leaf_indices = leaf_indices_for_roots(&root_indices, schema_descr);
+
+        let struct_leaf_indices = resolve_struct_field_leaves(
+            &required_columns.struct_field_accesses,
+            &self.file_schema,
+            schema_descr,
         );
+        leaf_indices.extend_from_slice(&struct_leaf_indices);
+        leaf_indices.sort_unstable();

Review Comment:
   `leaf_indices` and `root_indices` serve different purposes. Leaf indices become the `ProjectionMask`, telling the Parquet decoder which physical leaf columns to read from disk. Root indices (plus struct field accesses) become the filter schema, telling Arrow which schema to use when reconstructing the record batch.
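   To make the leaf side concrete, here is a minimal, std-only sketch. It assumes a hypothetical flattened-schema model (`Leaf`, `leaves_for_roots`, `leaves_for_field_paths` and `projection_leaves` are illustrative names, not the PR's or the parquet crate's API) for a file with columns `id: Int64` and `s: Struct { a: Int64, b: Utf8 }`, whose leaves are numbered depth-first as `id`=0, `s.a`=1, `s.b`=2. Whole root columns contribute all of their leaves, struct field accesses contribute only the leaves under the accessed path, and the two lists are merged and sorted as in the diff.

```rust
/// Hypothetical model of one Parquet leaf column: its path from the root
/// and the index of the root (top-level) field it belongs to.
struct Leaf {
    path: &'static [&'static str],
    root_idx: usize,
}

/// Every leaf under the requested root columns (whole-column reads).
fn leaves_for_roots(leaves: &[Leaf], roots: &[usize]) -> Vec<usize> {
    leaves
        .iter()
        .enumerate()
        .filter(|(_, leaf)| roots.contains(&leaf.root_idx))
        .map(|(i, _)| i)
        .collect()
}

/// Only the leaves whose path starts with one of the accessed field paths,
/// so `s.b` selects leaf 2 but not its sibling `s.a`.
fn leaves_for_field_paths(leaves: &[Leaf], paths: &[&[&'static str]]) -> Vec<usize> {
    leaves
        .iter()
        .enumerate()
        .filter(|(_, leaf)| paths.iter().any(|p| leaf.path.starts_with(*p)))
        .map(|(i, _)| i)
        .collect()
}

/// Merge both sources into the sorted leaf list that would back the projection mask.
fn projection_leaves(leaves: &[Leaf], roots: &[usize], field_paths: &[&[&'static str]]) -> Vec<usize> {
    let mut out = leaves_for_roots(leaves, roots);
    out.extend(leaves_for_field_paths(leaves, field_paths));
    out.sort_unstable();
    // Assumption: drop duplicates in case a root and one of its fields are both requested.
    out.dedup();
    out
}

fn main() {
    // id: Int64 and s: Struct { a, b } flatten to three leaves.
    let leaves = [
        Leaf { path: &["id"], root_idx: 0 },
        Leaf { path: &["s", "a"], root_idx: 1 },
        Leaf { path: &["s", "b"], root_idx: 1 },
    ];
    // A predicate like `id > 5 AND s.b = 'x'`: `id` is a whole root, `s.b` a field access.
    let field_paths: &[&[&'static str]] = &[&["s", "b"]];
    let mask = projection_leaves(&leaves, &[0], field_paths);
    println!("{:?}", mask); // [0, 2]
}
```

   The `dedup` is an extra precaution not present in the diff; whether duplicates can actually arise depends on how `struct_field_accesses` is populated.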
   
   Arrow just takes whatever decoded leaves are available and assembles them into the schema it was given. So suppose you had `leaf_indices=[2]` with `root_indices=[1]`. The mask says "decode leaf 2", and the schema says "give me struct column 1, pruned to just this specific field".
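   The worked example can be checked with a second toy model, again std-only and hypothetical (`Field`, `prune` and `filter_schema` are illustrative names, not arrow-rs or parquet APIs, and this is not how the real reader assembles batches). It walks the file schema in the same depth-first leaf order as above, keeps only leaves in the mask, drops structs left with no children, and returns only the requested roots. With `leaf_indices=[2]` and `root_indices=[1]`, the result is struct column `s` containing only field `b`, which is the shape the filter schema asks for.

```rust
/// Hypothetical, simplified schema node: a primitive leaf or a named struct.
#[derive(Debug, Clone, PartialEq)]
enum Field {
    Leaf(&'static str),
    Struct(&'static str, Vec<Field>),
}

/// Keep only the leaves whose depth-first index is in `mask`.
/// `next_leaf` advances for every leaf visited, kept or not.
fn prune(field: &Field, mask: &[usize], next_leaf: &mut usize) -> Option<Field> {
    match field {
        Field::Leaf(name) => {
            let idx = *next_leaf;
            *next_leaf += 1;
            if mask.contains(&idx) {
                Some(Field::Leaf(*name))
            } else {
                None
            }
        }
        Field::Struct(name, children) => {
            // filter_map visits every child, so leaf numbering stays aligned.
            let kept: Vec<Field> = children
                .iter()
                .filter_map(|child| prune(child, mask, next_leaf))
                .collect();
            if kept.is_empty() {
                None
            } else {
                Some(Field::Struct(*name, kept))
            }
        }
    }
}

/// Pruned schema for the requested roots, given the decoded leaves in `mask`.
fn filter_schema(file_schema: &[Field], roots: &[usize], mask: &[usize]) -> Vec<Field> {
    let mut next_leaf = 0;
    let mut out = Vec::new();
    for (root_idx, field) in file_schema.iter().enumerate() {
        // Every root is walked so later leaf indices match the file's numbering.
        let pruned = prune(field, mask, &mut next_leaf);
        if roots.contains(&root_idx) {
            out.extend(pruned);
        }
    }
    out
}

fn main() {
    // id: Int64 (leaf 0), s: Struct { a: Int64 (leaf 1), b: Utf8 (leaf 2) }
    let file_schema = vec![
        Field::Leaf("id"),
        Field::Struct("s", vec![Field::Leaf("a"), Field::Leaf("b")]),
    ];
    // leaf_indices = [2], root_indices = [1]
    let schema = filter_schema(&file_schema, &[1], &[2]);
    println!("{:?}", schema); // [Struct("s", [Leaf("b")])]
}
```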


