friendlymatthew commented on code in PR #20854:
URL: https://github.com/apache/datafusion/pull/20854#discussion_r2926036684
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -478,8 +505,24 @@ impl TreeNodeVisitor<'_> for PushdownChecker<'_> {
struct PushdownColumns {
/// Sorted, unique column indices into the file schema required to evaluate
/// the filter expression. Must be in ascending order for correct schema
- /// projection matching.
+ /// projection matching. Does not include struct columns accessed via `get_field`.
required_columns: Vec<usize>,
+ /// Struct field accesses via `get_field`. Each entry records the root struct
+ /// column index and the field path being accessed.
+ struct_field_accesses: Vec<StructFieldAccess>,
+}
+
+/// Records a struct field access via `get_field(struct_col, 'field1', 'field2', ...)`.
+///
+/// This allows the row filter to project only the specific Parquet leaf columns
+/// needed by the filter, rather than all leaves of the struct.
+#[derive(Debug, Clone)]
+struct StructFieldAccess {
+ /// Arrow root column index of the struct in the file schema.
+ root_index: usize,
+ /// Field names forming the path into the struct.
+ /// e.g., `["value"]` for `s['value']`, `["outer", "inner"]` for `s['outer']['inner']`.
Review Comment:
We do support it! Here's a test that reproduces it:
https://github.com/apache/datafusion/pull/20854/changes/44a02f3ed91d042e755fec5b0267d34238646aed
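
For readers following the doc comment on `StructFieldAccess`, here is a minimal sketch of how a field path such as `["outer", "inner"]` could narrow a struct down to specific leaf columns. It is not the PR's implementation: `Node`, `leaf_count`, `leaves_for_path`, and `example_schema` are hypothetical names, and the toy schema type stands in for the Arrow and Parquet schema types. The only assumption about layout is that leaves are numbered depth-first in declaration order.

```rust
/// Toy stand-in for a schema node (hypothetical; not Arrow's `DataType`).
#[derive(Debug)]
enum Node {
    /// A primitive column that occupies exactly one leaf.
    Leaf,
    /// A struct with named children, in declaration order.
    Struct(Vec<(String, Node)>),
}

/// Counts the leaf columns underneath `node`.
fn leaf_count(node: &Node) -> usize {
    match node {
        Node::Leaf => 1,
        Node::Struct(children) => children.iter().map(|(_, c)| leaf_count(c)).sum(),
    }
}

/// Follows `path` from `root` and returns the depth-first leaf indices it selects.
/// `first_leaf` is the index of the first leaf belonging to `root`.
/// Returns `None` if a field is missing or the path descends into a leaf.
fn leaves_for_path(root: &Node, first_leaf: usize, path: &[&str]) -> Option<Vec<usize>> {
    let mut node = root;
    let mut offset = first_leaf;
    for name in path {
        // Only structs have named children to descend into.
        let children = match node {
            Node::Struct(children) => children,
            Node::Leaf => return None,
        };
        let mut found = None;
        for (child_name, child) in children {
            if child_name.as_str() == *name {
                found = Some(child);
                break;
            }
            // Skip past every leaf of the siblings that precede the target.
            offset += leaf_count(child);
        }
        node = found?;
    }
    Some((offset..offset + leaf_count(node)).collect())
}

/// Example schema: s { id, outer { inner, other }, value }.
fn example_schema() -> Node {
    Node::Struct(vec![
        ("id".to_string(), Node::Leaf),
        (
            "outer".to_string(),
            Node::Struct(vec![
                ("inner".to_string(), Node::Leaf),
                ("other".to_string(), Node::Leaf),
            ]),
        ),
        ("value".to_string(), Node::Leaf),
    ])
}
```

In this sketch, the path `["outer", "inner"]` selects a single leaf, while the shorter path `["outer"]` selects both leaves under `outer`, which is the difference between projecting one field and projecting the whole struct.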
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]