cetra3 commented on code in PR #20822:
URL: https://github.com/apache/datafusion/pull/20822#discussion_r2908036318
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -294,6 +308,42 @@ impl<'schema> PushdownChecker<'schema> {
}
}
+ /// Checks whether a struct's root column exists in the file schema and, if so,
+ /// records its index so the entire struct is decoded for filter evaluation.
+ ///
+ /// This is called when we see a `get_field` expression that resolves to a
+ /// primitive leaf type. We only need the *root* column index because the
+ /// Parquet reader decodes all leaves of a struct together.
+ ///
+ /// # Example
+ ///
+ /// Given file schema `{a: Int32, s: Struct(foo: Utf8, bar: Int64)}` and the
+ /// expression `get_field(s, 'foo') = 'hello'`:
+ ///
+ /// - `column_name` = `"s"` (the root struct column)
+ /// - `file_schema.index_of("s")` returns `1`
+ /// - We push `1` into `required_columns`
+ /// - Return `None` (no issue — traversal continues in the caller)
+ ///
+ /// If `"s"` is not in the file schema (e.g. a projected-away column), we set
+ /// `projected_columns = true` and return `Jump` to skip the subtree.
+ fn check_struct_field_column(
Review Comment:
This feels a little weird to me; I'm wondering if there is a way to combine it
with `check_single_column`.
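To make the doc comment's walkthrough concrete, here is a minimal, std-only Rust sketch of the behavior it describes: look up the struct's root column by name, record its index if present, otherwise flag it and return `Jump`. `Schema`, `Field`, `Checker`, and `Recursion` are hypothetical stand-ins, not the Arrow/DataFusion types. In particular, Arrow's real `Schema::index_of` returns a `Result` rather than an `Option`, and the exact return type used in the PR is not visible in this excerpt.

```rust
// Hypothetical stand-ins for Arrow/DataFusion types (NOT their real APIs).

/// Hypothetical traversal signal; DataFusion's own recursion enum has more variants.
#[derive(Debug, PartialEq)]
enum Recursion {
    /// Skip the subtree rooted at the current expression.
    Jump,
}

/// Hypothetical schema field: only the name matters for this sketch.
struct Field {
    name: &'static str,
}

/// Hypothetical file schema: an ordered list of top-level (root) fields.
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Position of a top-level field by name, if present.
    /// (Arrow's real `index_of` returns `Result<usize, ArrowError>` instead.)
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f.name == name)
    }
}

/// Hypothetical checker holding only the state the doc comment mentions.
struct Checker<'a> {
    file_schema: &'a Schema,
    required_columns: Vec<usize>,
    projected_columns: bool,
}

impl<'a> Checker<'a> {
    /// `column_name` is the ROOT struct column (e.g. "s" for `get_field(s, 'foo')`).
    /// Only the root index is recorded, because the doc comment states that the
    /// Parquet reader decodes all leaves of a struct together.
    fn check_struct_field_column(&mut self, column_name: &str) -> Option<Recursion> {
        match self.file_schema.index_of(column_name) {
            Some(idx) => {
                // Root column found: require it and let the caller keep traversing.
                self.required_columns.push(idx);
                None
            }
            None => {
                // Root column absent from the file schema: flag it and skip the subtree.
                self.projected_columns = true;
                Some(Recursion::Jump)
            }
        }
    }
}
```

With the doc comment's example schema `{a, s}`, calling the sketch with `"s"` pushes `1` into `required_columns` and returns `None`; calling it with a name missing from the file schema sets `projected_columns` and returns `Some(Recursion::Jump)`. Because the sketch depends only on the root column's name, it has roughly the shape of a plain single-column lookup, which may be what prompts the suggestion to fold it into `check_single_column`. Whether that works cleanly depends on details of `check_single_column` that are not shown here.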