adriangb commented on code in PR #20822:
URL: https://github.com/apache/datafusion/pull/20822#discussion_r2906480912
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -368,6 +399,31 @@ impl TreeNodeVisitor<'_> for PushdownChecker<'_> {
type Node = Arc<dyn PhysicalExpr>;
fn f_down(&mut self, node: &Self::Node) -> Result<TreeNodeRecursion> {
+ // check for get_field(Column("foo"), "bar", ...) accessing a struct
field
+ // the resolved return type tells us the leaf type
+ // if it's non0nested, we can pushdown by recording the root column
index and skip children
+ // this way, the visitor never sees the raw struct Column and rejects
it
+ if let Some(func) =
+
ScalarFunctionExpr::try_downcast_func::<GetFieldFunc>(node.as_ref())
+ {
+ let args = func.args();
+
+ if let Some(column) = args
+ .first()
+ .and_then(|a| a.as_any().downcast_ref::<Column>())
+ {
+ let return_type = func.return_type();
+ if !DataType::is_nested(return_type) {
+ if let Some(recursion) =
self.check_struct_field_column(column.name())
Review Comment:
Could you add some comments explaining how this traversal works? e.g. for
`get_field(col, 'a', 'b')` how do we recurse, how does that relate to the
`self.file_schema.index_of(column_name)` call in `check_struct_field_column`
(maybe `check_struct_field_column` can include example inputs / outputs in it's
docstring), etc.
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -60,9 +60,11 @@
//! still be sorted by size.
//!
//! List-aware predicates (for example, `array_has`, `array_has_all`, and
-//! `array_has_any`) can be evaluated directly during Parquet decoding. Struct
-//! columns and other nested projections that are not explicitly supported will
-//! continue to be evaluated after the batches are materialized.
+//! `array_has_any`) can be evaluated directly during Parquet decoding.
+//! Struct field access via `get_field` (e.g. `WHERE s['value'] > 5`) is also
+//! supported when the accessed leaf is a primitive type. Direct references to
+//! whole struct columns and other unsupported nested projections will continue
+//! to be evaluated after the batches are materialized.
Review Comment:
What would these unsupported examples look like?
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -368,6 +395,31 @@ impl TreeNodeVisitor<'_> for PushdownChecker<'_> {
type Node = Arc<dyn PhysicalExpr>;
fn f_down(&mut self, node: &Self::Node) -> Result<TreeNodeRecursion> {
+ // check for get_field(Column("foo"), "bar", ...) accessing a struct
field
+ // the resolved return type tells us the leaf type
+ // if it's non0nested, we can pushdown by recording the root column
index and skip children
Review Comment:
```suggestion
// if it's non-nested, we can push down by recording the root column
index and skip children
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]