gene-bordegaray commented on code in PR #20331:
URL: https://github.com/apache/datafusion/pull/20331#discussion_r2806125820
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -46,6 +47,9 @@ impl FilterState {
}
}
+/// Per-partition filter expressions indexed by partition number.
+type PartitionedFilters = Vec<Option<Arc<dyn PhysicalExpr>>>;
Review Comment:
Snapshotting happens before evaluation
The full path in chronological order is:
1. ParquetOpener::open() is called
2. snapshot_physical_expr_for_partition(predicate, partition_index) is
called -> important to note that we pass the index
3. snapshot_physical_expr_for_partition replaces the
DynamicFilterPhysicalExpr with the filter for the partition on that index (this
is a physical expr)
4. evaluate() is called which uses the snapshot expr (not the
DynamicFilterPhysicalExpr) and we don't need to know the partition parameter
because it was already dealt with earlier
For the lit(true) concern, if `has_partitioned_filters()` returns false
during snapshotting then we will fallback to lit(true) which yes then we won't
evaluate anything. But this is ok behavior because its ok to do this rather
than error
The shouldn't happen because, the we wait until the build side is complete
before we snapshot so we should always resolve to true.
Maybe to be safe would be good to add a debug statement if
`has_partitioned_filters()` returns false.
Lmk if this makes sense 🙁
##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -46,6 +47,9 @@ impl FilterState {
}
}
+/// Per-partition filter expressions indexed by partition number.
+type PartitionedFilters = Vec<Option<Arc<dyn PhysicalExpr>>>;
Review Comment:
Snapshotting happens before evaluation
The full path in chronological order is:
1. ParquetOpener::open() is called
2. snapshot_physical_expr_for_partition(predicate, partition_index) is
called -> important to note that we pass the index
3. snapshot_physical_expr_for_partition replaces the
DynamicFilterPhysicalExpr with the filter for the partition on that index (this
is a physical expr)
4. evaluate() is called which uses the snapshot expr (not the
DynamicFilterPhysicalExpr) and we don't need to know the partition parameter
because it was already dealt with earlier
For the lit(true) concern, if `has_partitioned_filters()` returns false
during snapshotting then we will fallback to lit(true) which yes then we won't
evaluate anything. But this is ok behavior because its ok to do this rather
than error
The shouldn't happen because, the we wait until the build side is complete
before we snapshot so we should always resolve to true.
Maybe to be safe would be good to add a debug statement if
`has_partitioned_filters()` returns false.
Lmk if this makes sense 🙂
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]