gene-bordegaray commented on code in PR #20331:
URL: https://github.com/apache/datafusion/pull/20331#discussion_r2806125820


##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -46,6 +47,9 @@ impl FilterState {
     }
 }
 
+/// Per-partition filter expressions indexed by partition number.
+type PartitionedFilters = Vec<Option<Arc<dyn PhysicalExpr>>>;

Review Comment:
   Snapshotting happens before evaluation
   
   The full path in chronological order is:
   1. ParquetOpener::open() is called
   2. snapshot_physical_expr_for_partition(predicate, partition_index) is 
called -> important to note that we pass the index
   3. snapshot_physical_expr_for_partition replaces the 
DynamicFilterPhysicalExpr with the filter for the partition on that index (this 
is a physical expr)
   4. evaluate() is called which uses the snapshot expr (not the 
DynamicFilterPhysicalExpr) and we don't need to know the partition parameter 
because it was already dealt with earlier
   
   For the lit(true) concern, if `has_partitioned_filters()` returns false 
during snapshotting then we will fallback to lit(true) which yes then we won't 
evaluate anything. But this is ok behavior because its ok to do this rather 
than error
   
   The shouldn't happen because, the we wait until the build side is complete 
before we snapshot so we should always resolve to true.
   
   Maybe to be safe would be good to add a debug statement if 
`has_partitioned_filters()` returns false.
   
   Lmk if this makes sense 🙁 



##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -46,6 +47,9 @@ impl FilterState {
     }
 }
 
+/// Per-partition filter expressions indexed by partition number.
+type PartitionedFilters = Vec<Option<Arc<dyn PhysicalExpr>>>;

Review Comment:
   Snapshotting happens before evaluation
   
   The full path in chronological order is:
   1. ParquetOpener::open() is called
   2. snapshot_physical_expr_for_partition(predicate, partition_index) is 
called -> important to note that we pass the index
   3. snapshot_physical_expr_for_partition replaces the 
DynamicFilterPhysicalExpr with the filter for the partition on that index (this 
is a physical expr)
   4. evaluate() is called which uses the snapshot expr (not the 
DynamicFilterPhysicalExpr) and we don't need to know the partition parameter 
because it was already dealt with earlier
   
   For the lit(true) concern, if `has_partitioned_filters()` returns false 
during snapshotting then we will fallback to lit(true) which yes then we won't 
evaluate anything. But this is ok behavior because its ok to do this rather 
than error
   
   The shouldn't happen because, the we wait until the build side is complete 
before we snapshot so we should always resolve to true.
   
   Maybe to be safe would be good to add a debug statement if 
`has_partitioned_filters()` returns false.
   
   Lmk if this makes sense 🙂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to