Re: [PR] dissallow pushdown of volatile PhysicalExprs [datafusion]

via GitHub Wed, 23 Jul 2025 13:46:54 -0700


theirix commented on PR #16861:
URL: https://github.com/apache/datafusion/pull/16861#issuecomment-3110119344


   Thank you, @adriangb! I can confirm that it works well with table 
sampling, since I use the `random` function (matched by name):
   ```
   query TT
   EXPLAIN SELECT COUNT(*) from t TABLESAMPLE 42 WHERE a < 10;
   ----
   logical_plan
   01)Projection: count(Int64(1)) AS count(*)
   02)--Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
   03)----Projection:
   04)------Filter: t.a < Int32(10) AND random() < Float64(0.42)
   05)--------TableScan: t projection=[a]
   physical_plan
   01)ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
   02)--AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
   03)----CoalescePartitionsExec
   04)------AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
   05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   06)----------ProjectionExec: expr=[]
   07)------------CoalesceBatchesExec: target_batch_size=8192
   08)--------------FilterExec: a@0 < 10 AND random() < 0.42
   09)----------------DataSourceExec: partitions=1, partition_sizes=[1]
   ```
   
   The volatile filter is not pushed down to the data source. Without this 
patch, the data source showed `predicate=random() < 0.1`.
   
   I agree it'd be more scalable to have an abstract way to specify UDF 
volatility.
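
To make the idea concrete, here is a minimal, self-contained sketch of a volatility-aware pushdown check. It is not DataFusion code and not how this patch is implemented: `Expr`, `Volatility`, `example_filter`, and `partition_for_pushdown` are hypothetical names over a toy expression tree, and splitting the filter into conjuncts is just one possible policy. The point is only that each function call carries a declared volatility instead of being matched by name.

```rust
/// Hypothetical volatility tag carried by each function call (illustration only).
#[allow(dead_code)] // `Immutable` is not constructed in this small example
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Volatile,
}

/// Toy expression tree; this is NOT DataFusion's `PhysicalExpr`.
#[allow(dead_code)] // `name` is only shown through `Debug`
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    Call { name: String, volatility: Volatility },
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// An expression is volatile if any node inside it is a volatile call.
    fn is_volatile(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => false,
            Expr::Call { volatility, .. } => *volatility == Volatility::Volatile,
            Expr::Lt(l, r) | Expr::And(l, r) => l.is_volatile() || r.is_volatile(),
        }
    }
}

/// Flatten nested `AND`s into a list of conjuncts.
fn split_conjuncts(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjuncts(*l, out);
            split_conjuncts(*r, out);
        }
        other => out.push(other),
    }
}

/// Returns `(pushable, retained)`: conjuncts that could be offered to the
/// data source, and volatile ones that must stay in the filter operator.
fn partition_for_pushdown(filter: Expr) -> (Vec<Expr>, Vec<Expr>) {
    let mut conjuncts = Vec::new();
    split_conjuncts(filter, &mut conjuncts);
    conjuncts.into_iter().partition(|e| !e.is_volatile())
}

/// Builds the filter from the plan above: `a < 10 AND random() < 0.42`.
fn example_filter() -> Expr {
    let a_lt_10 = Expr::Lt(
        Box::new(Expr::Column("a".to_string())),
        Box::new(Expr::Literal(10.0)),
    );
    let sample = Expr::Lt(
        Box::new(Expr::Call {
            name: "random".to_string(),
            volatility: Volatility::Volatile,
        }),
        Box::new(Expr::Literal(0.42)),
    );
    Expr::And(Box::new(a_lt_10), Box::new(sample))
}

fn main() {
    let (pushable, retained) = partition_for_pushdown(example_filter());
    println!("pushable: {pushable:?}");
    println!("retained: {retained:?}");
}
```

In this sketch, `random() < 0.42` ends up in `retained` because its call is tagged `Volatile`, while `a < 10` remains a pushdown candidate. A volatile predicate has to be evaluated exactly once per row at the filter, which is why it cannot be handed to a source that might evaluate it again or use it for pruning.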
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


