theirix commented on PR #16861:
URL: https://github.com/apache/datafusion/pull/16861#issuecomment-3110119344

   Thank you, @adriangb! I can confirm that it works well with table 
sampling, since I use the `random` function (matched by name):
   ```
   query TT
   EXPLAIN SELECT COUNT(*) from t TABLESAMPLE 42 WHERE a < 10;
   ----
   logical_plan
   01)Projection: count(Int64(1)) AS count(*)
   02)--Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
   03)----Projection:
   04)------Filter: t.a < Int32(10) AND random() < Float64(0.42)
   05)--------TableScan: t projection=[a]
   physical_plan
   01)ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
   02)--AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
   03)----CoalescePartitionsExec
   04)------AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
   05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   06)----------ProjectionExec: expr=[]
   07)------------CoalesceBatchesExec: target_batch_size=8192
   08)--------------FilterExec: a@0 < 10 AND random() < 0.42
   09)----------------DataSourceExec: partitions=1, partition_sizes=[1]
   ```
   
   With this patch, the volatile filter is no longer pushed down to the data 
source. Without it, the pushed-down filter appeared as `predicate=random() < 0.1`.
   
   I agree it'd be more scalable to have an abstract way to specify UDF 
volatility.
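
To illustrate what an abstract volatility check could look like, here is a minimal sketch. It assumes a simplified, hypothetical expression model: the `Expr` and `Volatility` types and the `partition_for_pushdown` helper below are invented for illustration and are not DataFusion's actual API. The idea is that each function call carries a volatility marker, so the pushdown decision no longer depends on matching the name `random`:

```rust
// Hypothetical, simplified model for illustration only; these are NOT
// DataFusion's real types or functions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Volatility {
    Immutable,
    Volatile,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    // A function call that declares its own volatility.
    Call {
        name: String,
        volatility: Volatility,
        args: Vec<Expr>,
    },
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// An expression is volatile if any call anywhere inside it is volatile.
fn is_volatile(e: &Expr) -> bool {
    match e {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::Call { volatility, args, .. } => {
            *volatility == Volatility::Volatile || args.iter().any(is_volatile)
        }
        Expr::Lt(l, r) | Expr::And(l, r) => is_volatile(l) || is_volatile(r),
    }
}

// Flatten nested ANDs into a list of conjuncts.
fn split_conjuncts(e: Expr, out: &mut Vec<Expr>) {
    match e {
        Expr::And(l, r) => {
            split_conjuncts(*l, out);
            split_conjuncts(*r, out);
        }
        other => out.push(other),
    }
}

// Returns (conjuncts that may be pushed to the scan, conjuncts that must
// stay in the filter because they are volatile).
fn partition_for_pushdown(filter: Expr) -> (Vec<Expr>, Vec<Expr>) {
    let mut conjuncts = Vec::new();
    split_conjuncts(filter, &mut conjuncts);
    conjuncts.into_iter().partition(|c| !is_volatile(c))
}
```

In this model, a filter like `a < 10 AND random() < 0.42` splits into one pushable conjunct (`a < 10`) and one retained conjunct (`random() < 0.42`). Whether a pushable conjunct is actually pushed down would still depend on what the data source supports.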
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

