2010YOUY01 commented on issue #18856:
URL: https://github.com/apache/datafusion/issues/18856#issuecomment-3587705416

   > I think the _for stats pruning only_ is perhaps not the right distinction 
to make: what can and can't be used for stats pruning is going to vary by file 
format and changes over time. The distinction that the logical filter pushdown 
makes, and that maybe we should be making here, is:
   > 
   > * I am going to use the filter, but I can't guarantee exact filtering aka 
`Inexact`. This usually means it _might_ be used for stats pruning but _will 
not_ be used for row-level pruning.
   > * I am going to apply the filter exactly as `FilterExec` would i.e. 
`Exact`.
   > * I won't use the filter at all i.e. `Unsupported`.
   > 
   > But maybe I'm missing something... what would a node do with the 
information that another operator is going to use a filter "only for stats 
pruning"? Not produce the filter?
   
   This extra 'stats-only' message is for the forward pass, e.g. `HashJoinExec 
--(push down only for stats pruning, not row-by-row filtering)--> ParquetExec`. 
The backward pass `ParquetExec -> HashJoinExec` should still report something 
like `exact/inexact/no`.
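
To make the two directions concrete, here is a minimal sketch of how they could be modeled. All names (`PushdownHint`, `PushdownAnswer`, `scan_answer`, `parent_must_reapply`) are hypothetical and are not existing DataFusion APIs; the decision rule is an assumption for illustration only.

```rust
/// Forward pass (parent -> child): how the parent would like the filter used.
/// Hypothetical type, not part of DataFusion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownHint {
    /// The child may use the filter however it likes, including row by row.
    Any,
    /// The child should use the filter only to skip data via statistics.
    StatsOnly,
}

/// Backward pass (child -> parent): what the child will actually do.
/// Hypothetical type mirroring the `exact/inexact/no` answers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownAnswer {
    Exact,
    Inexact,
    Unsupported,
}

/// Assumed scan-side decision: row-level filtering happens only when the
/// parent did not restrict the filter to stats pruning.
fn scan_answer(hint: PushdownHint, can_filter_rows: bool, has_stats: bool) -> PushdownAnswer {
    match (hint, can_filter_rows, has_stats) {
        (PushdownHint::Any, true, _) => PushdownAnswer::Exact,
        (_, _, true) => PushdownAnswer::Inexact,
        _ => PushdownAnswer::Unsupported,
    }
}

/// The parent has to evaluate the predicate itself unless the child
/// guarantees exact filtering.
fn parent_must_reapply(answer: PushdownAnswer) -> bool {
    answer != PushdownAnswer::Exact
}
```

With a `StatsOnly` hint, a scan that could filter row by row still answers `Inexact`, so the join keeps evaluating the predicate itself.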
   
   Here is an example where this extra forward-pass information helps:
   ```
   select *
   from
   locations join spatial_objects
   on distance(locations.loc, spatial_objects.loc) < 10m;
   ```
   We can calculate a spatial range from the build side and push it to the 
probe side, then use stats pruning to eliminate some data. Row-level filtering 
at the scanner is not worthwhile here: the spatial calculation is very 
expensive per row, so filtering in the scan and then evaluating again during 
join probing is wasteful. It is better to evaluate it only once, in the join.
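
A rough sketch of that stats-only pruning, assuming planar coordinates, Euclidean distance, and per-row-group min/max statistics on the location columns. `BBox` and `surviving_row_groups` are hypothetical names, not DataFusion or Parquet APIs.

```rust
/// Axis-aligned bounding box, used both for the build-side range and for
/// per-row-group min/max statistics on the probe side.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

impl BBox {
    /// Smallest box containing all points, or `None` for an empty input.
    fn from_points(points: &[(f64, f64)]) -> Option<BBox> {
        let (&(x0, y0), rest) = points.split_first()?;
        let mut b = BBox { min_x: x0, min_y: y0, max_x: x0, max_y: y0 };
        for &(x, y) in rest {
            b.min_x = b.min_x.min(x);
            b.min_y = b.min_y.min(y);
            b.max_x = b.max_x.max(x);
            b.max_y = b.max_y.max(y);
        }
        Some(b)
    }

    /// Grow the box by `d` on every side. Any point within distance `d` of a
    /// point in the original box lies inside the grown box.
    fn expand(self, d: f64) -> BBox {
        BBox {
            min_x: self.min_x - d,
            min_y: self.min_y - d,
            max_x: self.max_x + d,
            max_y: self.max_y + d,
        }
    }

    fn intersects(&self, other: &BBox) -> bool {
        self.min_x <= other.max_x
            && other.min_x <= self.max_x
            && self.min_y <= other.max_y
            && other.min_y <= self.max_y
    }
}

/// Indices of probe-side row groups that may contain a match. The result is
/// conservative: surviving rows still need the real distance check in the join.
fn surviving_row_groups(
    build_points: &[(f64, f64)],
    max_distance: f64,
    row_group_stats: &[BBox],
) -> Vec<usize> {
    let Some(range) = BBox::from_points(build_points).map(|b| b.expand(max_distance)) else {
        return Vec::new(); // empty build side: nothing can match
    };
    row_group_stats
        .iter()
        .enumerate()
        .filter(|(_, stats)| range.intersects(stats))
        .map(|(i, _)| i)
        .collect()
}
```

The scan only compares boxes, so the expensive `distance` call runs once per candidate pair, in the join.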
   
   This idea is a bit ahead of where we are, though, so we should implement it 
only when it is needed, not proactively right now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

