adriangb commented on PR #17197:
URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3196639491

   Ah, I see: I was looking at the wrong `20480`.
   
   I wonder if this is just a consequence of synchronization: there are no 
guarantees about the order in which the build-side hash tables are built, and 
we don't push down the filter until they're all done. So one partition may, 
for example, stream several batches before the filter even kicks in, and how 
many it streams is non-deterministic. I think it's a non-issue in the real 
world: small queries are already going to be fast, and large queries will 
stream enough batches that the filter will eventually kick in and be pushed down.
   
   But I need to confirm this hypothesis.
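
To make the hypothesis concrete, here is a toy model in Rust. It is not DataFusion code: the function name, the time units, and the assumption that each partition starts probing as soon as its own build finishes are all hypothetical simplifications. It only illustrates how the number of batches streamed before the filter applies could vary with build completion timing.

```rust
/// Toy model only; not DataFusion code. All names here are hypothetical.
///
/// `build_done[i]` is the time (arbitrary ticks) at which partition `i`
/// finishes building its hash table. Assumptions of this sketch:
///   * the filter becomes active only once the slowest build finishes;
///   * each partition starts streaming as soon as its own build is done;
///   * each partition streams `batches_per_tick` batches per tick, up to
///     `total_batches`.
/// Returns, per partition, how many batches are streamed without the filter.
fn unfiltered_batches(build_done: &[u64], batches_per_tick: u64, total_batches: u64) -> Vec<u64> {
    // The filter is ready when the last build completes.
    let filter_ready = build_done.iter().copied().max().unwrap_or(0);
    build_done
        .iter()
        .map(|&done| ((filter_ready - done) * batches_per_tick).min(total_batches))
        .collect()
}

fn main() {
    // Same query, two runs that differ only in when each build finishes.
    let run_a = unfiltered_batches(&[3, 3, 3, 3], 1, 100);
    let run_b = unfiltered_batches(&[1, 3, 2, 3], 1, 100);
    println!("run A unfiltered batches per partition: {:?}", run_a);
    println!("run B unfiltered batches per partition: {:?}", run_b);
}
```

In this model, the unfiltered count depends only on the gap between a partition's own build and the slowest build, so it changes from run to run while staying bounded by that gap. That would be consistent with metrics that fluctuate slightly without affecting large queries much.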


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

