adriangb commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3196639491
Ah, I see: I was looking at the wrong `20480`. I wonder if this is just a consequence of synchronization. There are no guarantees about the order in which the build-side hash tables are built, and we don't push down the filter until they're all done. So, for example, one partition may stream several batches before the filter even kicks in, and how many it streams is going to be non-deterministic. I think it's a non-issue in the real world, because small queries are already going to be fast and large queries will stream enough batches that the filter will eventually kick in and be pushed down. But I need to confirm this hypothesis.
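To make the hypothesis concrete, here is a rough toy model in Python. It is not DataFusion code: the function name, the time units, and the fixed batch interval are all assumptions made for illustration. It assumes each partition starts streaming probe batches as soon as its own build side finishes, while the filter only becomes active once the slowest build partition is done.

```python
def unfiltered_batches(build_done_at, batch_interval):
    """Toy model (hypothetical, not DataFusion's implementation).

    build_done_at: integer finish time of each partition's build side.
    batch_interval: integer time between probe batches of one partition.
    Returns, per partition, how many probe batches are emitted before the
    filter becomes active.
    """
    # Assumption: the filter is pushed down only after ALL builds finish.
    filter_active_at = max(build_done_at)
    counts = []
    for done in build_done_at:
        gap = filter_active_at - done
        # Batches are emitted at done, done + interval, ...; count those
        # strictly before filter_active_at (integer ceiling division).
        counts.append(-(-gap // batch_interval))
    return counts


# Same work, different build completion order: the unfiltered batches
# land on different partitions from run to run.
print(unfiltered_batches([1, 4, 2], 1))
print(unfiltered_batches([4, 1, 2], 1))
```

In this model the number of unfiltered batches depends only on how the build finish times happen to be spread out, which is consistent with the count varying between runs while staying bounded for large queries.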