gabotechs commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3989100294
> The tradeoff here is reapplying filters to in-memory batches vs getting filtered I/O from the start, but having to wait for it. This should be fine in most cases: the buffer fill happens concurrently with the build so the unfiltered I/O costs no wall clock time, evaluating a filter on in-memory batches is cheap compared to storage I/O, and the buffer is bounded by hash_join_buffering_capacity so the unfiltered portion is small.

This sounds complex to do, but I can imagine it yielding good results. I'd suggest exploring this in a follow-up PR rather than in this one, though.
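To make the quoted tradeoff concrete, here is a minimal sketch in plain Python, not DataFusion code. `BufferedProbe`, `apply_filter`, and the `capacity` argument are hypothetical names invented for illustration; `capacity` only plays the role that `hash_join_buffering_capacity` plays in the discussion, and it counts rows purely to keep the example simple. Batches buffered before the filter exists get the filter reapplied in memory, while later batches are filtered as they are read.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Optional

Batch = List[int]                  # stand-in for a record batch of join keys
Predicate = Callable[[int], bool]  # stand-in for a dynamic filter


def apply_filter(batch: Batch, predicate: Optional[Predicate]) -> Batch:
    """Keep only the rows that pass the predicate (no-op if there is none)."""
    if predicate is None:
        return batch
    return [row for row in batch if predicate(row)]


class BufferedProbe:
    """Hypothetical probe-side reader with a bounded, unfiltered buffer."""

    def __init__(self, source: Iterable[Batch], capacity: int) -> None:
        self._source = iter(source)
        self._capacity = capacity          # assumed bound, counted in rows
        self._buffer: Deque[Batch] = deque()
        self._buffered_rows = 0
        self._filter: Optional[Predicate] = None

    def fill(self) -> None:
        """Read unfiltered batches until the bound is reached.

        In the scenario discussed, this would overlap with the build side.
        """
        while self._buffered_rows < self._capacity:
            batch = next(self._source, None)
            if batch is None:
                break
            self._buffer.append(batch)
            self._buffered_rows += len(batch)

    def set_filter(self, predicate: Predicate) -> None:
        """Called once the build side has produced its filter."""
        self._filter = predicate

    def drain(self) -> Iterator[Batch]:
        # Batches read before the filter existed: reapply it in memory.
        while self._buffer:
            yield apply_filter(self._buffer.popleft(), self._filter)
        # Remaining batches: filter at read time (stand-in for filtered I/O).
        for batch in self._source:
            yield apply_filter(batch, self._filter)


probe = BufferedProbe([[1, 2, 3], [4, 5, 6], [7, 8, 9]], capacity=4)
probe.fill()                                  # buffers the first two batches
build_keys = {2, 5, 8}
probe.set_filter(lambda key: key in build_keys)
print(list(probe.drain()))
```

Only the first two batches pay the cost of being read unfiltered, and that cost is capped by the buffer bound, which is the argument made in the quoted text.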
