gabotechs commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3742337913
> Interesting idea, do you have some insights on the memory usage vs not doing this "eager execution"?

This definitely has an impact on memory consumption, as it holds record batches in memory until the hash join decides to start consuming them. This is why it's important to put a limit on how much memory is buffered (currently configurable).

With the setup reported in the benchmarks, it will buffer at most 1 MB per partition (configurable with `execution.hash_join_buffering_capacity`), so the memory footprint is at most ~1 MB * `execution.target_partitions` per hash join present in the query.
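To make the bound concrete, here is a rough sketch of the arithmetic. The function name and parameters are hypothetical and not part of the PR; it assumes the per-partition capacity is expressed in bytes and that every partition of every hash join fills its buffer completely, which is the worst case.

```rust
/// Worst-case bytes held by hash join buffering across a whole query.
///
/// Hypothetical helper for illustration only, not a DataFusion API.
/// - `capacity_per_partition`: assumed to be the value of
///   `execution.hash_join_buffering_capacity`, in bytes.
/// - `target_partitions`: the value of `execution.target_partitions`.
/// - `hash_joins`: number of hash joins present in the query plan.
pub fn max_buffered_bytes(
    capacity_per_partition: u64,
    target_partitions: u64,
    hash_joins: u64,
) -> u64 {
    // Saturating multiplication keeps the estimate well-defined
    // even for absurdly large inputs instead of overflowing.
    capacity_per_partition
        .saturating_mul(target_partitions)
        .saturating_mul(hash_joins)
}
```

For instance, with a 1 MiB capacity, 16 target partitions, and two hash joins, `max_buffered_bytes(1024 * 1024, 16, 2)` gives 33,554,432 bytes, i.e. about 32 MiB of buffered batches at most.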
