gene-bordegaray commented on PR #21931: URL: https://github.com/apache/datafusion/pull/21931#issuecomment-4477213114
CPU utilization went up noticeably. Would it be better to add a threshold above which we use only `global_minmax` and drop `multi_hash_lookup`, so we aren't probing every partition map when that is expensive? Alternatively, we could split and tune `datafusion.optimizer.hash_join_inlist_pushdown_max_distinct_values`. It is currently set to 20, which caps both the size of each per-partition array and the size of the union array across partitions. That seems low, especially for the global union. Maybe split it into two configs so we can tune them separately?
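
To make the two suggestions concrete, here is a rough sketch in plain Rust. It is not code from the PR and does not use any DataFusion API: `InListLimits`, `PushdownFilter`, `choose_filter`, and `max_maps_to_probe` are hypothetical names, and using the number of partition maps as the cost proxy for "expensive" is an assumption.

```rust
/// Hypothetical split of the single in-list cap into two limits.
/// Neither field is an existing DataFusion option.
#[derive(Debug, Clone, Copy)]
struct InListLimits {
    /// Max distinct values kept in one partition's array.
    per_partition_max: usize,
    /// Max distinct values kept in the union array across partitions.
    global_max: usize,
}

impl InListLimits {
    /// True if a single partition's distinct values fit under the per-partition cap.
    fn per_partition_fits(&self, distinct: usize) -> bool {
        distinct <= self.per_partition_max
    }

    /// True if the union across partitions fits under the (separately tuned) global cap.
    fn global_fits(&self, union_distinct: usize) -> bool {
        union_distinct <= self.global_max
    }
}

/// Which filter the build side would push down (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum PushdownFilter {
    /// Global min/max bounds plus a lookup against every partition map.
    MinMaxAndMultiHashLookup,
    /// Global min/max bounds only; cheaper to evaluate per probe row.
    MinMaxOnly,
}

/// Drop the multi-map lookup once probing would touch more maps than the
/// threshold allows. The cost proxy (map count) is an assumption.
fn choose_filter(num_partition_maps: usize, max_maps_to_probe: usize) -> PushdownFilter {
    if num_partition_maps > max_maps_to_probe {
        PushdownFilter::MinMaxOnly
    } else {
        PushdownFilter::MinMaxAndMultiHashLookup
    }
}
```

With a threshold of 8 maps, for example, `choose_filter(16, 8)` falls back to min/max only, while `choose_filter(4, 8)` keeps the hash lookup. The two caps in `InListLimits` can then be raised independently, e.g. a larger `global_max` than `per_partition_max`.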
