gene-bordegaray commented on PR #21931:
URL: https://github.com/apache/datafusion/pull/21931#issuecomment-4477213114

   CPU utilization went up noticeably.
   
   Would it be better to add a threshold so that, when probing is expensive, 
we use only the `global_minmax` and drop the `multi_hash_lookup`? That way we 
would not probe every partition map.
   
   Alternatively, we could split and tune 
`datafusion.optimizer.hash_join_inlist_pushdown_max_distinct_values`. It is 
currently set to 20, which caps both how large the arrays can be per partition 
and how large the union of arrays across partitions can be. That seems pretty 
low, especially for the global union. Maybe we split this into two configs so 
each can be tuned separately?
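
To make the two ideas concrete, here is a rough sketch of how a threshold and a split limit could interact. Everything in it is hypothetical: `PushdownStrategy`, `PushdownLimits`, `choose_strategy`, and the three limit fields are illustrative names, not existing DataFusion types or config keys, and "number of partition maps" is only one possible proxy for probe cost.

```rust
/// Hypothetical filter strategies for a hash join dynamic filter.
#[derive(Debug, PartialEq, Eq)]
enum PushdownStrategy {
    /// Push down IN lists built from the distinct build-side values.
    InList,
    /// Probe each partition's hash map (the `multi_hash_lookup` path).
    MultiHashLookup,
    /// Fall back to global min/max bounds only (the `global_minmax` path).
    GlobalMinMax,
}

/// Hypothetical limits. The first two stand in for the single
/// `hash_join_inlist_pushdown_max_distinct_values` setting split in two.
struct PushdownLimits {
    /// Cap on distinct values in any one partition's array.
    max_distinct_per_partition: usize,
    /// Cap on distinct values in the union across partitions.
    max_distinct_global: usize,
    /// Assumed cost proxy: above this many partition maps, skip probing.
    max_partition_maps_to_probe: usize,
}

/// Pick a strategy from per-partition distinct counts and the size of
/// the cross-partition union.
fn choose_strategy(
    per_partition_distinct: &[usize],
    global_distinct: usize,
    limits: &PushdownLimits,
) -> PushdownStrategy {
    let partitions_fit = per_partition_distinct
        .iter()
        .all(|&n| n <= limits.max_distinct_per_partition);

    if partitions_fit && global_distinct <= limits.max_distinct_global {
        // Small enough everywhere: IN lists are cheap to evaluate.
        PushdownStrategy::InList
    } else if per_partition_distinct.len() <= limits.max_partition_maps_to_probe {
        // Too many values for IN lists, but few enough maps to probe.
        PushdownStrategy::MultiHashLookup
    } else {
        // Probing every partition map would be expensive: bounds only.
        PushdownStrategy::GlobalMinMax
    }
}
```

With a per-partition cap of 20 and a larger global cap, two small partitions would still get IN lists, while a build side with many large partitions would fall back to min/max bounds instead of probing every map.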


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

