Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

via GitHub Wed, 09 Oct 2024 13:05:50 -0700


parthchandra commented on PR #1007:
URL: 
https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403334431


   There is a small danger in enabling this without a good estimate of the 
size of the build side. ShuffledHashJoin has limits on how much data it can 
process efficiently. If the build-side hash table cannot spill, a large enough 
build side will cause OOMs; if it can spill, SortMergeJoin (SMJ) will 
frequently perform better. We might even see this when we scale the benchmark 
from SF1 to, say, SF10.
   Is there a way for us to get the cardinality and row size for the build 
side somehow? 
   Still worth adding this option, though.
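
   To make the concern concrete, here is a minimal sketch of the kind of guard 
a build-side estimate would allow. It assumes a row-count and an average 
row-size estimate are available, which is exactly the open question above. The 
function name, the memory budget parameter, and the overhead factor are all 
hypothetical and are not part of Comet or Spark.

```python
from typing import Optional


def prefer_hash_join(
    build_rows: Optional[int],
    avg_row_bytes: Optional[int],
    memory_budget_bytes: int,
    overhead_factor: float = 2.0,
) -> bool:
    """Decide whether replacing SortMergeJoin with a hash join looks safe.

    Hypothetical helper: every name here is illustrative, and the
    overhead factor is an assumed allowance for hash table bookkeeping,
    not a measured value.
    """
    # Without an estimate for the build side, keep the default SortMergeJoin.
    if build_rows is None or avg_row_bytes is None:
        return False

    # Estimated in-memory size of the build side: cardinality times row size,
    # padded by the assumed overhead factor.
    estimated_bytes = build_rows * avg_row_bytes * overhead_factor

    # Only choose the hash join when the estimate fits the memory budget.
    return estimated_bytes <= memory_budget_bytes
```

   With a 1 GiB budget, a build side of one million 100-byte rows would pass 
this check, while one hundred million such rows would not, which is the sort 
of change that could show up when scaling a benchmark.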


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


