Re: [I] Proposal: Hook to better support `CollectLeft` joins in distributed execution [datafusion]

via GitHub Sun, 15 Sep 2024 04:36:47 -0700


alamb commented on issue #12454:
URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2351550819


   > > Though @thinkharderdev maybe that is another idea: how about do the 
`OUTER JOIN` across all tables and then run the results through a second 
operator that removes any duplicate `NULL` padded rows 🤔
   > 
   > That would still require coalescing all the output partitions from the 
hash join into a single partition and processing that stream on a single node.
   
   That is right (or alternatively repartitioning the `facts` table and the 
output of the join).
   
   Depending on the join's output cardinality that might not be too bad (it is 
certainly better than repartitioning the base `data` table), but it could also 
be expensive if the join produces a large output.
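
   To make the "second operator" idea concrete, here is a minimal sketch in plain Rust. It is not DataFusion code, and every name in it (`JoinedRow`, `left_id`, `dedup_null_padded`) is hypothetical. It assumes each build-side row carries a unique id and that every partition's join output has already been coalesced into one place (or repartitioned on that id). It also goes one step beyond the wording above: besides collapsing duplicate `NULL` padded rows, it drops a `NULL` padded row whenever the same build-side row matched in some other partition, which is one reading of what a correct result would need.

```rust
use std::collections::HashSet;

/// One row of LEFT OUTER JOIN output from a single partition.
/// `right` is `None` when the row was NULL padded (no probe-side match).
#[derive(Debug, Clone, PartialEq)]
struct JoinedRow {
    /// Hypothetical unique id of the build-side (left) row.
    left_id: u64,
    /// Probe-side payload; `None` stands in for the NULL padding.
    right: Option<String>,
}

/// Sketch of the post-join operator: keeps every matched row, and keeps
/// exactly one NULL padded row for each left row that matched nowhere.
fn dedup_null_padded(partitions: Vec<Vec<JoinedRow>>) -> Vec<JoinedRow> {
    // Coalesce all partitions into a single stream (the costly step).
    let rows: Vec<JoinedRow> = partitions.into_iter().flatten().collect();

    // Left rows that found a match in at least one partition.
    let matched: HashSet<u64> = rows
        .iter()
        .filter(|r| r.right.is_some())
        .map(|r| r.left_id)
        .collect();

    // Left ids for which a NULL padded row has already been emitted.
    let mut padded_seen: HashSet<u64> = HashSet::new();

    rows.into_iter()
        .filter(|r| {
            r.right.is_some()
                || (!matched.contains(&r.left_id) && padded_seen.insert(r.left_id))
        })
        .collect()
}
```

   The two passes over the coalesced rows are what force all join output onto a single node (or through a repartition on the left id), so the cost of this approach scales with the join's output cardinality rather than with the size of the base table.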
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


