alamb commented on issue #12454: URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2351550819
> > Though @thinkharderdev maybe that is another idea: how about do the `OUTER JOIN` across all tables and then run the results through a second operator that removes any duplicate `NULL` padded rows 🤔
>
> That would still require coalescing all the output partitions from the hash join into a single partition and processing that stream on a single node.

That is right (or alternately repartitioning the `facts` table and the output of the join). Depending on the join's output cardinality that might not be too bad (it is certainly better than repartitioning the base `data` table) but it could also be bad.
