alamb commented on issue #12454: URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2350940281
You are right about broadcast join, but I think that for `OUTER JOIN` cases the relation that is not preserved (i.e. the one whose rows are not being padded with nulls) is the one that is broadcast, and the other needs to be partitioned on the join key (to ensure that each possible non-matching row occurs on only one node). In this case I think the distribution isn't quite right.

Though @thinkharderdev, maybe that is another idea: how about doing the `OUTER JOIN` across all tables and then running the results through a second operator that removes any duplicate `NULL`-padded rows? 🤔
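To make the second idea concrete, here is a minimal sketch in plain Python, not DataFusion code. It assumes the preserved (left) side is replicated to every partition of the other side, that each left row carries a unique `row_id`, and that real right-side values are never `None`, so `None` can stand in for `NULL` padding. All names (`local_left_outer_join`, `remove_spurious_padding`, `row_id`) are hypothetical. The sketch also reads "duplicate" a little broadly: besides collapsing repeated padded rows, it drops a padded row whenever the same left row matched in some other partition, since otherwise the result would differ from a single-node outer join.

```python
from collections import defaultdict


def local_left_outer_join(left_rows, right_partition):
    """Left outer join of the replicated left rows against ONE right partition.

    left_rows:       list of (row_id, key, left_value)
    right_partition: list of (key, right_value), right_value assumed non-None
    Returns (row_id, key, left_value, right_value) tuples, where
    right_value is None for a NULL-padded (unmatched) row.
    """
    by_key = defaultdict(list)
    for key, right_value in right_partition:
        by_key[key].append(right_value)

    out = []
    for row_id, key, left_value in left_rows:
        matches = by_key.get(key)
        if matches:
            out.extend((row_id, key, left_value, rv) for rv in matches)
        else:
            # No match in THIS partition; the row may still match elsewhere.
            out.append((row_id, key, left_value, None))
    return out


def remove_spurious_padding(joined_rows):
    """Second pass over the combined output of all partitions.

    Keeps every matched row. Keeps at most one NULL-padded row per left row,
    and only if that left row matched in no partition at all.
    """
    matched_ids = {row[0] for row in joined_rows if row[3] is not None}
    seen_padded = set()
    out = []
    for row in joined_rows:
        row_id, right_value = row[0], row[3]
        if right_value is not None:
            out.append(row)
        elif row_id not in matched_ids and row_id not in seen_padded:
            seen_padded.add(row_id)
            out.append(row)
    return out


# Illustrative data: three left rows, right side split into two partitions.
left = [(0, "a", "L0"), (1, "b", "L1"), (2, "c", "L2")]
right_partitions = [
    [("a", "R0")],
    [("b", "R1"), ("a", "R2")],
]

combined = []
for partition in right_partitions:
    combined.extend(local_left_outer_join(left, partition))

result = remove_spurious_padding(combined)
for row in result:
    print(row)
```

Note that the second pass in this sketch needs to see the padded and matched rows for a given left row together, so in a distributed plan its input would have to be gathered or repartitioned on something like `row_id`. Whether that costs less than partitioning the inputs on the join key up front is an open question here.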
