houqp commented on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-864615921
@alamb, it turns out the bug you demoed is not simple to fix, and it surfaced a problem in how join columns are handled in the current design. Based on what I have seen in MySQL and PostgreSQL, join column deduplication should only be applied to join clauses with `USING` constraints. We are currently applying the deduplication to all join types other than semi/anti joins. I have a fix in https://github.com/houqp/arrow-datafusion/commit/76367875b103c3a8274ca6e8f660aa1c6177b668, which touches a lot of files and is a 600+ LoC diff in itself. Since this bug also exists in the current master, I think it would be easier to merge the current reviewed PR as is. Then we can focus on my fix and discuss whether the change in join column handling logic is the right move or not. I am also not very happy with how my fix is implemented, so I would love to get some ideas on alternative implementations as well.
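
To make the intended behavior concrete, here is a minimal sketch of the rule described above for an inner join: the join key is kept once under `USING`, while `ON` keeps the key from both sides. `JoinConstraint` and `join_output_columns` are hypothetical names used only for illustration. They are not DataFusion's actual types or the code in the linked commit, and the `l.`/`r.` qualifiers are a simplification.

```rust
/// Hypothetical join constraint kind, for illustration only
/// (an assumption, not DataFusion's actual type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinConstraint {
    On,
    Using,
}

/// Compute the output column names of an inner join between `left` and
/// `right`, where `keys` lists the column names both sides are joined on.
///
/// Assumption of this sketch: only a `USING` join deduplicates the join
/// keys. An `ON` join keeps the key column from both sides.
fn join_output_columns(
    left: &[&str],
    right: &[&str],
    keys: &[&str],
    constraint: JoinConstraint,
) -> Vec<String> {
    // Every left-side column is always part of the output.
    let mut out: Vec<String> = left.iter().map(|c| format!("l.{}", c)).collect();

    for c in right {
        // Skip a right-side join key only when the join was written with USING.
        if constraint == JoinConstraint::Using && keys.contains(c) {
            continue;
        }
        out.push(format!("r.{}", c));
    }
    out
}
```

For left columns `id, a` and right columns `id, b` joined on `id`, this gives `l.id, l.a, r.b` for `USING` and `l.id, l.a, r.id, r.b` for `ON`.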
