houqp commented on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-864615921


   @alamb, it turns out the bug you demoed is not simple to fix, and it 
surfaced a problem in how join columns are handled in the current design. Based 
on what I have seen in MySQL and PostgreSQL, join column deduplication should 
only be applied to join clauses with `USING` constraints. We are currently 
applying the deduplication to all join types other than semi/anti joins.
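   
   To make the distinction concrete, here is a minimal sketch of the rule 
described above, assuming an inner join between two tables. The names 
`JoinConstraint` and `join_output_columns` are hypothetical and are not taken 
from the DataFusion code base or from the fix. The column ordering and naming 
are also simplified: the shared column is kept under the left table's name.

```rust
/// How the join condition was written (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinConstraint {
    /// `... JOIN t2 ON t1.id = t2.id`
    On,
    /// `... JOIN t2 USING (id)`
    Using,
}

/// Compute the qualified output column names of an inner join.
///
/// Assumption: with `USING`, the right-hand copy of each join key is dropped;
/// with `ON`, both copies are kept.
fn join_output_columns(
    left_table: &str,
    left_cols: &[&str],
    right_table: &str,
    right_cols: &[&str],
    join_keys: &[&str],
    constraint: JoinConstraint,
) -> Vec<String> {
    // Every left-hand column is always part of the output.
    let mut out: Vec<String> = left_cols
        .iter()
        .map(|c| format!("{}.{}", left_table, c))
        .collect();

    for c in right_cols {
        // Deduplicate only join keys, and only for `USING` joins.
        if constraint == JoinConstraint::Using && join_keys.contains(c) {
            continue;
        }
        out.push(format!("{}.{}", right_table, c));
    }
    out
}
```

   Under these assumptions, joining `t1(id, a)` with `t2(id, b)` via `USING (id)` 
gives three output columns, while the equivalent `ON t1.id = t2.id` join keeps 
both `id` columns and gives four.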
   
   I have a fix in 
https://github.com/houqp/arrow-datafusion/commit/76367875b103c3a8274ca6e8f660aa1c6177b668,
 which touches a lot of files and is a 600+ LoC diff on its own.
   
   Since this bug also exists in the current master, I think it would be 
easier to merge the current, already reviewed PR as is. Then we can focus on my 
fix and discuss whether the change in join column handling logic is the right 
move or not. I am also not very happy with how my fix is implemented, so I 
would love to get some ideas on alternative implementations as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

