Dandandan commented on pull request #9937:
URL: https://github.com/apache/arrow/pull/9937#issuecomment-815445960


   > > We should filter on nulls beforehand to make this result correct. Probably the best way to go here, I think, is to add a non-null filter in the logical plan for inner / left and right joins.
   > 
   > I am not sure this works for all join types (OUTER JOIN as well as ANTI-JOIN and SEMI-JOIN, which are optimizations for subqueries).
   > 
   > It might make sense to check for null when building the hash table for inner join keys (as NULL will never equal NULL).
   
   You are right, the filter does not work for outer joins or the other join types (but we don't have them yet). For those, I think the rows with null keys have to be included, and we might also need some changes to equality and to building the hash map. The filter approach is what Spark does, fwiw. I think it also makes sense conceptually, since joins should support other conditions as well, and it allows for greater efficiency.
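
   To illustrate the filter approach for the inner join case, here is a minimal sketch in plain Python rather than DataFusion code. The helper names `drop_null_keys` and `inner_hash_join` are hypothetical, and `None` stands in for SQL NULL. It only shows why hashing null keys without a filter gives a wrong inner join result, and how a non-null filter below the join avoids it.

```python
from collections import defaultdict


def drop_null_keys(rows):
    """Stand-in for an IS NOT NULL filter placed below the join (hypothetical helper)."""
    return [row for row in rows if row[0] is not None]


def inner_hash_join(left, right):
    """Join (key, payload) rows on key using a hash table built from the left side."""
    # Build phase: map each key to the left payloads that carry it.
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    # Probe phase: emit one output row per matching left payload.
    return [
        (key, left_payload, right_payload)
        for key, right_payload in right
        for left_payload in table.get(key, [])
    ]


left = [(1, "a"), (None, "b"), (2, "c")]
right = [(1, "x"), (None, "y"), (3, "z")]

# Without the filter, None hashes and compares equal to None, so the null
# rows are matched, which is wrong under SQL semantics (NULL = NULL is not true).
naive = inner_hash_join(left, right)

# With the filter applied to both inputs, only the non-null key 1 matches.
filtered = inner_hash_join(drop_null_keys(left), drop_null_keys(right))
```

   Filtering both inputs also keeps null keys out of the hash table entirely, which is the efficiency point above. For outer joins the null-key rows would still have to be emitted (unmatched), so this filter alone would not be enough there.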

