michalursa opened a new pull request #11446:
URL: https://github.com/apache/arrow/pull/11446


   Adds support for dictionary arrays and dictionary scalars as inputs to hash 
join, on both sides of the join, in both key columns and non-key columns. 
   
   A key column from the probe side of the join can be matched against a key 
column from the build side as long as the underlying value types are equal. 
This means that: 
   - a dictionary column (on either side) can be matched against a 
non-dictionary column (on the other side) if the underlying value types are 
equal
   - a dictionary column can be matched against a dictionary column with a 
different index type, and potentially a different dictionary, as long as the 
underlying value types are equal
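
   To illustrate the matching rule above, here is a minimal pure-Python sketch. 
It is not the Arrow implementation: `DictColumn`, `key_values` and 
`inner_join_rows` are hypothetical names introduced for this example, and 
treating null keys as non-matching is an assumption of the sketch. The point is 
only that keys are compared by their underlying values, so a dictionary column 
can match a plain column or a column encoded with a different dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class DictColumn:
    """Hypothetical stand-in for a dictionary-encoded column."""
    indices: List[Optional[int]]      # None marks a null row
    dictionary: List[Optional[str]]   # underlying values

    def decode(self) -> List[Optional[str]]:
        # Map every index to the underlying value it refers to.
        return [None if i is None else self.dictionary[i] for i in self.indices]


Column = Union[DictColumn, List[Optional[str]]]


def key_values(col: Column) -> List[Optional[str]]:
    # Dictionary and plain columns both reduce to their underlying values.
    return col.decode() if isinstance(col, DictColumn) else list(col)


def inner_join_rows(probe: Column, build: Column) -> List[Tuple[int, int]]:
    # Build phase: hash table from key value to the build rows holding it.
    table: Dict[str, List[int]] = {}
    for build_row, value in enumerate(key_values(build)):
        if value is not None:  # assumption: null keys never match
            table.setdefault(value, []).append(build_row)
    # Probe phase: look up each probe key by value.
    matches: List[Tuple[int, int]] = []
    for probe_row, value in enumerate(key_values(probe)):
        if value is not None:
            matches.extend((probe_row, b) for b in table.get(value, []))
    return matches


probe = DictColumn(indices=[0, 1, 0], dictionary=["a", "b"])
build_plain = ["b", "a", "c"]
build_dict = DictColumn(indices=[2, 0, 1], dictionary=["a", "c", "b"])

# Both build columns hold the same underlying values, so both joins agree.
print(inner_join_rows(probe, build_plain))
print(inner_join_rows(probe, build_dict))
```

   Both calls yield the same pairs of (probe row, build row), because 
`build_dict` decodes to the same values as `build_plain` even though its 
dictionary is ordered differently from the probe side's.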
   
   We keep the same limitation that hash group by has with respect to 
dictionaries: the same dictionary must be used for a given column in all input 
exec batches. The values in the dictionary do not have to be unique; the 
dictionary can contain duplicate entries and/or null entries.
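
   The limitation above can be sketched in pure Python as follows. This is an 
illustration under stated assumptions, not Arrow code: `decode_batches` is a 
hypothetical helper, a batch is modelled as an `(indices, dictionary)` tuple, 
and raising `ValueError` when the dictionary changes is this sketch's choice 
rather than documented Arrow behaviour.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model of one column in one exec batch: (indices, dictionary).
Batch = Tuple[List[Optional[int]], List[Optional[str]]]


def decode_batches(batches: Sequence[Batch]) -> List[Optional[str]]:
    """Decode one column across batches, requiring a single shared dictionary."""
    if not batches:
        return []
    first_dictionary = batches[0][1]
    values: List[Optional[str]] = []
    for indices, dictionary in batches:
        if dictionary != first_dictionary:
            # Assumption: this sketch rejects a changed dictionary outright.
            raise ValueError("dictionary changed between batches")
        # A null index and an index pointing at a null entry both give None.
        values.extend(None if i is None else dictionary[i] for i in indices)
    return values


# The dictionary has a duplicate entry ("x" twice) and a null entry.
dictionary = ["x", "y", "x", None]
batches = [([0, 2], dictionary), ([3, 1, None], dictionary)]
decoded = decode_batches(batches)
print(decoded)
```

   Indices 0 and 2 refer to duplicate dictionary entries and decode to the same 
value, so rows using either index carry the same key value.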
   
   This change is built on top of https://github.com/apache/arrow/pull/11350 
(fixing thread sanitizer problems in hash join node).



