michalursa opened a new pull request #11446: URL: https://github.com/apache/arrow/pull/11446
This PR adds support for dictionary arrays and dictionary scalars as inputs to hash join on both sides of the join, in key columns and non-key columns.

A key column from the probe side of the join can be matched against a key column from the build side as long as the underlying value types are equal. That means:

- a dictionary column (on either side) can be matched against a non-dictionary column (on the other side) if the underlying value types are equal
- a dictionary column can be matched against a dictionary column with a different index type, and potentially a different dictionary, as long as the underlying value types are equal

We keep the same limitation that hash group by has with respect to dictionaries: the same dictionary must be used for a given column in all input exec batches. The values in a dictionary do not have to be unique; it can contain duplicate entries and/or null entries.

This change is built on top of https://github.com/apache/arrow/pull/11350 (fixing thread sanitizer problems in hash join node).
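
To illustrate the matching rule described above, here is a minimal pure-Python sketch, not the Arrow C++ implementation. It assumes a hypothetical representation of a dictionary column as a dict with `indices` and `dictionary` entries, and it assumes that null keys never match (plain equality semantics), which the description does not state. The names `decode` and `hash_join` are invented for this example. Keys are compared by their underlying values, so the index type, the dictionary contents, and duplicate or null dictionary entries do not affect the result.

```python
def decode(column):
    """Return the underlying values of a plain or dictionary-encoded column."""
    # Hypothetical encoding: {"indices": [...], "dictionary": [...]}.
    if isinstance(column, dict):
        dictionary = column["dictionary"]
        return [None if i is None else dictionary[i] for i in column["indices"]]
    return list(column)


def hash_join(build_keys, probe_keys):
    """Inner join on decoded key values, returning (probe_row, build_row) pairs.

    Assumption: null keys never match anything.
    """
    table = {}
    for build_row, value in enumerate(decode(build_keys)):
        if value is not None:
            table.setdefault(value, []).append(build_row)

    matches = []
    for probe_row, value in enumerate(decode(probe_keys)):
        if value is not None:
            for build_row in table.get(value, []):
                matches.append((probe_row, build_row))
    return matches


# Build side: the dictionary has a duplicate entry ("a" twice) and a null entry.
build = {"indices": [0, 1, 2, 3, None], "dictionary": ["a", "b", "a", None]}

# Probe side, case 1: a plain (non-dictionary) column of the same value type.
plain_probe = ["a", "c", "b", None]

# Probe side, case 2: a dictionary column that uses a different dictionary.
dict_probe = {"indices": [1, 0], "dictionary": ["b", "a"]}

plain_pairs = hash_join(build, plain_probe)
dict_pairs = hash_join(build, dict_probe)
```

In this sketch both build rows that decode to "a" match a probe "a", even though they point at different dictionary slots, and rows whose key is null (through either a null index or a null dictionary entry) produce no matches.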