michalursa opened a new pull request #11047:
URL: https://github.com/apache/arrow/pull/11047

This is a work in progress provided for visibility; it is not working code yet.

The code is based on the branch michalursa:ARROW-13532-filter-interface-for-grouper

It is a collection of building blocks for implementing all flavors of hash join (semi, anti-semi, inner, outer).

For simpler navigation the code is broken into multiple files:
- join_schema - helper classes for finding corresponding pairs of columns in two different sources (batch, hash table)
- join_batch - helper classes for assembling and accumulating output rows in a batch, taking input from both the batch and the hash table; the source pairs of row ids are the result of a hash table lookup
- join_hashtable - building and querying the hash table and related structures
- join_filter - Bloom-like filter implementation
- join_probe - (not implemented yet) probe-side processing logic needed to implement all 8 flavors of join
- join_side - state of processing for each of the two sides of a join: storage of accumulated rows, hash table, Bloom-like filter (called early filter or approximate membership test in the code)
- join_type - constants for the 8 flavors of join and their manipulation
- join - (not implemented yet) glue code for all of the above and implementation of the ExecNode interface

The main features that will still be missing when this code is ready for review are:
- parallel hash table and Bloom-like filter build
- handling of dictionaries
- support for residual predicates with outer joins (non-equality filters that are part of the join match condition)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
