michalursa opened a new pull request #11047:
URL: https://github.com/apache/arrow/pull/11047


   This is a work in progress, posted for visibility; it is not working code 
yet. 
   The code is based on the branch 
michalursa:ARROW-13532-filter-interface-for-grouper
   
   This PR provides a collection of building blocks for implementing all 
flavors of hash join (semi, anti-semi, inner, outer). 
   For easier navigation, the code is split into multiple files:
   - join_schema - helper classes for finding corresponding pairs of columns in 
two different sources (batch, hash table)
   - join_batch - helper classes for assembling and accumulating output rows in 
a batch, taking input from both the batch and the hash table; the source pairs 
of row ids come from the hash table lookup
   - join_hashtable - building and querying hash table and related structures
   - join_filter - Bloom-like filter implementation
   - join_probe - (not implemented yet) probe-side processing logic needed to 
implement all 8 flavors of join
   - join_side - state of processing for each of the two sides of a join: 
storage of accumulated rows, hash table, Bloom-like filter (called early filter 
or approximate membership test in the code)
   - join_type - constants and their manipulation for 8 flavors of join
   - join - (not implemented yet) glue code for all of the above and 
implementation of ExecNode interface
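
   To make the roles of join_filter, join_hashtable and join_batch more 
concrete, here is a minimal, self-contained sketch of a probe that consults a 
Bloom-like filter before the hash table and emits pairs of row ids. It is not 
the code in this PR: `EarlyFilter`, `HashKey`, `BuildSide` and `ProbeInner` are 
hypothetical names, the keys are assumed to be single 64-bit integers, and 
`std::unordered_multimap` stands in for the real hash table.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical blocked Bloom-like filter: every hash selects one 64-bit
// block and sets up to three bits inside that block.
class EarlyFilter {
 public:
  explicit EarlyFilter(std::size_t expected_keys) {
    std::size_t n = 1;
    while (n < expected_keys) n <<= 1;  // power of two, so a mask replaces modulo
    blocks_.assign(n, 0);
  }

  void Insert(std::uint64_t hash) { blocks_[BlockIndex(hash)] |= Mask(hash); }

  // May return true for a key that was never inserted (false positive),
  // but never returns false for a key that was inserted.
  bool MayContain(std::uint64_t hash) const {
    const std::uint64_t mask = Mask(hash);
    return (blocks_[BlockIndex(hash)] & mask) == mask;
  }

 private:
  std::size_t BlockIndex(std::uint64_t hash) const {
    return static_cast<std::size_t>(hash >> 32) & (blocks_.size() - 1);
  }
  static std::uint64_t Mask(std::uint64_t hash) {
    return (1ULL << (hash & 63)) | (1ULL << ((hash >> 6) & 63)) |
           (1ULL << ((hash >> 12) & 63));
  }
  std::vector<std::uint64_t> blocks_;
};

// splitmix64 finalizer, used here only as a convenient 64-bit mixer.
inline std::uint64_t HashKey(std::uint64_t key) {
  key += 0x9e3779b97f4a7c15ULL;
  key = (key ^ (key >> 30)) * 0xbf58476d1ce4e5b9ULL;
  key = (key ^ (key >> 27)) * 0x94d049bb133111ebULL;
  return key ^ (key >> 31);
}

// Build side of the join: key -> build row id, plus the early filter.
struct BuildSide {
  std::unordered_multimap<std::uint64_t, std::size_t> table;
  EarlyFilter filter;

  explicit BuildSide(const std::vector<std::uint64_t>& keys)
      : filter(keys.size()) {
    for (std::size_t row = 0; row < keys.size(); ++row) {
      table.emplace(keys[row], row);
      filter.Insert(HashKey(keys[row]));
    }
  }
};

// Inner-join probe: returns (probe row id, build row id) pairs.
std::vector<std::pair<std::size_t, std::size_t>> ProbeInner(
    const BuildSide& build, const std::vector<std::uint64_t>& probe_keys) {
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t row = 0; row < probe_keys.size(); ++row) {
    // Cheap rejection first; only possible matches reach the hash table.
    if (!build.filter.MayContain(HashKey(probe_keys[row]))) continue;
    auto range = build.table.equal_range(probe_keys[row]);
    for (auto it = range.first; it != range.second; ++it) {
      out.emplace_back(row, it->second);
    }
  }
  return out;
}
```

   For example, with build keys {10, 20, 20, 30} and probe keys {20, 40, 10}, 
`ProbeInner` yields two pairs for probe row 0 and one pair for probe row 2. The 
order of build rows within one probe row is unspecified because it follows 
`std::unordered_multimap` iteration order.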
   
   The main features that will be missing when this code is ready for review 
are:
   - parallel hash table and Bloom-like filter build
   - handling of dictionaries
   - support for residual predicates with outer joins (non-equality filters 
that are part of the join match condition)
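
   As an illustration of why residual predicates need special handling with 
outer joins, the sketch below shows a left outer join in which a left row whose 
key matches, but whose candidate pairs all fail the residual predicate, must 
still be emitted once with nulls on the right side. This is a simplified 
assumption of the intended semantics, not code from this PR: `Row`, `kNullRow` 
and `LeftOuterJoinWithResidual` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

struct Row {
  std::int64_t key;
  std::int64_t value;
};

// Sentinel right row id meaning "no match, fill the right side with nulls".
constexpr std::size_t kNullRow = static_cast<std::size_t>(-1);

// Returns (left row id, right row id) pairs for a left outer join whose
// match condition is: left.key == right.key AND residual(left, right).
std::vector<std::pair<std::size_t, std::size_t>> LeftOuterJoinWithResidual(
    const std::vector<Row>& left, const std::vector<Row>& right,
    const std::function<bool(const Row&, const Row&)>& residual) {
  // Build a hash table on the right side, keyed by the equality column.
  std::unordered_multimap<std::int64_t, std::size_t> table;
  for (std::size_t r = 0; r < right.size(); ++r) {
    table.emplace(right[r].key, r);
  }

  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t l = 0; l < left.size(); ++l) {
    bool matched = false;
    auto range = table.equal_range(left[l].key);
    for (auto it = range.first; it != range.second; ++it) {
      // A key match alone is not enough; the residual must also hold.
      if (residual(left[l], right[it->second])) {
        out.emplace_back(l, it->second);
        matched = true;
      }
    }
    // The "matched" flag can only be decided after the residual has been
    // evaluated on every candidate pair for this left row.
    if (!matched) out.emplace_back(l, kNullRow);
  }
  return out;
}
```

   With left rows {1,5}, {2,7}, {3,1}, right rows {1,3}, {2,9} and the residual 
`left.value > right.value`, left row 1 finds a key match but fails the 
residual, so it is emitted with `kNullRow` just like left row 2, which has no 
key match at all.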
   
   



