michalursa opened a new pull request #9768:
URL: https://github.com/apache/arrow/pull/9768


   This is a draft implementation of mapping an arbitrary set of input columns, 
treated as the key of a grouping operation, to a vector of integer group 
identifiers. Rows with the same combination of key column values get the same 
id. 
   
   I will continue working on it and will update it with:
   - integration with the initial hash group-by implementation in the Arrow 
project, once that is finished and merged into master
   - unit tests
   - documentation
   
   At this point, group ids, row ids, offsets, and hash values are all 32-bit. 
Overflow checks are missing in the current version and still need to be added. 
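
   As a rough idea of what such a check could look like, the sketch below 
advances a 32-bit offset and reports failure instead of wrapping around. 
`AddOffsetChecked` is a hypothetical helper, not a function from this PR, and 
the eventual checks may well take a different form.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds `length` to a 32-bit `offset`.
// Returns false, leaving `*out` untouched, if the sum does not fit in 32 bits.
bool AddOffsetChecked(uint32_t offset, uint64_t length, uint32_t* out) {
  const uint64_t max32 = std::numeric_limits<uint32_t>::max();
  // The subtraction is done in 64 bits and cannot underflow,
  // because offset <= max32 always holds.
  if (length > max32 - offset) {
    return false;
  }
  *out = offset + static_cast<uint32_t>(length);
  return true;
}
```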
   
   The entry point for id mapping is the GroupBy class. It uses three main 
modules: storage, defined in the groupby_storage* files; hashing, defined in 
the groupby_hash* files; and the hash table, defined in the groupby_map* 
files. Key values stored with the hash table are row-oriented. The storage 
module defines functions that convert from column-oriented to row-oriented 
storage and back. It also implements key comparison and appending keys to the 
incremental store.
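
   The conversion between the two layouts can be sketched as follows. This is 
only an illustration under simplifying assumptions: every key column is a 
fixed-width `int64_t` column with no nulls, each row stores one 8-byte slot 
per column, and the names `ColumnsToRows`, `RowsToColumns` and `RowsEqual` are 
hypothetical rather than taken from the groupby_storage* files.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Packs column-oriented keys into a row-oriented buffer:
// row r occupies bytes [r * num_cols * 8, (r + 1) * num_cols * 8).
std::vector<uint8_t> ColumnsToRows(
    const std::vector<std::vector<int64_t>>& columns) {
  const size_t num_cols = columns.size();
  const size_t num_rows = num_cols == 0 ? 0 : columns[0].size();
  std::vector<uint8_t> rows(num_rows * num_cols * sizeof(int64_t));
  for (size_t row = 0; row < num_rows; ++row) {
    for (size_t col = 0; col < num_cols; ++col) {
      std::memcpy(rows.data() + (row * num_cols + col) * sizeof(int64_t),
                  &columns[col][row], sizeof(int64_t));
    }
  }
  return rows;
}

// Inverse of ColumnsToRows for a known number of columns.
std::vector<std::vector<int64_t>> RowsToColumns(
    const std::vector<uint8_t>& rows, size_t num_cols) {
  std::vector<std::vector<int64_t>> columns(num_cols);
  if (num_cols == 0) return columns;
  const size_t num_rows = rows.size() / (num_cols * sizeof(int64_t));
  for (size_t col = 0; col < num_cols; ++col) {
    columns[col].resize(num_rows);
    for (size_t row = 0; row < num_rows; ++row) {
      std::memcpy(&columns[col][row],
                  rows.data() + (row * num_cols + col) * sizeof(int64_t),
                  sizeof(int64_t));
    }
  }
  return columns;
}

// Compares two stored keys with a single memcmp over their row bytes.
bool RowsEqual(const std::vector<uint8_t>& rows, size_t num_cols,
               size_t row_a, size_t row_b) {
  const size_t row_bytes = num_cols * sizeof(int64_t);
  return std::memcmp(rows.data() + row_a * row_bytes,
                     rows.data() + row_b * row_bytes, row_bytes) == 0;
}
```

   One reason a row-oriented layout suits the hash table is visible in 
`RowsEqual`: a whole multi-column key is compared in one contiguous pass 
instead of one lookup per column.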
   
   I plan to add a design doc in the form of a README file later on.
   
   The individual modules and functions here have been tested with unit tests 
and pass them, but the unit tests are not included in this change yet. 



