alamb commented on pull request #9233:
URL: https://github.com/apache/arrow/pull/9233#issuecomment-762174671


   > If we are able to describe in the partitioning information that the 
partition is hashed by some column that is a dictionary, doesn't that allow us 
to perform very fast hashing (based on the dictionary indexes)?
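A minimal sketch of the idea in the quoted question, in Rust. It assumes a hypothetical `DictColumn` type (a key vector plus a string dictionary) rather than the real arrow-rs `DictionaryArray` API. Each distinct dictionary value is hashed once, and every row is then assigned to a partition by looking up its key index, so the per-row work is an array lookup instead of a string hash. Hashing the index itself would be cheaper still, but it only gives consistent partitions when every batch shares the same dictionary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for one dictionary-encoded column in a record batch:
/// `keys[i]` is an index into `values`.
pub struct DictColumn {
    pub keys: Vec<u32>,
    pub values: Vec<String>,
}

/// Returns, for each of `num_partitions` partitions, the row numbers assigned to it.
pub fn partition_by_key(col: &DictColumn, num_partitions: usize) -> Vec<Vec<usize>> {
    assert!(num_partitions > 0);
    // Hash each distinct dictionary value once (cost ~ dictionary size, not row count).
    // Hashing the value rather than the index keeps results consistent across batches
    // whose dictionaries differ.
    let per_key: Vec<usize> = col
        .values
        .iter()
        .map(|v| {
            let mut h = DefaultHasher::new();
            v.hash(&mut h);
            (h.finish() % num_partitions as u64) as usize
        })
        .collect();
    // Per row, only an index lookup is needed.
    let mut parts = vec![Vec::new(); num_partitions];
    for (row, &key) in col.keys.iter().enumerate() {
        parts[per_key[key as usize]].push(row);
    }
    parts
}
```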
   
   @jorgecarleitao yes, I think that would be a great optimization. Another
option would be to skip hashing entirely and build the aggregate table directly
on the dictionary indexes. I suspect this would work well in the common case,
but we would have to handle the case where the dictionary is not the same
across all record batches (and thus an index in one record batch may not
correspond to the same value in another).
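A rough sketch of aggregating directly on dictionary indexes, again in Rust and again using a hypothetical `Batch` type rather than the arrow-rs API, computing `SUM(vals)` grouped by the dictionary column. Within a batch the indexes are dense, so a plain `Vec` serves as the aggregate table with no per-row hashing. Because an index only has meaning relative to its own batch's dictionary, each batch's partial results are translated back to dictionary values before they are merged.

```rust
use std::collections::HashMap;

/// Hypothetical record batch: a dictionary-encoded group-by column plus an i64 column.
pub struct Batch {
    pub keys: Vec<u32>,    // group-by column, as indexes into `dict`
    pub dict: Vec<String>, // this batch's own dictionary
    pub vals: Vec<i64>,    // column being summed
}

/// SUM(vals) GROUP BY the dictionary column, across batches with possibly different dictionaries.
pub fn sum_by_dict(batches: &[Batch]) -> HashMap<String, i64> {
    let mut global: HashMap<String, i64> = HashMap::new();
    for b in batches {
        // Indexes are dense in 0..dict.len(), so a Vec is the per-batch aggregate table.
        let mut local = vec![0i64; b.dict.len()];
        let mut seen = vec![false; b.dict.len()];
        for (&k, &v) in b.keys.iter().zip(&b.vals) {
            local[k as usize] += v;
            seen[k as usize] = true;
        }
        // Translate each group back to its value before merging, since the same
        // index may refer to different values in different batches.
        for (i, sum) in local.into_iter().enumerate() {
            if seen[i] {
                *global.entry(b.dict[i].clone()).or_insert(0) += sum;
            }
        }
    }
    global
}
```

The remapping step costs work proportional to each batch's dictionary size rather than its row count. If all batches were known to share one dictionary, the `Vec` could be kept across batches and the remap skipped.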


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
