rok commented on pull request #9683:
URL: https://github.com/apache/arrow/pull/9683#issuecomment-797635910


   > My instinct is that, rather than unifying first and then determining unique values/counting/hashing, what if we could do the aggregation on each chunk first and then unify the results? That would be a smaller amount of data to manipulate.
   
   Indeed, unifying the dictionaries over all chunks first and then transposing each chunk's indices into the unified dictionary would be the better approach!
   
   I'm still a bit unfamiliar with kernel mechanics, but I think implementing a new kernel for chunked DictionaryArrays with differing dictionaries would be the best way to go here.

