rok commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797635910
> My instinct is that, rather than unifying first and then determining unique values/counting/hashing, what if we could do the aggregation on each chunk first and then unify the results? That would be a smaller amount of data to manipulate.

Indeed, unifying over all chunks first and then transposing the individual chunk indices would be a better idea! I'm still a bit unfamiliar with kernel mechanics, but I'm thinking that implementing a new kernel for chunked DictionaryArrays with different dictionaries will be the best way to go for this.
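For reference, the unify-then-transpose step described above already has a building block at the Python level. This is a minimal sketch, not the kernel proposed in this PR; it assumes a pyarrow version recent enough to expose `ChunkedArray.unify_dictionaries`:

```python
import pyarrow as pa

# Two chunks whose dictionaries differ; their raw indices are not
# directly comparable across chunks, so per-chunk results would need
# remapping before they could be combined.
chunk_a = pa.array(["foo", "bar", "foo"]).dictionary_encode()
chunk_b = pa.array(["bar", "baz"]).dictionary_encode()
chunked = pa.chunked_array([chunk_a, chunk_b])

# unify_dictionaries() builds one shared dictionary over all chunks
# and transposes each chunk's indices into it.
unified = chunked.unify_dictionaries()
for chunk in unified.chunks:
    print(chunk.dictionary)  # the same unified dictionary for every chunk
    print(chunk.indices)     # indices transposed into that dictionary
```

After unification, every chunk shares one dictionary, so a kernel can hash or count over the transposed indices without re-encoding the values themselves.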
