[GitHub] [arrow-rs] alamb commented on issue #506: "Optimize" Dictionary contents in DictionaryArray / `concat_batches`

GitBox Mon, 28 Jun 2021 13:49:45 -0700


alamb commented on issue #506:
URL: https://github.com/apache/arrow-rs/issues/506#issuecomment-870031410



   > A nice benefit of this is that a GROUP BY on that dictionary column
   > afterwards would be very cheap, since it does not need another hash map and
   > could instead index directly into an array of accumulators with the keys. Not
   > sure if that is the use case you are after or if this is more of a nice side
   > effect.
   
   I think "side effect" :)
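
To make the quoted idea concrete, here is a minimal sketch in plain Rust of grouping by dictionary key without a hash map. It is not arrow-rs code: the function name `sum_by_dictionary_key` and the slice-based representation of keys and values are assumptions made for illustration. It also assumes the dictionary has no duplicate values, since otherwise two different keys could refer to the same group.

```rust
/// Sum `values` grouped by dictionary key, without a hash map.
///
/// Hypothetical helper for illustration only (not part of arrow-rs).
/// `keys[i]` is the dictionary index of row `i` (`None` for a null row) and
/// `dict_len` is the number of dictionary entries. Returns one sum per
/// dictionary entry, in dictionary order.
fn sum_by_dictionary_key(keys: &[Option<usize>], values: &[i64], dict_len: usize) -> Vec<i64> {
    assert_eq!(keys.len(), values.len());

    // One accumulator slot per dictionary entry; the key is the slot index.
    let mut accumulators = vec![0i64; dict_len];

    for (key, value) in keys.iter().zip(values) {
        // Null rows belong to no group and are skipped in this sketch.
        if let Some(k) = key {
            accumulators[*k] += *value;
        }
    }
    accumulators
}
```

Dictionary entries that no row refers to simply keep a sum of zero here, which is one reason a dictionary without unused entries makes this approach cleaner.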
   
   > Ensuring sorted dictionaries is something I'm definitely interested in.
   > Field already has the `dict_is_ordered` flag, based on which a much faster
   > implementation of a sort comparator or comparison kernel could be selected.
   
   Yes, we are *definitely* also interested in ensuring sorted dictionaries (we
   have an optimized physical representation that requires a sorted dictionary, and
   today it simply re-sorts the incoming dictionary, unnecessarily).
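
As a rough illustration of why a sorted dictionary helps, the sketch below compares rows by their integer keys when the dictionary is known to be sorted, and falls back to comparing the dictionary values otherwise. It is plain Rust, not arrow-rs code: `compare_rows`, `sort_rows`, and the `dict_is_ordered` boolean parameter are hypothetical stand-ins for the flag mentioned above, and null handling is left out.

```rust
use std::cmp::Ordering;

/// Compare rows `a` and `b` of a dictionary-encoded string column.
///
/// Hypothetical helper for illustration only (not part of arrow-rs).
/// `keys[i]` is the dictionary index of row `i`.
fn compare_rows(
    dictionary: &[&str],
    keys: &[usize],
    dict_is_ordered: bool,
    a: usize,
    b: usize,
) -> Ordering {
    if dict_is_ordered {
        // Sorted dictionary: key order matches value order, so an integer
        // comparison is enough and the strings are never touched.
        keys[a].cmp(&keys[b])
    } else {
        // Unsorted dictionary: look up and compare the actual values.
        dictionary[keys[a]].cmp(dictionary[keys[b]])
    }
}

/// Return the row indices in sorted order (stable sort).
fn sort_rows(dictionary: &[&str], keys: &[usize], dict_is_ordered: bool) -> Vec<usize> {
    let mut rows: Vec<usize> = (0..keys.len()).collect();
    rows.sort_by(|&a, &b| compare_rows(dictionary, keys, dict_is_ordered, a, b));
    rows
}
```

Passing `dict_is_ordered = true` for a dictionary that is not actually sorted would silently produce a wrong order, so the flag has to be trustworthy for the fast path to be safe.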
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

