alamb commented on issue #506: URL: https://github.com/apache/arrow-rs/issues/506#issuecomment-870031410
> A nice benefit of this is that a GROUP BY on that dictionary column afterwards would be very cheap since it does not need another hashmap and instead could index directly into an array of accumulators with the keys. Not sure if that is the use case you are after or if this is more of a nice side effect.

I think "side effect" :)

> Ensuring sorted dictionaries is something I'm definitely interested in, Field already has the `dict_is_ordered` flag based on which a much faster implementation of sort comparator or comparison kernel could be selected.

Yes, we are *definitely* also interested in ensuring sorted dictionaries (we have an optimized physical representation that requires a sorted dictionary, and today it simply re-sorts the incoming dictionary, which is unnecessary when the dictionary is already sorted).
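
To make the two ideas in this exchange concrete, here is a minimal sketch in plain Rust. It deliberately does not use the arrow-rs API: the function names `sum_by_dictionary_key` and `compare_rows_sorted_dict` are hypothetical, keys are modelled as plain `usize` indices, and the second function assumes the dictionary is sorted and contains no duplicate values.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: sum `values` per dictionary key without a hash map.
/// `keys[i]` is the dictionary index of row `i` (`None` marks a null row),
/// and `dict_len` is the number of entries in the dictionary.
/// The result has one accumulator per dictionary entry, indexed by key.
fn sum_by_dictionary_key(keys: &[Option<usize>], values: &[i64], dict_len: usize) -> Vec<i64> {
    assert_eq!(keys.len(), values.len(), "keys and values must have equal length");
    let mut accumulators = vec![0i64; dict_len];
    for (key, value) in keys.iter().zip(values) {
        // Null rows are skipped; every other row indexes straight into the array.
        if let Some(k) = key {
            accumulators[*k] += *value;
        }
    }
    accumulators
}

/// Hypothetical helper: compare two rows of a dictionary-encoded column.
/// Assumes the dictionary is sorted and duplicate-free, so the order of the
/// keys matches the order of the values and the values never need decoding.
fn compare_rows_sorted_dict(keys: &[usize], a: usize, b: usize) -> Ordering {
    keys[a].cmp(&keys[b])
}
```

For a dictionary `["a", "b", "c"]`, keys `[0, 2, 0, null, 1]` and values `[1, 2, 3, 4, 5]`, the first function returns one sum per dictionary entry, in dictionary order. The second function shows why an ordered-dictionary flag matters: without that guarantee, a comparator would have to look up and compare the underlying values instead of the integer keys.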
