rluvaton commented on PR #9700: URL: https://github.com/apache/arrow-rs/pull/9700#issuecomment-4334525723
> I sympathize with wanting to be a bit smarter about when to give up on dictionary encoding. I would, however, like to see something a bit more defensible before proceeding with this change. For example, I'd like to see examples where this heuristic outperforms the current defaults by more than 10%, say, and also outperforms disabling dictionary encoding altogether (something which is already an opt-in option, as this new heuristic would be).

@etseidl it is possible for the heuristic to outperform the current defaults by more than 10% while also outperforming disabled dictionary encoding **for the same data**. This happens when some row groups benefit from dictionary encoding and others don't, because disabling dictionary encoding disables it for every row group.

I also see that @mzabaluev added a test verifying that the result is smaller than the plain-encoded output.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
