rluvaton commented on PR #9700:
URL: https://github.com/apache/arrow-rs/pull/9700#issuecomment-4334525723

   > I sympathize with wanting to be a bit smarter about when to give up on 
dictionary encoding. I would, however, like to see something a bit more 
defensible before proceeding with this change. For example, I'd like to see 
examples where this heuristic outperforms the current defaults by more than 
10%, say, and also outperforms disabling dictionary encoding altogether 
(something which is already an opt-in option, as this new heuristic would be).
   
   @etseidl It is possible for the heuristic to outperform the current defaults 
by more than 10% and also outperform disabling dictionary encoding **for the 
same data**. This happens when some row groups benefit from dictionary encoding 
while others don't, because disabling dictionary encoding disables it for all 
row groups.
   
   I also see that @mzabaluev added a test verifying that the result is smaller 
than with plain encoding.
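
   To make the mixed-row-group argument concrete, here is a minimal sketch of a 
toy size model. It is not the arrow-rs writer or the heuristic from this PR: the 
`RowGroup` struct, the 8-byte value width, and the cost formulas are all 
assumptions made up for illustration, and the model ignores compression, RLE, 
page headers, and the writer's existing dictionary fallback.

```rust
/// Hypothetical description of one row group of a single column.
#[derive(Clone, Copy)]
struct RowGroup {
    rows: u64,
    distinct: u64,
}

/// Assumed fixed width of each value (e.g. an i64 column).
const VALUE_BYTES: u64 = 8;

/// Toy cost of plain encoding: every value stored at full width.
fn plain_size(rg: RowGroup) -> u64 {
    rg.rows * VALUE_BYTES
}

/// Smallest bit width that can address `distinct` dictionary entries.
fn index_bits(distinct: u64) -> u64 {
    let mut bits = 1;
    while (1u64 << bits) < distinct {
        bits += 1;
    }
    bits
}

/// Toy cost of dictionary encoding: dictionary page plus bit-packed indices.
fn dict_size(rg: RowGroup) -> u64 {
    let dict_page = rg.distinct * VALUE_BYTES;
    let indices = (rg.rows * index_bits(rg.distinct) + 7) / 8;
    dict_page + indices
}

/// Toy cost when the encoding is chosen separately for each row group.
fn best_size(rg: RowGroup) -> u64 {
    plain_size(rg).min(dict_size(rg))
}

/// Total size of all row groups under one sizing policy.
fn total(groups: &[RowGroup], size: fn(RowGroup) -> u64) -> u64 {
    groups.iter().map(|&rg| size(rg)).sum()
}

fn main() {
    // One low-cardinality row group and one where every value is unique.
    let groups = [
        RowGroup { rows: 1_000_000, distinct: 16 },
        RowGroup { rows: 1_000_000, distinct: 1_000_000 },
    ];

    println!("always dictionary: {} bytes", total(&groups, dict_size));
    println!("never dictionary:  {} bytes", total(&groups, plain_size));
    println!("per row group:     {} bytes", total(&groups, best_size));
}
```

   Under these assumptions the totals are 11,000,128 bytes with dictionary 
encoding everywhere, 16,000,000 bytes with plain encoding everywhere, and 
8,500,128 bytes when the choice is made per row group. That is roughly 23% 
smaller than the first and 47% smaller than the second, so a per-row-group 
decision can beat both global settings on the same data. Real files will differ, 
which is why measured examples are still needed.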


