Fokko commented on issue #7663:
URL: https://github.com/apache/iceberg/issues/7663#issuecomment-1555872635

   > Dictionary encoding is useful for low cardinality columns, so the space 
difference between the two is negligible, with the tradeoff being deterministic 
lookups vs False positives from the bloom filter.
   
   If it is low cardinality, the likelihood of having false positives is also 
low (assuming a fixed size bit for the bloom filter). I'm not sure if the 
dictionary is used for skipping data for example, but I don't think that should 
influence or impact this decision. Because if that's not the case, then we 
should fix that :)
   
   I'm not super strong on this, but in the end, I think less configuration is 
better. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to