Fokko commented on issue #7663: URL: https://github.com/apache/iceberg/issues/7663#issuecomment-1555872635
> Dictionary encoding is useful for low cardinality columns, so the space difference between the two is negligible, with the tradeoff being deterministic lookups vs False positives from the bloom filter. If it is low cardinality, the likelihood of having false positives is also low (assuming a fixed size bit for the bloom filter). I'm not sure if the dictionary is used for skipping data for example, but I don't think that should influence or impact this decision. Because if that's not the case, then we should fix that :) I'm not super strong on this, but in the end, I think less configuration is better. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
