kbendick commented on code in PR #4831:
URL: https://github.com/apache/iceberg/pull/4831#discussion_r883140834
##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -167,6 +167,16 @@ private TableProperties() {
"write.delete.parquet.row-group-check-max-record-count";
public static final int PARQUET_ROW_GROUP_CHECK_MAX_RECORD_COUNT_DEFAULT =
10000;
+ public static final String DEFAULT_PARQUET_BLOOM_FILTER_ENABLED =
"write.parquet.bloom-filter-enabled.default";
Review Comment:
+1. While it's consistent with the parquet-mr bloom filter implementation,
we need to think of user experience first and foremost.
It doesn't make sense to enable bloom filters for _a lot_ of columns, and
many users don't do any tuning of their metadata / statistics.
I think it's in line with other things we do to make the user's experience
better, like turning off column-level statistics after a certain number of
columns. We can point out in the docs, under a big, highlighted `!!!NOTE`,
that bloom filters are only used when explicitly turned on.
It's really an advanced feature to use at all, imo.
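
To illustrate the opt-in behaviour I have in mind, here is a rough sketch, not a proposal for the actual implementation. The per-column prefix `COLUMN_ENABLED_PREFIX` and the helper `bloomFilterEnabled` are hypothetical names that are not part of this PR. The point is only that a column gets a bloom filter when the user asks for one, and everything else stays off.

```java
import java.util.HashMap;
import java.util.Map;

public class BloomFilterOptIn {
  // Hypothetical per-column property prefix (an assumption, not taken from this PR).
  static final String COLUMN_ENABLED_PREFIX = "write.parquet.bloom-filter-enabled.column.";

  // Returns true only if the user explicitly enabled a bloom filter for this column.
  // A column with no matching property falls back to "false" (off by default).
  static boolean bloomFilterEnabled(Map<String, String> props, String column) {
    return Boolean.parseBoolean(props.getOrDefault(COLUMN_ENABLED_PREFIX + column, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(COLUMN_ENABLED_PREFIX + "user_id", "true");

    System.out.println(bloomFilterEnabled(props, "user_id"));  // explicitly enabled
    System.out.println(bloomFilterEnabled(props, "event_ts")); // never configured, stays off
  }
}
```

With this shape, users who never touch the properties pay nothing for wide tables, and the docs note only needs to say which property turns a filter on.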
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]