jshmchenxi commented on a change in pull request #2642:
URL: https://github.com/apache/iceberg/pull/2642#discussion_r746529384
##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -111,6 +111,15 @@ private TableProperties() {
public static final String DELETE_AVRO_COMPRESSION =
"write.delete.avro.compression-codec";
public static final String AVRO_COMPRESSION_DEFAULT = "gzip";
+ public static final String PARQUET_BLOOM_FILTER_ENABLED =
"write.parquet.bloom-filter-enabled";
+ public static final boolean PARQUET_BLOOM_FILTER_ENABLED_DEFAULT = false;
Review comment:
Hi Yufei, thanks for the review. The performance impact of writing bloom
filters should be negligible, though we haven't run a benchmark to confirm it.
The main cost of a bloom filter is space. The default bloom filter size for one
column is 1 MB in each Parquet file, so enabling bloom filters for all N columns
of a table costs an extra **N MB in each file**. It is more reasonable to enable
bloom filters only for columns that have high cardinality and are often used in
filter expressions.
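
To make the space arithmetic concrete, here is a rough sketch. It is not code from this PR: the class and method names are hypothetical, and it assumes one bloom filter of the default 1 MB size per enabled column in each file, as described above.

```java
// Hypothetical helper, not part of Iceberg or Parquet. It estimates the extra
// space used by bloom filters, assuming each enabled column adds one filter of
// the default size (1 MB) to every Parquet file.
final class BloomFilterSpaceEstimate {
  // Assumed default bloom filter size for one column in one file: 1 MB.
  static final long DEFAULT_BLOOM_FILTER_BYTES = 1024L * 1024L;

  private BloomFilterSpaceEstimate() {
  }

  // Extra bytes added to a single Parquet file.
  static long extraBytesPerFile(int bloomFilterColumns) {
    if (bloomFilterColumns < 0) {
      throw new IllegalArgumentException("column count must be non-negative");
    }
    return bloomFilterColumns * DEFAULT_BLOOM_FILTER_BYTES;
  }

  // Extra bytes added across all data files of a table.
  static long extraBytesForTable(int bloomFilterColumns, long dataFiles) {
    if (dataFiles < 0) {
      throw new IllegalArgumentException("file count must be non-negative");
    }
    return Math.multiplyExact(extraBytesPerFile(bloomFilterColumns), dataFiles);
  }

  public static void main(String[] args) {
    int allColumns = 20;      // illustrative table width
    int selectedColumns = 3;  // illustrative high-cardinality filter columns
    long dataFiles = 1000L;   // illustrative number of data files

    System.out.println("all columns, per file (bytes): "
        + extraBytesPerFile(allColumns));
    System.out.println("selected columns, per file (bytes): "
        + extraBytesPerFile(selectedColumns));
    System.out.println("selected columns, whole table (bytes): "
        + extraBytesForTable(selectedColumns, dataFiles));
  }
}
```

With these illustrative numbers, enabling bloom filters on all 20 columns costs 20 MB in each file, while enabling them on 3 selected columns costs 3 MB in each file.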
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]