jshmchenxi opened a new pull request #2642: URL: https://github.com/apache/iceberg/pull/2642
Split #2582 into several PRs. This part adds support for writing Parquet bloom filters.

Adds 3 new TableProperties. The definitions are similar to those in [apache/parquet-mr](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop).

| Property | Default | Description |
| -- | -- | -- |
| `write.parquet.bloom-filter-enabled` | false | Whether to write bloom filters. If true, bloom filters are enabled for all columns; if false, they are disabled for all columns. It can also be set per column by appending `#` and the column name to the property. For example, setting both `write.parquet.bloom-filter-enabled=true` and `write.parquet.bloom-filter-enabled#some_column=false` enables bloom filters for all columns except `some_column`. |
| `write.parquet.bloom-filter-max-bytes` | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset. |
| `write.parquet.bloom-filter-expected-ndv` | (not set) | The expected number of distinct values in a column, used to compute the optimal size of the bloom filter. If this property is not set, the bloom filter uses the maximum size. If this property is set for a column, there is no need to also enable the bloom filter with the `write.parquet.bloom-filter-enabled` property. For example, setting `write.parquet.bloom-filter-expected-ndv#some_column=200` enables a bloom filter for `some_column` with an expected number of distinct values equal to 200. |

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
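To illustrate how the per-column `#` suffix described above might be resolved, here is a minimal Python sketch. It is not the PR's implementation (Iceberg itself is written in Java). The helper names `bloom_filter_enabled` and `expected_ndv` are hypothetical. Two behaviors are assumptions that the description does not settle: an explicit per-column `enabled` setting takes precedence over a per-column `expected-ndv` setting, and `expected_ndv` falls back to a table-wide value when no per-column value exists.

```python
# Property names taken from the PR description.
ENABLED = "write.parquet.bloom-filter-enabled"
EXPECTED_NDV = "write.parquet.bloom-filter-expected-ndv"


def _parse_bool(value):
    """Interpret a table property string as a boolean."""
    return value.strip().lower() == "true"


def bloom_filter_enabled(props, column):
    """Hypothetical helper: decide whether `column` gets a bloom filter.

    Resolution order (the first step's precedence is an assumption):
      1. explicit per-column setting, e.g. "<ENABLED>#some_column"
      2. a per-column expected-ndv setting implies "enabled"
      3. the table-wide setting, which defaults to false
    """
    column_key = f"{ENABLED}#{column}"
    if column_key in props:
        return _parse_bool(props[column_key])
    if f"{EXPECTED_NDV}#{column}" in props:
        return True
    return _parse_bool(props.get(ENABLED, "false"))


def expected_ndv(props, column):
    """Hypothetical helper: expected distinct-value count, or None if unset.

    Falling back to a table-wide value is an assumption of this sketch.
    """
    value = props.get(f"{EXPECTED_NDV}#{column}", props.get(EXPECTED_NDV))
    return int(value) if value is not None else None
```

With `{"write.parquet.bloom-filter-enabled": "true", "write.parquet.bloom-filter-enabled#some_column": "false"}`, this sketch reports bloom filters as enabled for every column except `some_column`, matching the example in the property description.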
