jshmchenxi opened a new pull request #2642:
URL: https://github.com/apache/iceberg/pull/2642


   Splits #2582 into several PRs.
   This part adds support for writing Parquet bloom filters.
   
   Adds three new `TableProperties`. Their definitions are similar to those in [apache/parquet-mr](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop).
   
   | Property | Default | Description |
   | -- | -- | -- |
   | `write.parquet.bloom-filter-enabled` | false | Whether to write bloom filters. If `true`, bloom filters are enabled for all columns; if `false`, they are disabled for all columns. The setting can also be applied to an individual column by appending `#` and the column name to the property. For example, setting both `write.parquet.bloom-filter-enabled=true` and `write.parquet.bloom-filter-enabled#some_column=false` enables bloom filters for all columns except `some_column`. |
   | `write.parquet.bloom-filter-max-bytes` | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset. |
   | `write.parquet.bloom-filter-expected-ndv` | (not set) | The expected number of distinct values in a column, used to compute the optimal size of the bloom filter. If this property is not set, the bloom filter uses the maximum size. Setting this property for a column also enables that column's bloom filter, so `write.parquet.bloom-filter-enabled` does not need to be set. For example, `write.parquet.bloom-filter-expected-ndv#some_column=200` enables a bloom filter for `some_column` with an expected 200 distinct values. |
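The resolution rules in the table can be sketched in plain Java. This is a minimal, hypothetical illustration, not the code in this PR: `BloomFilterProps`, `isEnabled`, and `sizeInBytes` are made-up names. Letting an expected NDV win over an explicit per-column `false` is an assumption. The sizing uses the classic Bloom filter formula with an assumed false-positive probability, not the sizing parquet-mr applies to its split-block bloom filters.

```java
import java.util.HashMap;
import java.util.Map;

public class BloomFilterProps {
  // Property names taken from the table above.
  static final String ENABLED = "write.parquet.bloom-filter-enabled";
  static final String MAX_BYTES = "write.parquet.bloom-filter-max-bytes";
  static final String EXPECTED_NDV = "write.parquet.bloom-filter-expected-ndv";
  static final long DEFAULT_MAX_BYTES = 1024L * 1024L; // 1 MB

  // Hypothetical helper: decides whether a bloom filter is written for a column.
  static boolean isEnabled(Map<String, String> props, String column) {
    // Assumption: an expected NDV for the column enables its filter
    // regardless of the enabled flags.
    if (props.containsKey(EXPECTED_NDV + "#" + column)) {
      return true;
    }
    // A column-level setting (property#column) overrides the table-wide one.
    String perColumn = props.get(ENABLED + "#" + column);
    if (perColumn != null) {
      return Boolean.parseBoolean(perColumn);
    }
    return Boolean.parseBoolean(props.getOrDefault(ENABLED, "false"));
  }

  // Hypothetical helper: bitset size in bytes for a column's filter.
  // Assumption: classic formula m = -n * ln(p) / (ln 2)^2 bits, not
  // Parquet's split-block sizing; fpp is an assumed input.
  static long sizeInBytes(Map<String, String> props, String column, double fpp) {
    long maxBytes = Long.parseLong(
        props.getOrDefault(MAX_BYTES, Long.toString(DEFAULT_MAX_BYTES)));
    String ndv = props.get(EXPECTED_NDV + "#" + column);
    if (ndv == null) {
      return maxBytes; // No NDV hint: fall back to the maximum size.
    }
    long n = Long.parseLong(ndv);
    double bits = -n * Math.log(fpp) / (Math.log(2) * Math.log(2));
    long bytes = (long) Math.ceil(bits / 8);
    return Math.min(bytes, maxBytes); // Never exceed the configured cap.
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(ENABLED, "true");
    props.put(ENABLED + "#some_column", "false");
    props.put(EXPECTED_NDV + "#other_column", "200");

    System.out.println(isEnabled(props, "id"));
    System.out.println(isEnabled(props, "some_column"));
    System.out.println(sizeInBytes(props, "other_column", 0.01));
    System.out.println(sizeInBytes(props, "id", 0.01));
  }
}
```

With these settings, `some_column` is excluded by its column-level override. `other_column` gets a filter sized from its expected NDV of 200 (240 bytes at an assumed 1% false-positive rate). Columns without an NDV hint fall back to the 1 MB maximum.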
   




