kbendick commented on code in PR #5313:
URL: https://github.com/apache/iceberg/pull/5313#discussion_r947158423
##########
docs/configuration.md:
##########
@@ -64,6 +64,8 @@ Iceberg tables support table properties to configure table behavior, like the de
 | write.orc.block-size-bytes | 268435456 (256 MB) | Define the default file system block size for ORC files |
 | write.orc.compression-codec | zlib | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
 | write.orc.compression-strategy | speed | ORC compression strategy: speed, compression |
+| write.orc.bloom.filter.columns | (not set) | Comma separated list of column names for which a Bloom filter must be created |
+| write.orc.bloom.filter.fpp | 0.05 | False positive probability for Bloom filter (must > 0.0 and < 1.0) |

Review Comment:
   You might want to match the Parquet configurations a bit more closely. They are
   
   ```
   | write.parquet.bloom-filter-enabled.column.col1 | (not set) | Enables writing a bloom filter for the column: col1|
   | write.parquet.bloom-filter-max-bytes | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset |
   ```
   
   So you could do `write.orc.bloom-filter-enabled.column.col1`. This also matches the formatting of other per-column config values, such as `write.metadata.metrics.column.col1`.
   
   For the `fpp`, as that seems to be how the value is set on the ORC bloom filter, I would suggest keeping it that way. But if the Parquet implementation translates max-bytes into an fpp, it might be worth setting the config that way for consistency (though I doubt that it does).
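   As a rough illustration of the per-column layout suggested above, here is a minimal sketch that expands a comma-separated column list (the style proposed in this PR) into one property per column. The prefix `write.orc.bloom-filter-enabled.column.` is only the name suggested in this comment, not a confirmed Iceberg property, and `per_column_properties` is a hypothetical helper; the fpp key and its 0.05 default are taken from the diff.

   ```python
   # Prefix suggested in this review; an assumption, not a confirmed Iceberg property.
   PER_COLUMN_PREFIX = "write.orc.bloom-filter-enabled.column."
   # Key and default as proposed in the PR diff.
   FPP_KEY = "write.orc.bloom.filter.fpp"


   def per_column_properties(columns_csv, fpp=0.05):
       """Expand a comma-separated column list into per-column properties.

       Hypothetical helper for illustration only.
       """
       # Mirror the documented constraint: fpp must be > 0.0 and < 1.0.
       if not 0.0 < fpp < 1.0:
           raise ValueError("fpp must be > 0.0 and < 1.0")
       props = {}
       for name in columns_csv.split(","):
           name = name.strip()
           if name:  # skip empty entries, e.g. from a trailing comma
               props[PER_COLUMN_PREFIX + name] = "true"
       props[FPP_KEY] = str(fpp)
       return props
   ```

   With this layout, enabling a filter on one more column means adding a single key rather than rewriting the whole list value, which is the same shape as `write.metadata.metrics.column.col1`.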
