mapleFU commented on code in PR #37400:
URL: https://github.com/apache/arrow/pull/37400#discussion_r1312485799
##########
cpp/src/parquet/properties.h:
##########
@@ -532,6 +571,50 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
+ /// Enable writing bloom filter in general for all columns. Default disabled.
+ ///
+ /// Please check the link below for more details:
+ /// https://github.com/apache/parquet-format/blob/master/BloomFilter.md
+ Builder* enable_bloom_filter() {
+ default_column_properties_.set_bloom_filter_enabled(true);
+ return this;
+ }
+
+ /// Enable bloom filter for the column specified by `path`.
+ /// Default disabled.
+ Builder* enable_bloom_filter(const std::string& path) {
+ auto iter = bloom_filter_options_.find(path);
+ if (iter == bloom_filter_options_.end() || iter->second == std::nullopt) {
+ bloom_filter_options_[path] = BloomFilterOptions();
Review Comment:
Yeah, I'll remove this. I think users should **explicitly** set the BloomFilter
arguments (like ndv); otherwise they will get unexpected results (like an overly
large BloomFilter in a small file).
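
To make the sizing concern concrete, here is a rough sketch of how the filter size could scale with ndv. It assumes the usual split-block sizing formula (`m = -8 * ndv / ln(1 - fpp^(1/8))` bits, rounded up to a power of two); `EstimateBloomFilterBytes`, `kMinBytes` and `kMaxBytes` are hypothetical names and assumed bounds for illustration, not the Parquet C++ API:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper for illustration only; not part of the Parquet C++ API.
// Assumed lower and upper bounds for the filter bitset, in bytes.
constexpr uint64_t kMinBytes = 32;
constexpr uint64_t kMaxBytes = 128ULL * 1024 * 1024;

// Estimate the bitset size in bytes for `ndv` distinct values and a target
// false-positive probability `fpp`, rounded up to a power of two.
inline uint64_t EstimateBloomFilterBytes(uint64_t ndv, double fpp) {
  assert(fpp > 0.0 && fpp < 1.0);
  // Assumed sizing formula: m = -8 * ndv / ln(1 - fpp^(1/8)) bits.
  const double bits = -8.0 * static_cast<double>(ndv) /
                      std::log(1.0 - std::pow(fpp, 1.0 / 8.0));
  const double max_bits = static_cast<double>(kMaxBytes * 8);
  // Clamp to the assumed bounds before rounding.
  uint64_t num_bits =
      bits >= max_bits ? kMaxBytes * 8 : static_cast<uint64_t>(bits);
  if (num_bits < kMinBytes * 8) num_bits = kMinBytes * 8;
  // Round up to the next power of two.
  uint64_t pow2 = 1;
  while (pow2 < num_bits) pow2 <<= 1;
  return pow2 / 8;
}
```

Under these assumptions, 1,000 distinct values at fpp 0.05 need about 1 KiB, while 1,000,000 need about 1 MiB per filter. A large implicit ndv could therefore produce a filter bigger than the data in a small file.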