alamb commented on code in PR #10306:
URL: https://github.com/apache/datafusion/pull/10306#discussion_r1585280851
##########
datafusion/core/src/datasource/physical_plan/parquet/mod.rs:
##########
@@ -243,14 +243,32 @@ impl ParquetExec {
}
/// If enabled, the reader will read by the bloom filter
- pub fn with_enable_bloom_filter(mut self, enable_bloom_filter: bool) ->
Self {
- self.table_parquet_options.global.bloom_filter_enabled =
enable_bloom_filter;
+ pub fn with_enable_bloom_filter_on_read(
Review Comment:
I think we can make this a bit shorter:
```suggestion
pub fn with_bloom_filter_on_read(
```
##########
datafusion/common/src/config.rs:
##########
@@ -395,8 +395,11 @@ config_namespace! {
/// default parquet writer setting
pub encoding: Option<String>, default = None
- /// Sets if bloom filter is enabled for any column
- pub bloom_filter_enabled: bool, default = false
+ /// Sets if bloom filter on read is enabled for any column
Review Comment:
```suggestion
/// Use any available bloom filters when reading parquet files
```
##########
datafusion/common/src/config.rs:
##########
@@ -395,8 +395,11 @@ config_namespace! {
/// default parquet writer setting
pub encoding: Option<String>, default = None
- /// Sets if bloom filter is enabled for any column
- pub bloom_filter_enabled: bool, default = false
+ /// Sets if bloom filter on read is enabled for any column
+ pub bloom_filter_on_read_enabled: bool, default = true
+
+ /// Sets if bloom filter on write is enabled for any column
Review Comment:
```suggestion
/// Write bloom filters for all columns when creating parquet files
```
##########
datafusion/sqllogictest/test_files/predicates.slt:
##########
@@ -515,7 +515,7 @@ statement ok
CREATE EXTERNAL TABLE data_index_bloom_encoding_stats STORED AS PARQUET
LOCATION '../../parquet-testing/data/data_index_bloom_encoding_stats.parquet';
statement ok
-set datafusion.execution.parquet.bloom_filter_enabled=true;
+set datafusion.execution.parquet.bloom_filter_on_read_enabled=true;
Review Comment:
It seems like this test was to ensure we had end to end coverage, so now
that we switched the default the meaning / coverage has changed
Perhaps we can change it so:
1. verify the setting with `show
datafusion.execution.parquet.bloom_filter_on_read_enabled`
2. Add a second set of queries with `set
datafusion.execution.parquet.bloom_filter_on_read_enabled=false`
##########
datafusion/common/src/config.rs:
##########
@@ -395,8 +395,11 @@ config_namespace! {
/// default parquet writer setting
pub encoding: Option<String>, default = None
- /// Sets if bloom filter is enabled for any column
- pub bloom_filter_enabled: bool, default = false
+ /// Sets if bloom filter on read is enabled for any column
+ pub bloom_filter_on_read_enabled: bool, default = true
Review Comment:
It seems like `enabled` is somewhat redundant in this option name. Since we
are renaming it anyways, what do you think about calling it just
`bloom_filter_on_read` and `bloom_filter_on_write`? Something like
```suggestion
pub bloom_filter_on_read: bool, default = true
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]