adriangb commented on code in PR #19928:
URL: https://github.com/apache/datafusion/pull/19928#discussion_r2716954128
##########
datafusion/proto-common/src/generated/prost.rs:
##########
@@ -815,6 +815,10 @@ pub struct ParquetOptions {
pub max_row_group_size: u64,
#[prost(string, tag = "16")]
pub created_by: ::prost::alloc::string::String,
+ #[prost(oneof = "parquet_options::PruningMaxInlistLimitOpt", tags = "35")]
+ pub pruning_max_inlist_limit_opt: ::core::option::Option<
+ parquet_options::PruningMaxInlistLimitOpt,
+ >,
Review Comment:
Also here, could we make `PruningPredicateConfig` serializable and send that across the wire?
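For illustration only, a minimal sketch of the shape this could take, assuming `PruningPredicateConfig` holds just the InList limit (the field name `max_in_list` and the `PruningPredicateConfigProto` type are hypothetical). In the real crate the wire type would be generated by prost from a `.proto` message, not written by hand. The sketch only shows the round-trip conversion between the in-memory config and its wire form.

```rust
// Hypothetical sketch: `PruningPredicateConfig` and `PruningPredicateConfigProto`
// are illustrative stand-ins. In the real crate the wire type would be
// generated by prost from a `.proto` message rather than written by hand.
use std::convert::TryFrom;

/// In-memory pruning settings (assumed to hold just the InList limit).
#[derive(Debug, Clone, PartialEq)]
pub struct PruningPredicateConfig {
    pub max_in_list: usize,
}

/// Wire-side shape. Protobuf has no `usize`, so the limit travels as `u64`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PruningPredicateConfigProto {
    pub max_in_list: u64,
}

impl From<&PruningPredicateConfig> for PruningPredicateConfigProto {
    fn from(cfg: &PruningPredicateConfig) -> Self {
        // usize -> u64 is lossless on current (<= 64-bit) targets.
        Self {
            max_in_list: cfg.max_in_list as u64,
        }
    }
}

impl TryFrom<&PruningPredicateConfigProto> for PruningPredicateConfig {
    type Error = String;

    fn try_from(proto: &PruningPredicateConfigProto) -> Result<Self, Self::Error> {
        // u64 -> usize can fail on 32-bit targets, so reject out-of-range values.
        let max_in_list = usize::try_from(proto.max_in_list)
            .map_err(|e| format!("max_in_list out of range: {e}"))?;
        Ok(Self { max_in_list })
    }
}
```

With one message for the whole config, `ParquetOptions` could carry a single optional field for it instead of growing one `oneof` per knob like `pruning_max_inlist_limit_opt`.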
##########
datafusion/common/src/config.rs:
##########
@@ -691,6 +691,11 @@ config_namespace! {
/// the parquet file
pub pruning: bool, default = true
+ /// (reading) Maximum number of elements (inclusive) in InList exprs to be eligible for pruning.
+ /// When some InList exprs contain more than this threshold, these expressions are ignored during pruning,
+ /// but other expressions may still be used for pruning.
Review Comment:
```suggestion
/// (reading) Maximum number of elements (inclusive) in `InList` expressions to be eligible for statistics pruning.
/// When some `InList` expressions contain more than this threshold, these expressions are ignored during statistics pruning,
/// but other expressions may still be used for pruning.
/// If an `InList` expression is not used for statistics pruning, that does not mean it is ignored
/// altogether; it is still used as a filter at the data / per-row level.
/// This does not impact [`ParquetOptions::pushdown_filters`]: large `InList` expressions
/// are always evaluated against each row when this option is enabled.
```
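To make the suggested wording concrete, here is a simplified model of the semantics. Statistics pruning is conservative, so an `InList` over the (inclusive) limit is treated as "might match" rather than as false. It can no longer prune a row group on its own, while other conjuncts still can. The `Predicate`, `ColumnStats`, and `might_match` names are hypothetical. This is a sketch, not DataFusion's `PruningPredicate` implementation, and it does not model the row-level filter where the large list is still applied.

```rust
// Hypothetical, simplified model of statistics pruning for a single column.
// Not DataFusion's `PruningPredicate`; it only illustrates the semantics.

/// Min/max statistics for one column in one row group.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

#[derive(Debug)]
pub enum Predicate {
    /// `col IN (values...)`
    InList(Vec<i64>),
    /// `col > value`
    Gt(i64),
    /// Conjunction (`AND`) of sub-predicates.
    And(Vec<Predicate>),
}

/// Returns `false` only if the statistics prove that no row can match, i.e.
/// the row group can be skipped. `true` means "might match, must read".
pub fn might_match(pred: &Predicate, stats: ColumnStats, max_in_list: usize) -> bool {
    match pred {
        // Over the (inclusive) limit: not used for pruning, so it must
        // conservatively answer "might match". A row-level filter would
        // still evaluate it later.
        Predicate::InList(values) if values.len() > max_in_list => true,
        // Eligible: prune only if every value falls outside [min, max].
        Predicate::InList(values) => values
            .iter()
            .any(|v| *v >= stats.min && *v <= stats.max),
        Predicate::Gt(v) => stats.max > *v,
        // A single conjunct proving "no match" is enough to skip the group.
        Predicate::And(preds) => preds.iter().all(|p| might_match(p, stats, max_in_list)),
    }
}
```

Dropping a large list from statistics pruning can only make pruning less selective, never incorrect, which is why the other conjuncts can still be used safely.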
##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -104,6 +106,8 @@ pub(super) struct ParquetOpener {
pub enable_bloom_filter: bool,
/// Should row group pruning be applied
pub enable_row_group_stats_pruning: bool,
+ /// Maximum number of elements (inclusive) in InList exprs to be eligible for pruning
+ pub pruning_max_inlist_limit: usize,
Review Comment:
I wonder if we could just make `PruningPredicateConfig` a field here instead of polluting the struct with more fields. Also, can this be `pub(crate)` instead of `pub`?
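Roughly what that could look like, as a sketch: the opener holds one config value, and the fields use `pub(crate)`. The field name `max_in_list` and the `in_list_eligible` helper are hypothetical, and the struct is trimmed to the pruning-related fields only.

```rust
// Hypothetical sketch: only the pruning-related parts of `ParquetOpener`
// are shown, and `max_in_list` / `in_list_eligible` are illustrative names.

/// Settings that control how the pruning predicate is built.
#[derive(Debug, Clone, PartialEq)]
pub(crate) struct PruningPredicateConfig {
    /// Maximum number of elements (inclusive) in an `InList` expression for
    /// it to be eligible for statistics pruning.
    pub(crate) max_in_list: usize,
}

/// Trimmed-down stand-in for `ParquetOpener`.
pub(crate) struct ParquetOpener {
    /// Should row group pruning be applied
    pub(crate) enable_row_group_stats_pruning: bool,
    /// All pruning-predicate settings grouped in one field.
    pub(crate) pruning_predicate_config: PruningPredicateConfig,
}

impl ParquetOpener {
    /// Whether an `InList` with `len` elements may be used for pruning
    /// (the limit is inclusive).
    pub(crate) fn in_list_eligible(&self, len: usize) -> bool {
        len <= self.pruning_predicate_config.max_in_list
    }
}
```

One likely benefit: a future pruning knob would only touch `PruningPredicateConfig`, not every struct that threads the settings through to the opener.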
--
This is an automated message from the Apache Git Service.