alamb opened a new pull request, #3822: URL: https://github.com/apache/arrow-datafusion/pull/3822
# Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/3821

# Rationale for this change

I want to test out the parquet filter pushdown on real datasets using datafusion-cli so we can enable it by default -- https://github.com/apache/arrow-datafusion/issues/3463

I want to be able to do so via `datafusion-cli` like:

```shell
$ target/debug/datafusion-cli
DataFusion CLI v13.0.0
❯ show all;
+-------------------------------------------------+---------+
| name                                            | setting |
+-------------------------------------------------+---------+
| datafusion.execution.time_zone                  | UTC     |
| datafusion.execution.parquet.pushdown_filters   | false   | <---- Note the option is now visible here
| datafusion.explain.physical_plan_only           | false   |
| datafusion.execution.coalesce_target_batch_size | 4096    |
| datafusion.execution.batch_size                 | 8192    |
| datafusion.execution.coalesce_batches           | true    |
| datafusion.explain.logical_plan_only            | false   |
| datafusion.optimizer.skip_failed_rules          | true    |
| datafusion.optimizer.filter_null_join_keys      | false   |
+-------------------------------------------------+---------+
```

And then set them like:

```shell
$ DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true target/debug/datafusion-cli
DataFusion CLI v13.0.0
❯ show all;
+-------------------------------------------------+---------+
| name                                            | setting |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size                 | 8192    |
| datafusion.execution.coalesce_batches           | true    |
| datafusion.explain.logical_plan_only            | false   |
| datafusion.optimizer.filter_null_join_keys      | false   |
| datafusion.execution.parquet.enable_page_index  | false   |
| datafusion.optimizer.skip_failed_rules          | true    |
| datafusion.explain.physical_plan_only           | false   |
| datafusion.execution.time_zone                  | UTC     |
| datafusion.execution.coalesce_target_batch_size | 4096    |
| datafusion.execution.parquet.pushdown_filters   | true    | <---- Note the option is set to true here!!!!
| datafusion.execution.parquet.reorder_filters    | false   |
+-------------------------------------------------+---------+
```

# What changes are included in this PR?

1. Add three new config settings to `ConfigOptions`
2. Thread `ConfigOptions` down into the `FileScanConfig`
3. Remove `ParquetScanOptions` in favor of these new configs (will comment on the rationale here)

# Are there any user-facing changes?

YES: If you used `ParquetScanOptions` (which I know @thinkharderdev does) the API has changed.
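
For readers wondering how `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS` relates to `datafusion.execution.parquet.pushdown_filters`, here is a minimal sketch of the naming convention the example above suggests: upper-case the key and replace `.` with `_`. The helper names `env_var_name`, `parse_bool_setting`, and `bool_setting_from_env` are hypothetical and written for illustration; they are not the `ConfigOptions` API from this PR, and the exact parsing rules DataFusion applies may differ.

```rust
use std::env;

/// Hypothetical helper (not part of DataFusion): derive the environment
/// variable name for a config key by replacing '.' with '_' and upper-casing.
fn env_var_name(key: &str) -> String {
    key.replace('.', "_").to_uppercase()
}

/// Hypothetical helper: interpret an optional raw string as a boolean setting.
/// Assumption: unset or unparsable values fall back to `default`.
fn parse_bool_setting(raw: Option<&str>, default: bool) -> bool {
    match raw {
        Some(value) => value.trim().to_lowercase().parse::<bool>().unwrap_or(default),
        None => default,
    }
}

/// Hypothetical helper: look up a boolean setting in the process environment,
/// using the derived variable name.
fn bool_setting_from_env(key: &str, default: bool) -> bool {
    let raw = env::var(env_var_name(key)).ok();
    parse_bool_setting(raw.as_deref(), default)
}
```

Under these assumptions, calling `bool_setting_from_env("datafusion.execution.parquet.pushdown_filters", false)` would return `true` when the process is started as in the second shell example, and `false` when the variable is unset.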
