This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 222205df2 datafusion.optimizer.repartition_file_scans enabled by default (#5295)
222205df2 is described below
commit 222205df2e26befa3f7f8d862fefea413fbafc4d
Author: Eduard Karacharov <[email protected]>
AuthorDate: Mon Feb 20 15:59:45 2023 +0300
datafusion.optimizer.repartition_file_scans enabled by default (#5295)
---
datafusion/common/src/config.rs | 2 +-
datafusion/core/tests/sqllogictests/test_files/information_schema.slt | 2 +-
docs/source/user-guide/configs.md | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
index 6033d4a0c..03ec74541 100644
--- a/datafusion/common/src/config.rs
+++ b/datafusion/common/src/config.rs
@@ -284,7 +284,7 @@ config_namespace! {
/// Currently supported only for Parquet format in which case
/// multiple row groups from the same file may be read concurrently. If false then each
/// row group is read serially, though different files may be read in parallel.
- pub repartition_file_scans: bool, default = false
+ pub repartition_file_scans: bool, default = true
/// Should DataFusion repartition data using the partitions keys to execute window
/// functions in parallel using the provided `target_partitions` level
diff --git a/datafusion/core/tests/sqllogictests/test_files/information_schema.slt b/datafusion/core/tests/sqllogictests/test_files/information_schema.slt
index 99fdfe30a..81ce141a8 100644
--- a/datafusion/core/tests/sqllogictests/test_files/information_schema.slt
+++ b/datafusion/core/tests/sqllogictests/test_files/information_schema.slt
@@ -132,7 +132,7 @@ datafusion.optimizer.max_passes 3
datafusion.optimizer.prefer_hash_join true
datafusion.optimizer.repartition_aggregations true
datafusion.optimizer.repartition_file_min_size 10485760
-datafusion.optimizer.repartition_file_scans false
+datafusion.optimizer.repartition_file_scans true
datafusion.optimizer.repartition_joins true
datafusion.optimizer.repartition_sorts true
datafusion.optimizer.repartition_windows true
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 37b750c59..1e457ee9b 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -60,7 +60,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
| datafusion.optimizer.repartition_aggregations | true | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level [...]
| datafusion.optimizer.repartition_file_min_size | 10485760 | Minimum total files size in bytes to perform file scan repartitioning. [...]
| datafusion.optimizer.repartition_joins | true | Should DataFusion repartition data using the join keys to execute joins in parallel using the provided `target_partitions` level [...]
-| datafusion.optimizer.repartition_file_scans | false | When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel. [...]
+| datafusion.optimizer.repartition_file_scans | true | When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel. [...]
| datafusion.optimizer.repartition_windows | true | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level [...]
| datafusion.optimizer.repartition_sorts | true | Should DataFusion execute sorts in a per-partition fashion and merge afterwards instead of coalescing first and sorting globally. With this flag is enabled, plans in the form below "SortExec: [a@0 ASC]", " CoalescePartitionsExec", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", would turn into the plan below which performs better in multithreaded environments "SortPreservingMergeExec: [a@0 [...]
| datafusion.optimizer.skip_failed_rules | true | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail [...]
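
Illustration (not part of the commit): the option descriptions above say that file groups are repartitioned for parallelism once the total file size reaches `repartition_file_min_size`. The sketch below shows one way such a split could work, by carving files into byte ranges of roughly equal size across `target_partitions` groups. It is an assumption-laden sketch, not DataFusion's implementation: `FileRange` and `repartition_by_size` are hypothetical names, files are represented only by their sizes, and Parquet row-group boundaries are ignored.

```rust
/// A half-open byte range `[start, end)` within one input file.
/// Hypothetical type for illustration; `file` is the index into the input slice.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    file: usize,
    start: u64,
    end: u64,
}

/// Split files (given by size in bytes) into groups of roughly equal total size.
/// Falls back to one whole-file group per file when the total size is below
/// `min_size` (playing the role of `repartition_file_min_size`).
fn repartition_by_size(
    file_sizes: &[u64],
    target_partitions: usize,
    min_size: u64,
) -> Vec<Vec<FileRange>> {
    let total: u64 = file_sizes.iter().sum();
    if target_partitions == 0 || total == 0 || total < min_size {
        // No repartitioning: each file is scanned serially as a single range.
        return file_sizes
            .iter()
            .enumerate()
            .map(|(file, &size)| vec![FileRange { file, start: 0, end: size }])
            .collect();
    }

    // Ceiling division so that at most `target_partitions` groups are produced.
    let n = target_partitions as u64;
    let per_partition = (total + n - 1) / n;

    let mut groups = Vec::new();
    let mut current = Vec::new();
    let mut remaining = per_partition;

    for (file, &size) in file_sizes.iter().enumerate() {
        let mut offset = 0;
        while offset < size {
            // Take as much of this file as still fits in the current group.
            let take = remaining.min(size - offset);
            current.push(FileRange { file, start: offset, end: offset + take });
            offset += take;
            remaining -= take;
            if remaining == 0 {
                // Current group is full: close it and start a new one.
                groups.push(std::mem::take(&mut current));
                remaining = per_partition;
            }
        }
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

Under these assumptions, a single 100-byte file with `target_partitions = 4` and a small `min_size` yields four 25-byte ranges that could be read concurrently, while the same file with a `min_size` above 100 stays as one serial range.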