This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch 
gh-readonly-queue/main/pr-22439-72e3de70f638189fc12254e3ecf17c23fc0da880
in repository https://gitbox.apache.org/repos/asf/datafusion.git

commit 070d0135330a1d084c3b4d510c782079d2cf60f5
Author: Adrian Garcia Badaracco <[email protected]>
AuthorDate: Wed May 27 16:48:56 2026 -0500

    feat: lower repartition_file_min_size default from 10 MiB to 1 MiB (#22439)
    
    ## Summary
    
    `repartition_file_min_size` gates how aggressively `repartitioned()`
    splits file groups by byte range to fan a scan out across
    `target_partitions` worth of cores. At 10 MiB the default leaves several
    SF1-sized dimension tables (TPC-H `part` ≈ 24 MiB, TPC-DS
    `customer_address` ≈ 7 MiB, …) on a single partition, so any CPU-bound
    per-batch work in the scan (filter eval, dictionary expansion, etc.) is
    single-threaded even when the cluster has plenty of idle cores.
    
    At 1 MiB those same files split cleanly into `target_partitions` byte
    ranges. The cost (more `open()` calls, more metadata loads) is small
    in absolute terms (≤10 extra opens per file in the worst case, each
    amortised over the row-group / page-index reads) and the existing knob
    is still available for workloads where it matters.
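    
    A minimal sketch of the gating described above, assuming ceiling-division
    chunking and a strict less-than threshold check. `split_byte_ranges` is a
    hypothetical helper for illustration, not DataFusion's `repartitioned()`
    implementation, and it models a single file rather than merged file
    groups. With `target_partitions = 4` it reproduces the `customer.tbl`
    ranges that appear in the updated q10 plan in this commit.
    
    ```rust
    use std::ops::Range;
    
    /// Hypothetical helper (not a DataFusion API): split one file of
    /// `file_size` bytes into contiguous byte ranges, or keep it whole when
    /// it is smaller than `min_size`.
    fn split_byte_ranges(
        file_size: u64,
        target_partitions: u64,
        min_size: u64,
    ) -> Vec<Range<u64>> {
        // Assumption: files below the threshold stay as a single partition.
        if file_size == 0 || target_partitions <= 1 || file_size < min_size {
            return vec![0..file_size];
        }
        // Ceiling division so that every byte is covered by some range.
        let chunk = (file_size + target_partitions - 1) / target_partitions;
        let mut ranges = Vec::new();
        let mut start = 0;
        while start < file_size {
            let end = (start + chunk).min(file_size);
            ranges.push(start..end);
            start = end;
        }
        ranges
    }
    ```
    
    For the 2,426,114-byte `customer.tbl`, a 10 MiB threshold yields the
    single range `0..2426114`, while a 1 MiB threshold yields four ranges
    starting at 0, 606529, 1213058 and 1819587.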
    
    ## Benchmark numbers
    
    12-core, SF1, with the existing dynamic-filter-pushdown defaults
    preserved:
    
    | Suite | default (10 MiB) | with this PR (1 MiB) |
    |---|---|---|
    | TPC-H total | 841 ms | 776 ms |
    | TPC-H Q22 | ~30 ms | ~17 ms |
    | TPC-DS total | 11.0 s | 11.1 s |
    | ClickBench total | 21.7 s | 19.0 s |
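    
    As a reading aid, the relative changes implied by the table can be
    computed as below. This is only arithmetic on the numbers above, not a
    new measurement; `percent_change` and `report` are hypothetical helper
    names, and the Q22 inputs are the approximate values from the table.
    
    ```rust
    /// Relative change from `before` to `after` in percent (negative = faster).
    fn percent_change(before: f64, after: f64) -> f64 {
        (after - before) / before * 100.0
    }
    
    /// Rows copied from the table, converted to milliseconds:
    /// (suite, 10 MiB default, 1 MiB default).
    const RESULTS: [(&str, f64, f64); 4] = [
        ("TPC-H total", 841.0, 776.0),
        ("TPC-H Q22", 30.0, 17.0), // approximate values in the table
        ("TPC-DS total", 11_000.0, 11_100.0),
        ("ClickBench total", 21_700.0, 19_000.0),
    ];
    
    /// One formatted line per suite, with the change rounded to one decimal.
    fn report() -> Vec<String> {
        RESULTS
            .iter()
            .map(|(suite, before, after)| {
                format!("{}: {:+.1}%", suite, percent_change(*before, *after))
            })
            .collect()
    }
    ```
    
    This works out to roughly -7.7% for TPC-H total, -43.3% for Q22, +0.9%
    for TPC-DS total and -12.4% for ClickBench total.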
    
    ## Test plan
    
    - [x] `cargo test --test sqllogictests`: all 472 files pass after updating
    the information_schema snapshot and switching csv_files to `RESET`.
    - [ ] `run benchmarks`
    
    Co-authored-by: adriangb <[email protected]>
---
 datafusion/common/src/config.rs                    |  9 +++-
 datafusion/sqllogictest/test_files/csv_files.slt   |  2 +-
 .../sqllogictest/test_files/information_schema.slt |  6 +--
 .../test_files/tpch/plans/q10.slt.part             |  4 +-
 .../test_files/tpch/plans/q13.slt.part             |  4 +-
 .../test_files/tpch/plans/q14.slt.part             |  4 +-
 .../test_files/tpch/plans/q16.slt.part             |  9 ++--
 .../test_files/tpch/plans/q17.slt.part             | 13 +++--
 .../test_files/tpch/plans/q18.slt.part             |  4 +-
 .../test_files/tpch/plans/q19.slt.part             |  3 +-
 .../sqllogictest/test_files/tpch/plans/q2.slt.part | 63 +++++++++++-----------
 .../test_files/tpch/plans/q20.slt.part             | 15 +++---
 .../test_files/tpch/plans/q22.slt.part             | 18 +++----
 .../sqllogictest/test_files/tpch/plans/q3.slt.part | 15 +++---
 .../sqllogictest/test_files/tpch/plans/q5.slt.part |  4 +-
 .../sqllogictest/test_files/tpch/plans/q7.slt.part |  4 +-
 .../sqllogictest/test_files/tpch/plans/q8.slt.part | 37 +++++++------
 .../sqllogictest/test_files/tpch/plans/q9.slt.part | 23 ++++----
 docs/source/user-guide/configs.md                  |  2 +-
 19 files changed, 117 insertions(+), 122 deletions(-)

diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
index e6d1ebbbbe..3e3ab3429a 100644
--- a/datafusion/common/src/config.rs
+++ b/datafusion/common/src/config.rs
@@ -1151,8 +1151,13 @@ config_namespace! {
         /// in parallel using the provided `target_partitions` level
         pub repartition_aggregations: bool, default = true
 
-        /// Minimum total files size in bytes to perform file scan repartitioning.
-        pub repartition_file_min_size: usize, default = 10 * 1024 * 1024
+        /// Minimum total file size in bytes for file-group byte-range
+        /// splitting to fire. Files (or merged file groups) smaller than this
+        /// stay as one partition. Lower values produce more, smaller
+        /// partitions — better at filling `target_partitions` worth of cores
+        /// when files are modestly sized, at the cost of slightly more
+        /// per-partition open / metadata-load overhead.
+        pub repartition_file_min_size: usize, default = 1024 * 1024
 
         /// Should DataFusion repartition data using the join keys to execute joins in parallel
         /// using the provided `target_partitions` level
diff --git a/datafusion/sqllogictest/test_files/csv_files.slt b/datafusion/sqllogictest/test_files/csv_files.slt
index d980e802c8..af2c6d41af 100644
--- a/datafusion/sqllogictest/test_files/csv_files.slt
+++ b/datafusion/sqllogictest/test_files/csv_files.slt
@@ -376,7 +376,7 @@ id3 value3
 
 # Reset repartition_file_min_size to default value
 statement ok
-SET datafusion.optimizer.repartition_file_min_size = 10485760;
+RESET datafusion.optimizer.repartition_file_min_size;
 
 statement ok
 drop table stored_table_with_cr_terminator;
diff --git a/datafusion/sqllogictest/test_files/information_schema.slt b/datafusion/sqllogictest/test_files/information_schema.slt
index b0c7e3f8fe..3bf101f203 100644
--- a/datafusion/sqllogictest/test_files/information_schema.slt
+++ b/datafusion/sqllogictest/test_files/information_schema.slt
@@ -325,7 +325,7 @@ datafusion.optimizer.prefer_existing_union false
 datafusion.optimizer.prefer_hash_join true
 datafusion.optimizer.preserve_file_partitions 0
 datafusion.optimizer.repartition_aggregations true
-datafusion.optimizer.repartition_file_min_size 10485760
+datafusion.optimizer.repartition_file_min_size 1048576
 datafusion.optimizer.repartition_file_scans true
 datafusion.optimizer.repartition_joins true
 datafusion.optimizer.repartition_sorts true
@@ -475,7 +475,7 @@ datafusion.optimizer.prefer_existing_union false When set 
to true, the optimizer
 datafusion.optimizer.prefer_hash_join true When set to true, the physical plan 
optimizer will prefer HashJoin over SortMergeJoin. HashJoin can work more 
efficiently than SortMergeJoin but consumes more memory
 datafusion.optimizer.preserve_file_partitions 0 Minimum number of distinct 
partition values required to group files by their Hive partition column values 
(enabling Hash partitioning declaration). How the option is used:     - 
preserve_file_partitions=0: Disable it.     - preserve_file_partitions=1: 
Always enable it.     - preserve_file_partitions=N, actual file partitions=M: 
Only enable when M >= N.     This threshold preserves I/O parallelism when file 
partitioning is below it. Note: Th [...]
 datafusion.optimizer.repartition_aggregations true Should DataFusion 
repartition data using the aggregate keys to execute aggregates in parallel 
using the provided `target_partitions` level
-datafusion.optimizer.repartition_file_min_size 10485760 Minimum total files 
size in bytes to perform file scan repartitioning.
+datafusion.optimizer.repartition_file_min_size 1048576 Minimum total file size 
in bytes for file-group byte-range splitting to fire. Files (or merged file 
groups) smaller than this stay as one partition. Lower values produce more, 
smaller partitions — better at filling `target_partitions` worth of cores when 
files are modestly sized, at the cost of slightly more per-partition open / 
metadata-load overhead.
 datafusion.optimizer.repartition_file_scans true When set to `true`, 
datasource partitions will be repartitioned to achieve maximum parallelism. 
This applies to both in-memory partitions and FileSource's file groups (1 group 
is 1 partition). For FileSources, only Parquet and CSV formats are currently 
supported. If set to `true` for a FileSource, all files will be repartitioned 
evenly (i.e., a single large file might be partitioned into smaller chunks) for 
parallel scanning. If set to `fa [...]
 datafusion.optimizer.repartition_joins true Should DataFusion repartition data 
using the join keys to execute joins in parallel using the provided 
`target_partitions` level
 datafusion.optimizer.repartition_sorts true Should DataFusion execute sorts in 
a per-partition fashion and merge afterwards instead of coalescing first and 
sorting globally. With this flag is enabled, plans in the form below ```text    
  "SortExec: [a@0 ASC]",      "  CoalescePartitionsExec",      "    
RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", ``` 
would turn into the plan below which performs better in multithreaded 
environments ```text      "SortPreservingMe [...]
@@ -895,7 +895,7 @@ show functions
 statement ok
 reset datafusion.catalog.information_schema;
 
-# The SLT runner sets `target_partitions` to 4 instead of using the default, 
so 
+# The SLT runner sets `target_partitions` to 4 instead of using the default, so
 # reset it explicitly.
 statement ok
 set datafusion.execution.target_partitions = 4;
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part
index f00f48c75a..210468450d 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q10.slt.part
@@ -80,8 +80,8 @@ physical_plan
 09)----------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(o_orderkey@7, l_orderkey@0)], projection=[c_custkey@0, c_name@1, 
c_address@2, c_nationkey@3, c_phone@4, c_acctbal@5, c_comment@6, 
l_extendedprice@9, l_discount@10]
 10)------------------RepartitionExec: partitioning=Hash([o_orderkey@7], 4), 
input_partitions=4
 11)--------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(c_custkey@0, o_custkey@1)], projection=[c_custkey@0, c_name@1, 
c_address@2, c_nationkey@3, c_phone@4, c_acctbal@5, c_comment@6, o_orderkey@7]
-12)----------------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=1
-13)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, 
c_comment], file_type=csv, has_header=false
+12)----------------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=4
+13)------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_name, c_address, c_nationkey, c_ph [...]
 14)----------------------RepartitionExec: partitioning=Hash([o_custkey@1], 4), 
input_partitions=4
 15)------------------------FilterExec: o_orderdate@2 >= 1993-10-01 AND 
o_orderdate@2 < 1994-01-01, projection=[o_orderkey@0, o_custkey@1]
 16)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate], file_type=c [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part
index 94e0848bfc..24e23e4dbd 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q13.slt.part
@@ -62,8 +62,8 @@ physical_plan
 07)------------ProjectionExec: expr=[count(orders.o_orderkey)@1 as c_count]
 08)--------------AggregateExec: mode=SinglePartitioned, gby=[c_custkey@0 as 
c_custkey], aggr=[count(orders.o_orderkey)]
 09)----------------HashJoinExec: mode=Partitioned, join_type=Left, 
on=[(c_custkey@0, o_custkey@1)], projection=[c_custkey@0, o_orderkey@1]
-10)------------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=1
-11)--------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey], file_type=csv, has_header=false
+10)------------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=4
+11)--------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey], file_type=csv, has_header=false
 12)------------------RepartitionExec: partitioning=Hash([o_custkey@1], 4), 
input_partitions=4
 13)--------------------FilterExec: o_comment@2 NOT LIKE %special%requests%, 
projection=[o_orderkey@0, o_custkey@1]
 14)----------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_comment], file_type=csv, ha [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q14.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q14.slt.part
index 28c4f99821..baa98e18ad 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q14.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q14.slt.part
@@ -50,5 +50,5 @@ physical_plan
 07)------------RepartitionExec: partitioning=Hash([l_partkey@0], 4), 
input_partitions=4
 08)--------------FilterExec: l_shipdate@3 >= 1995-09-01 AND l_shipdate@3 < 
1995-10-01, projection=[l_partkey@0, l_extendedprice@1, l_discount@2]
 09)----------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_extendedprice, l_discount, l_ship [...]
-10)------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=1
-11)--------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_type], file_type=csv, has_header=false
+10)------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=4
+11)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_type], file_type=csv, has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part
index b01110b567..0d5e0c0303 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part
@@ -81,8 +81,7 @@ physical_plan
 14)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey], file_type=csv, ha [...]
 15)------------------------RepartitionExec: partitioning=Hash([p_partkey@0], 
4), input_partitions=4
 16)--------------------------FilterExec: p_brand@1 != Brand#45 AND p_type@2 
NOT LIKE MEDIUM POLISHED% AND p_size@3 IN (SET) ([49, 14, 23, 45, 19, 3, 36, 9])
-17)----------------------------RepartitionExec: 
partitioning=RoundRobinBatch(4), input_partitions=1
-18)------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_brand, p_type, p_size], file_type=csv, has_header=false
-19)--------------------FilterExec: s_comment@1 LIKE %Customer%Complaints%, 
projection=[s_suppkey@0]
-20)----------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-21)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_comment], file_type=csv, has_header=false
+17)----------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_brand, p_type, p_size], file_type=csv, has_hea [...]
+18)--------------------FilterExec: s_comment@1 LIKE %Customer%Complaints%, 
projection=[s_suppkey@0]
+19)----------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
+20)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_comment], file_type=csv, has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part
index 83294d61a1..9f375a583f 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q17.slt.part
@@ -61,10 +61,9 @@ physical_plan
 08)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_quantity, l_extendedprice], file_ty [...]
 09)------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=4
 10)--------------FilterExec: p_brand@1 = Brand#23 AND p_container@2 = MED BOX, 
projection=[p_partkey@0]
-11)----------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-12)------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_brand, p_container], file_type=csv, has_header=false
-13)----------ProjectionExec: expr=[CAST(0.2 * CAST(avg(lineitem.l_quantity)@1 
AS Float64) AS Decimal128(30, 15)) as Float64(0.2) * avg(lineitem.l_quantity), 
l_partkey@0 as l_partkey]
-14)------------AggregateExec: mode=FinalPartitioned, gby=[l_partkey@0 as 
l_partkey], aggr=[avg(lineitem.l_quantity)]
-15)--------------RepartitionExec: partitioning=Hash([l_partkey@0], 4), 
input_partitions=4
-16)----------------AggregateExec: mode=Partial, gby=[l_partkey@0 as 
l_partkey], aggr=[avg(lineitem.l_quantity)]
-17)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_quantity], file_type=csv, has_h [...]
+11)----------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_brand, p_container], file_type=csv, has_header=false
+12)----------ProjectionExec: expr=[CAST(0.2 * CAST(avg(lineitem.l_quantity)@1 
AS Float64) AS Decimal128(30, 15)) as Float64(0.2) * avg(lineitem.l_quantity), 
l_partkey@0 as l_partkey]
+13)------------AggregateExec: mode=FinalPartitioned, gby=[l_partkey@0 as 
l_partkey], aggr=[avg(lineitem.l_quantity)]
+14)--------------RepartitionExec: partitioning=Hash([l_partkey@0], 4), 
input_partitions=4
+15)----------------AggregateExec: mode=Partial, gby=[l_partkey@0 as 
l_partkey], aggr=[avg(lineitem.l_quantity)]
+16)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_quantity], file_type=csv, has_h [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part
index 7f63db8f1c..831072092b 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q18.slt.part
@@ -74,8 +74,8 @@ physical_plan
 05)--------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(o_orderkey@2, 
l_orderkey@0)], projection=[c_custkey@0, c_name@1, o_orderkey@2, 
o_totalprice@3, o_orderdate@4, l_quantity@6]
 06)----------RepartitionExec: partitioning=Hash([o_orderkey@2], 4), 
input_partitions=4
 07)------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(c_custkey@0, o_custkey@1)], projection=[c_custkey@0, c_name@1, 
o_orderkey@2, o_totalprice@4, o_orderdate@5]
-08)--------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=1
-09)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_name], file_type=csv, has_header=false
+08)--------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=4
+09)----------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_name], file_type=csv, has_header=false
 10)--------------RepartitionExec: partitioning=Hash([o_custkey@1], 4), 
input_partitions=4
 11)----------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_totalprice, o_orderdate], file_ty [...]
 12)----------RepartitionExec: partitioning=Hash([l_orderkey@0], 4), 
input_partitions=4
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part
index 07a1e9ebfe..03fa6dae94 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q19.slt.part
@@ -74,5 +74,4 @@ physical_plan
 08)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_quantity, l_extendedprice, l_discou [...]
 09)----------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=4
 10)------------FilterExec: p_size@2 >= 1 AND (p_brand@1 = Brand#12 AND 
p_container@3 IN (SET) ([SM CASE, SM BOX, SM PACK, SM PKG]) AND p_size@2 <= 5 
OR p_brand@1 = Brand#23 AND p_container@3 IN (SET) ([MED BAG, MED BOX, MED PKG, 
MED PACK]) AND p_size@2 <= 10 OR p_brand@1 = Brand#34 AND p_container@3 IN 
(SET) ([LG CASE, LG BOX, LG PACK, LG PKG]) AND p_size@2 <= 15)
-11)--------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-12)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_brand, p_size, p_container], file_type=csv, 
has_header=false
+11)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_brand, p_size, p_container], file_type=csv, 
has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part b/datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part
index b1a1538827..e471c2c23d 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q2.slt.part
@@ -112,35 +112,34 @@ physical_plan
 11)--------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(p_partkey@0, ps_partkey@0)], projection=[p_partkey@0, p_mfgr@1, 
ps_suppkey@3, ps_supplycost@4]
 12)----------------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=4
 13)------------------------FilterExec: p_size@3 = 15 AND p_type@2 LIKE %BRASS, 
projection=[p_partkey@0, p_mfgr@1]
-14)--------------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-15)----------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_mfgr, p_type, p_size], file_type=csv, has_header=false
-16)----------------------RepartitionExec: partitioning=Hash([ps_partkey@0], 
4), input_partitions=4
-17)------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost], file [...]
-18)------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 4), 
input_partitions=1
-19)--------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, 
s_comment], file_type=csv, has_header=false
-20)--------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
-21)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name, n_regionkey], file_type=csv, has_header=false
-22)----------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
-23)------------FilterExec: r_name@1 = EUROPE, projection=[r_regionkey@0]
-24)--------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-25)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
-26)------RepartitionExec: partitioning=Hash([ps_partkey@1, 
min(partsupp.ps_supplycost)@0], 4), input_partitions=4
-27)--------ProjectionExec: expr=[min(partsupp.ps_supplycost)@1 as 
min(partsupp.ps_supplycost), ps_partkey@0 as ps_partkey]
-28)----------AggregateExec: mode=FinalPartitioned, gby=[ps_partkey@0 as 
ps_partkey], aggr=[min(partsupp.ps_supplycost)]
-29)------------RepartitionExec: partitioning=Hash([ps_partkey@0], 4), 
input_partitions=4
-30)--------------AggregateExec: mode=Partial, gby=[ps_partkey@0 as 
ps_partkey], aggr=[min(partsupp.ps_supplycost)]
-31)----------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(n_regionkey@2, r_regionkey@0)], projection=[ps_partkey@0, ps_supplycost@1]
-32)------------------RepartitionExec: partitioning=Hash([n_regionkey@2], 4), 
input_partitions=4
-33)--------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(s_nationkey@2, n_nationkey@0)], projection=[ps_partkey@0, ps_supplycost@1, 
n_regionkey@4]
-34)----------------------RepartitionExec: partitioning=Hash([s_nationkey@2], 
4), input_partitions=4
-35)------------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(ps_suppkey@1, s_suppkey@0)], projection=[ps_partkey@0, ps_supplycost@2, 
s_nationkey@4]
-36)--------------------------RepartitionExec: 
partitioning=Hash([ps_suppkey@1], 4), input_partitions=4
-37)----------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost],  [...]
-38)--------------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 
4), input_partitions=1
-39)----------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
-40)----------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 
4), input_partitions=1
-41)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_regionkey], file_type=csv, has_header=false
-42)------------------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
-43)--------------------FilterExec: r_name@1 = EUROPE, 
projection=[r_regionkey@0]
-44)----------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-45)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
+14)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_mfgr, p_type, p_size], file_type=csv, has_header=false
+15)----------------------RepartitionExec: partitioning=Hash([ps_partkey@0], 
4), input_partitions=4
+16)------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost], file [...]
+17)------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 4), 
input_partitions=1
+18)--------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, 
s_comment], file_type=csv, has_header=false
+19)--------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
+20)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name, n_regionkey], file_type=csv, has_header=false
+21)----------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
+22)------------FilterExec: r_name@1 = EUROPE, projection=[r_regionkey@0]
+23)--------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
+24)----------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
+25)------RepartitionExec: partitioning=Hash([ps_partkey@1, 
min(partsupp.ps_supplycost)@0], 4), input_partitions=4
+26)--------ProjectionExec: expr=[min(partsupp.ps_supplycost)@1 as 
min(partsupp.ps_supplycost), ps_partkey@0 as ps_partkey]
+27)----------AggregateExec: mode=FinalPartitioned, gby=[ps_partkey@0 as 
ps_partkey], aggr=[min(partsupp.ps_supplycost)]
+28)------------RepartitionExec: partitioning=Hash([ps_partkey@0], 4), 
input_partitions=4
+29)--------------AggregateExec: mode=Partial, gby=[ps_partkey@0 as 
ps_partkey], aggr=[min(partsupp.ps_supplycost)]
+30)----------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(n_regionkey@2, r_regionkey@0)], projection=[ps_partkey@0, ps_supplycost@1]
+31)------------------RepartitionExec: partitioning=Hash([n_regionkey@2], 4), 
input_partitions=4
+32)--------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(s_nationkey@2, n_nationkey@0)], projection=[ps_partkey@0, ps_supplycost@1, 
n_regionkey@4]
+33)----------------------RepartitionExec: partitioning=Hash([s_nationkey@2], 
4), input_partitions=4
+34)------------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(ps_suppkey@1, s_suppkey@0)], projection=[ps_partkey@0, ps_supplycost@2, 
s_nationkey@4]
+35)--------------------------RepartitionExec: 
partitioning=Hash([ps_suppkey@1], 4), input_partitions=4
+36)----------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost],  [...]
+37)--------------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 
4), input_partitions=1
+38)----------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
+39)----------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 
4), input_partitions=1
+40)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_regionkey], file_type=csv, has_header=false
+41)------------------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
+42)--------------------FilterExec: r_name@1 = EUROPE, 
projection=[r_regionkey@0]
+43)----------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
+44)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part
index 426a1cbaa4..76876160e2 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q20.slt.part
@@ -100,11 +100,10 @@ physical_plan
 17)----------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_availqty], file_type=csv, [...]
 18)--------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), 
input_partitions=4
 19)----------------FilterExec: p_name@1 LIKE forest%, projection=[p_partkey@0]
-20)------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-21)--------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_name], file_type=csv, has_header=false
-22)----------ProjectionExec: expr=[0.5 * CAST(sum(lineitem.l_quantity)@2 AS 
Float64) as Float64(0.5) * sum(lineitem.l_quantity), l_partkey@0 as l_partkey, 
l_suppkey@1 as l_suppkey]
-23)------------AggregateExec: mode=FinalPartitioned, gby=[l_partkey@0 as 
l_partkey, l_suppkey@1 as l_suppkey], aggr=[sum(lineitem.l_quantity)]
-24)--------------RepartitionExec: partitioning=Hash([l_partkey@0, 
l_suppkey@1], 4), input_partitions=4
-25)----------------AggregateExec: mode=Partial, gby=[l_partkey@0 as l_partkey, 
l_suppkey@1 as l_suppkey], aggr=[sum(lineitem.l_quantity)]
-26)------------------FilterExec: l_shipdate@3 >= 1994-01-01 AND l_shipdate@3 < 
1995-01-01, projection=[l_partkey@0, l_suppkey@1, l_quantity@2]
-27)--------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_suppkey, l_quantity, l_shipda [...]
+20)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_name], file_type=csv, has_header=false
+21)----------ProjectionExec: expr=[0.5 * CAST(sum(lineitem.l_quantity)@2 AS 
Float64) as Float64(0.5) * sum(lineitem.l_quantity), l_partkey@0 as l_partkey, 
l_suppkey@1 as l_suppkey]
+22)------------AggregateExec: mode=FinalPartitioned, gby=[l_partkey@0 as 
l_partkey, l_suppkey@1 as l_suppkey], aggr=[sum(lineitem.l_quantity)]
+23)--------------RepartitionExec: partitioning=Hash([l_partkey@0, 
l_suppkey@1], 4), input_partitions=4
+24)----------------AggregateExec: mode=Partial, gby=[l_partkey@0 as l_partkey, 
l_suppkey@1 as l_suppkey], aggr=[sum(lineitem.l_quantity)]
+25)------------------FilterExec: l_shipdate@3 >= 1994-01-01 AND l_shipdate@3 < 
1995-01-01, projection=[l_partkey@0, l_suppkey@1, l_quantity@2]
+26)--------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_partkey, l_suppkey, l_quantity, l_shipda [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part
index 86fe402a10..97f017eff2 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part
@@ -83,13 +83,11 @@ physical_plan
 09)----------------HashJoinExec: mode=Partitioned, join_type=LeftAnti, 
on=[(c_custkey@0, o_custkey@0)], projection=[c_phone@1, c_acctbal@2]
 10)------------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=4
 11)--------------------FilterExec: substr(c_phone@1, 1, 2) IN (SET) ([13, 31, 
23, 29, 30, 18, 17]) AND CAST(c_acctbal@2 AS Decimal128(19, 6)) > 
scalar_subquery(<pending>)
-12)----------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-13)------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_phone, c_acctbal], file_type=csv, has_header=false
-14)------------------RepartitionExec: partitioning=Hash([o_custkey@0], 4), 
input_partitions=4
-15)--------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_custkey], file_type=csv, has_header=false
-16)--AggregateExec: mode=Final, gby=[], aggr=[avg(customer.c_acctbal)]
-17)----CoalescePartitionsExec
-18)------AggregateExec: mode=Partial, gby=[], aggr=[avg(customer.c_acctbal)]
-19)--------FilterExec: c_acctbal@1 > 0.00 AND substr(c_phone@0, 1, 2) IN (SET) 
([13, 31, 23, 29, 30, 18, 17]), projection=[c_acctbal@1]
-20)----------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-21)------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_phone, c_acctbal], file_type=csv, has_header=false
+12)----------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_phone, c_acctbal], file_type=csv, ha [...]
+13)------------------RepartitionExec: partitioning=Hash([o_custkey@0], 4), 
input_partitions=4
+14)--------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_custkey], file_type=csv, has_header=false
+15)--AggregateExec: mode=Final, gby=[], aggr=[avg(customer.c_acctbal)]
+16)----CoalescePartitionsExec
+17)------AggregateExec: mode=Partial, gby=[], aggr=[avg(customer.c_acctbal)]
+18)--------FilterExec: c_acctbal@1 > 0.00 AND substr(c_phone@0, 1, 2) IN (SET) 
([13, 31, 23, 29, 30, 18, 17]), projection=[c_acctbal@1]
+19)----------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_phone, c_acctbal], file_type=csv, has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part
index a9b6ab13cc..fa2cd60688 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q3.slt.part
@@ -67,11 +67,10 @@ physical_plan
 07)------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(c_custkey@0, o_custkey@1)], projection=[o_orderkey@1, o_orderdate@3, 
o_shippriority@4]
 08)--------------RepartitionExec: partitioning=Hash([c_custkey@0], 4), 
input_partitions=4
 09)----------------FilterExec: c_mktsegment@1 = BUILDING, 
projection=[c_custkey@0]
-10)------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-11)--------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_mktsegment], file_type=csv, has_header=false
-12)--------------RepartitionExec: partitioning=Hash([o_custkey@1], 4), 
input_partitions=4
-13)----------------FilterExec: o_orderdate@2 < 1995-03-15
-14)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate, o_shippriority], fil [...]
-15)----------RepartitionExec: partitioning=Hash([l_orderkey@0], 4), 
input_partitions=4
-16)------------FilterExec: l_shipdate@3 > 1995-03-15, 
projection=[l_orderkey@0, l_extendedprice@1, l_discount@2]
-17)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_extendedprice, l_discount, l_shipd [...]
+10)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_mktsegment], file_type=csv, has_header=false
+11)--------------RepartitionExec: partitioning=Hash([o_custkey@1], 4), 
input_partitions=4
+12)----------------FilterExec: o_orderdate@2 < 1995-03-15
+13)------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate, o_shippriority], fil [...]
+14)----------RepartitionExec: partitioning=Hash([l_orderkey@0], 4), 
input_partitions=4
+15)------------FilterExec: l_shipdate@3 > 1995-03-15, 
projection=[l_orderkey@0, l_extendedprice@1, l_discount@2]
+16)--------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_extendedprice, l_discount, l_shipd [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part
index 12a80b8dd2..6cbc9c4bef 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q5.slt.part
@@ -82,8 +82,8 @@ physical_plan
 13)------------------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(o_orderkey@1, l_orderkey@0)], projection=[c_nationkey@0, l_suppkey@3, 
l_extendedprice@4, l_discount@5]
 14)--------------------------RepartitionExec: 
partitioning=Hash([o_orderkey@1], 4), input_partitions=4
 15)----------------------------HashJoinExec: mode=Partitioned, 
join_type=Inner, on=[(c_custkey@0, o_custkey@1)], projection=[c_nationkey@1, 
o_orderkey@2]
-16)------------------------------RepartitionExec: 
partitioning=Hash([c_custkey@0], 4), input_partitions=1
-17)--------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_nationkey], file_type=csv, has_header=false
+16)------------------------------RepartitionExec: 
partitioning=Hash([c_custkey@0], 4), input_partitions=4
+17)--------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_nationkey], file_type=csv, [...]
 18)------------------------------RepartitionExec: 
partitioning=Hash([o_custkey@1], 4), input_partitions=4
 19)--------------------------------FilterExec: o_orderdate@2 >= 1994-01-01 AND 
o_orderdate@2 < 1995-01-01, projection=[o_orderkey@0, o_custkey@1]
 20)----------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate], fil [...]
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part
index c20afc5283..4bcb738d62 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part
@@ -107,8 +107,8 @@ physical_plan
 21)------------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_suppkey, l_e [...]
 22)----------------------------RepartitionExec: 
partitioning=Hash([o_orderkey@0], 4), input_partitions=4
 23)------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey], file_type=csv, has_h [...]
-24)------------------------RepartitionExec: partitioning=Hash([c_custkey@0], 
4), input_partitions=1
-25)--------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_nationkey], file_type=csv, has_header=false
+24)------------------------RepartitionExec: partitioning=Hash([c_custkey@0], 
4), input_partitions=4
+25)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_nationkey], file_type=csv, has_h [...]
 26)--------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=4
 27)----------------------FilterExec: n_name@1 = FRANCE OR n_name@1 = GERMANY
 28)------------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part
index 17faf3c12e..189d501ce2 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part
@@ -112,22 +112,21 @@ physical_plan
 20)--------------------------------------HashJoinExec: mode=Partitioned, 
join_type=Inner, on=[(p_partkey@0, l_partkey@1)], projection=[l_orderkey@1, 
l_suppkey@3, l_extendedprice@4, l_discount@5]
 21)----------------------------------------RepartitionExec: 
partitioning=Hash([p_partkey@0], 4), input_partitions=4
 22)------------------------------------------FilterExec: p_type@1 = ECONOMY 
ANODIZED STEEL, projection=[p_partkey@0]
-23)--------------------------------------------RepartitionExec: 
partitioning=RoundRobinBatch(4), input_partitions=1
-24)----------------------------------------------DataSourceExec: 
file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_type], file_type=csv, has_header=false
-25)----------------------------------------RepartitionExec: 
partitioning=Hash([l_partkey@1], 4), input_partitions=4
-26)------------------------------------------DataSourceExec: file_groups={4 
groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_partke [...]
-27)------------------------------------RepartitionExec: 
partitioning=Hash([s_suppkey@0], 4), input_partitions=1
-28)--------------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
-29)--------------------------------RepartitionExec: 
partitioning=Hash([o_orderkey@0], 4), input_partitions=4
-30)----------------------------------FilterExec: o_orderdate@2 >= 1995-01-01 
AND o_orderdate@2 <= 1996-12-31
-31)------------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate], f [...]
-32)----------------------------RepartitionExec: 
partitioning=Hash([c_custkey@0], 4), input_partitions=1
-33)------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl]]}, 
projection=[c_custkey, c_nationkey], file_type=csv, has_header=false
-34)------------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 
4), input_partitions=1
-35)--------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_regionkey], file_type=csv, has_header=false
-36)--------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
-37)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name], file_type=csv, has_header=false
-38)----------------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
-39)------------------FilterExec: r_name@1 = AMERICA, projection=[r_regionkey@0]
-40)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
-41)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
+23)--------------------------------------------DataSourceExec: file_groups={4 
groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_type], file_type=csv, has_head [...]
+24)----------------------------------------RepartitionExec: 
partitioning=Hash([l_partkey@1], 4), input_partitions=4
+25)------------------------------------------DataSourceExec: file_groups={4 
groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_partke [...]
+26)------------------------------------RepartitionExec: 
partitioning=Hash([s_suppkey@0], 4), input_partitions=1
+27)--------------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
+28)--------------------------------RepartitionExec: 
partitioning=Hash([o_orderkey@0], 4), input_partitions=4
+29)----------------------------------FilterExec: o_orderdate@2 >= 1995-01-01 
AND o_orderdate@2 <= 1996-12-31
+30)------------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_custkey, o_orderdate], f [...]
+31)----------------------------RepartitionExec: 
partitioning=Hash([c_custkey@0], 4), input_partitions=4
+32)------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:0..606529],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:606529..1213058],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1213058..1819587],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/customer.tbl:1819587..2426114]]},
 projection=[c_custkey, c_nationkey], file_type=csv, h [...]
+33)------------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 
4), input_partitions=1
+34)--------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_regionkey], file_type=csv, has_header=false
+35)--------------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
+36)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name], file_type=csv, has_header=false
+37)----------------RepartitionExec: partitioning=Hash([r_regionkey@0], 4), 
input_partitions=4
+38)------------------FilterExec: r_name@1 = AMERICA, projection=[r_regionkey@0]
+39)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
+40)----------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/region.tbl]]}, 
projection=[r_regionkey, r_name], file_type=csv, has_header=false
diff --git a/datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part 
b/datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part
index 1b01d02328..84b8e6fffd 100644
--- a/datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part
+++ b/datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part
@@ -93,15 +93,14 @@ physical_plan
 16)------------------------------HashJoinExec: mode=Partitioned, 
join_type=Inner, on=[(p_partkey@0, l_partkey@1)], projection=[l_orderkey@1, 
l_partkey@2, l_suppkey@3, l_quantity@4, l_extendedprice@5, l_discount@6]
 17)--------------------------------RepartitionExec: 
partitioning=Hash([p_partkey@0], 4), input_partitions=4
 18)----------------------------------FilterExec: p_name@1 LIKE %green%, 
projection=[p_partkey@0]
-19)------------------------------------RepartitionExec: 
partitioning=RoundRobinBatch(4), input_partitions=1
-20)--------------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, 
projection=[p_partkey, p_name], file_type=csv, has_header=false
-21)--------------------------------RepartitionExec: 
partitioning=Hash([l_partkey@1], 4), input_partitions=4
-22)----------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_partkey, l_sup [...]
-23)----------------------------RepartitionExec: 
partitioning=Hash([s_suppkey@0], 4), input_partitions=1
-24)------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
-25)------------------------RepartitionExec: partitioning=Hash([ps_suppkey@1, 
ps_partkey@0], 4), input_partitions=4
-26)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost], fi [...]
-27)--------------------RepartitionExec: partitioning=Hash([o_orderkey@0], 4), 
input_partitions=4
-28)----------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_orderdate], file_type=csv, has_header=false
-29)----------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
-30)------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name], file_type=csv, has_header=false
+19)------------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:0..597773],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:597773..1195546],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1195546..1793319],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl:1793319..2391090]]},
 projection=[p_partkey, p_name], file_type=csv, has_header=false
+20)--------------------------------RepartitionExec: 
partitioning=Hash([l_partkey@1], 4), input_partitions=4
+21)----------------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
 projection=[l_orderkey, l_partkey, l_sup [...]
+22)----------------------------RepartitionExec: 
partitioning=Hash([s_suppkey@0], 4), input_partitions=1
+23)------------------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, 
projection=[s_suppkey, s_nationkey], file_type=csv, has_header=false
+24)------------------------RepartitionExec: partitioning=Hash([ps_suppkey@1, 
ps_partkey@0], 4), input_partitions=4
+25)--------------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:0..2932049],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:2932049..5864098],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:5864098..8796147],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/partsupp.tbl:8796147..11728193]]},
 projection=[ps_partkey, ps_suppkey, ps_supplycost], fi [...]
+26)--------------------RepartitionExec: partitioning=Hash([o_orderkey@0], 4), 
input_partitions=4
+27)----------------------DataSourceExec: file_groups={4 groups: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:0..4223281],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:4223281..8446562],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:8446562..12669843],
 
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/orders.tbl:12669843..16893122]]},
 projection=[o_orderkey, o_orderdate], file_type=csv, has_header=false
+28)----------------RepartitionExec: partitioning=Hash([n_nationkey@0], 4), 
input_partitions=1
+29)------------------DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/nation.tbl]]}, 
projection=[n_nationkey, n_name], file_type=csv, has_header=false
diff --git a/docs/source/user-guide/configs.md 
b/docs/source/user-guide/configs.md
index 576137bda2..9856a13f00 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -149,7 +149,7 @@ The following configuration settings are available:
 | datafusion.optimizer.enable_dynamic_filter_pushdown                     | 
true                      | When set to true attempts to push down dynamic 
filters generated by operators (TopK, Join & Aggregate) into the file scan 
phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp 
DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 
timestamps that the TopK operator references into the file scans. This means 
that if we already have 10 timestamps  [...]
 | datafusion.optimizer.filter_null_join_keys                              | 
false                     | When set to true, the optimizer will insert filters 
before a join between a nullable and non-nullable column to filter out nulls on 
the nullable side. This filter can add additional overhead when the file format 
does not fully support predicate push down.                                     
                                                                                
                 [...]
 | datafusion.optimizer.repartition_aggregations                           | 
true                      | Should DataFusion repartition data using the 
aggregate keys to execute aggregates in parallel using the provided 
`target_partitions` level                                                       
                                                                                
                                                                                
                                    [...]
-| datafusion.optimizer.repartition_file_min_size                          | 
10485760                  | Minimum total files size in bytes to perform file 
scan repartitioning.                                                            
                                                                                
                                                                                
                                                                                
                   [...]
+| datafusion.optimizer.repartition_file_min_size                          | 
1048576                   | Minimum total file size in bytes for file-group 
byte-range splitting to fire. Files (or merged file groups) smaller than this 
stay as one partition. Lower values produce more, smaller partitions — better 
at filling `target_partitions` worth of cores when files are modestly sized, at 
the cost of slightly more per-partition open / metadata-load overhead.          
                         [...]
 | datafusion.optimizer.repartition_joins                                  | 
true                      | Should DataFusion repartition data using the join 
keys to execute joins in parallel using the provided `target_partitions` level  
                                                                                
                                                                                
                                                                                
                   [...]
 | datafusion.optimizer.allow_symmetric_joins_without_pruning              | 
true                      | Should DataFusion allow symmetric hash joins for 
unbounded data sources even when its inputs do not have any ordering or 
filtering If the flag is not enabled, the SymmetricHashJoin operator will be 
unable to prune its internal buffers, resulting in certain join types - such as 
Full, Left, LeftAnti, LeftSemi, Right, RightAnti, and RightSemi - being 
produced only at the end of the execut [...]
 | datafusion.optimizer.repartition_file_scans                             | 
true                      | When set to `true`, datasource partitions will be 
repartitioned to achieve maximum parallelism. This applies to both in-memory 
partitions and FileSource's file groups (1 group is 1 partition). For 
FileSources, only Parquet and CSV formats are currently supported. If set to 
`true` for a FileSource, all files will be repartitioned evenly (i.e., a single 
large file might be partitioned in [...]
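
The byte ranges in the updated plans above (for example `part.tbl:0..597773` through `part.tbl:1793319..2391090`) are consistent with an even split of the file size across 4 partitions, rounding the chunk size up. The following is a minimal sketch of that arithmetic, not DataFusion's actual implementation: the function name `split_byte_ranges` and its parameters are hypothetical, and the strict `<` comparison against the threshold is an assumption based on the documented wording "smaller than this stay as one partition".

```python
def split_byte_ranges(file_size, target_partitions, min_size):
    """Illustrative only: split one file into contiguous byte ranges.

    Returns a list of (start, end) pairs with `end` exclusive.
    Files smaller than `min_size` are returned as a single range
    (the real plans print such files without any range at all).
    """
    if file_size < min_size or target_partitions <= 1:
        return [(0, file_size)]

    # Ceiling division so the last range absorbs the remainder.
    chunk = -(-file_size // target_partitions)

    ranges = []
    start = 0
    while start < file_size:
        end = min(start + chunk, file_size)
        ranges.append((start, end))
        start = end
    return ranges


OLD_DEFAULT = 10 * 1024 * 1024  # 10485760 bytes (10 MiB)
NEW_DEFAULT = 1 * 1024 * 1024   # 1048576 bytes (1 MiB)
PART_TBL_SIZE = 2391090         # size implied by the last range in the q9 plan

old_plan = split_byte_ranges(PART_TBL_SIZE, 4, OLD_DEFAULT)
new_plan = split_byte_ranges(PART_TBL_SIZE, 4, NEW_DEFAULT)
```

Under these assumptions, `old_plan` holds a single range covering the whole file, matching the removed `file_groups={1 group: ...}` line, while `new_plan` holds the four ranges shown in the added `file_groups={4 groups: ...}` line. The same arithmetic applied to a 2426114-byte `customer.tbl` gives the `0..606529` through `1819587..2426114` ranges seen earlier in the diff.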


