GitHub user Samyak2 added a comment to the discussion: `repartitioned` method
doesn't work for changing partitioning of a subtree after planning
You're right! In this example, repartitioning is skipped because of the file
sizes. In my actual use case, though, it is skipped because of the range check.
I chose this dataset for the example because it's readily available in the same
repo. Let me try to find a better example that hits the range check.
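The two skip conditions mentioned above can be modelled in a few lines. This is a simplified sketch, not DataFusion's code. `FileRef`, `Skip`, and `check_repartition` are hypothetical names, and the 10 MiB threshold is an assumed value standing in for DataFusion's minimum-size setting (`datafusion.optimizer.repartition_file_min_size`). The sketch assumes a size-based repartitioner leaves the file groups unchanged when any file already carries a byte range, or when the total input is below the threshold.

```rust
/// Hypothetical, simplified stand-in for one file entry in a scan's file groups.
#[derive(Debug, Clone)]
struct FileRef {
    size_bytes: u64,
    /// `Some((start, end))` if only a byte range of the file is scanned.
    range: Option<(u64, u64)>,
}

/// Why a size-based repartitioning pass would leave the file groups unchanged.
#[derive(Debug, PartialEq)]
enum Skip {
    /// At least one file is already read by byte range (the "range check").
    HasRanges,
    /// The combined size of all files is below the minimum threshold.
    TooSmall,
}

/// Simplified model of the two checks: the range check runs first,
/// then the total-size check against `min_size`.
fn check_repartition(files: &[FileRef], min_size: u64) -> Result<(), Skip> {
    if files.iter().any(|f| f.range.is_some()) {
        return Err(Skip::HasRanges);
    }
    let total: u64 = files.iter().map(|f| f.size_bytes).sum();
    if total == 0 || total < min_size {
        return Err(Skip::TooSmall);
    }
    Ok(())
}

fn main() {
    // Assumed threshold, for illustration only.
    let min_size: u64 = 10 * 1024 * 1024;

    // Four small whole files: skipped because of the total size.
    let small: Vec<FileRef> = (0..4)
        .map(|_| FileRef { size_bytes: 1_000, range: None })
        .collect();
    println!("{:?}", check_repartition(&small, min_size)); // Err(TooSmall)

    // Large input, but one file is already read by byte range: skipped by the range check.
    let ranged = vec![
        FileRef { size_bytes: 64 * 1024 * 1024, range: Some((0, 32 * 1024 * 1024)) },
        FileRef { size_bytes: 64 * 1024 * 1024, range: None },
    ];
    println!("{:?}", check_repartition(&ranged, min_size)); // Err(HasRanges)
}
```

Ordering the range check first means a plan whose files already have ranges is skipped regardless of how large it is, which matches the situation described for the real use case.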
But even this example shows a problem. With `target_partitions` set to 4, I get
4 file groups, as expected:
```rust
let config = SessionConfig::new()
    .with_target_partitions(4);
```
```
plan: CoalescePartitionsExec: fetch=100
  AggregateExec: mode=FinalPartitioned, gby=[id@0 as id], aggr=[min(Int64(0))]
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([id@0], 4), input_partitions=4
        AggregateExec: mode=Partial, gby=[id@0 as id], aggr=[min(Int64(0))]
          DataSourceExec: file_groups={4 groups:
            [[<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-01/j5fUeSDQo22oPyPU.parquet],
             [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-02/j5fUeSDQo22oPyPU.parquet],
             [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-03/j5fUeSDQo22oPyPU.parquet],
             [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-04/j5fUeSDQo22oPyPU.parquet]]},
            projection=[id], file_type=parquet
```
I would have expected setting `target_partitions` to have the same behaviour as
calling `repartitioned`. That doesn't seem to be the case.
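One plausible way the two settings could diverge is sketched below as a simplified model, contrasting (a) distributing whole files into `target_partitions` groups, which works regardless of file size, with (b) splitting files into byte ranges, which is skipped below a size threshold. This is an assumption about the mechanism, not DataFusion's implementation. `group_whole_files`, `repartition_by_size`, and `Slice` are hypothetical names, and the file sizes and threshold in `main` are made-up values.

```rust
/// A byte range `[start, end)` of the file at index `file` (hypothetical type).
#[derive(Debug, PartialEq)]
struct Slice {
    file: usize,
    start: u64,
    end: u64,
}

/// (a) Distribute whole files into at most `n` groups, ignoring their sizes.
fn group_whole_files(files: &[&str], n: usize) -> Vec<Vec<String>> {
    if files.is_empty() || n == 0 {
        return Vec::new();
    }
    let chunk = files.len().div_ceil(n); // files per group, rounded up
    files
        .chunks(chunk)
        .map(|c| c.iter().map(|s| s.to_string()).collect())
        .collect()
}

/// (b) Split files into at most `n` groups of roughly equal byte size.
/// Returns `None` (plan left unchanged) when the total is below `min_size`.
fn repartition_by_size(sizes: &[u64], n: usize, min_size: u64) -> Option<Vec<Vec<Slice>>> {
    let total: u64 = sizes.iter().sum();
    if n == 0 || total == 0 || total < min_size {
        return None;
    }
    let per_group = total.div_ceil(n as u64); // target bytes per group
    let mut groups: Vec<Vec<Slice>> = vec![Vec::new()];
    let mut filled = 0u64; // bytes assigned to the current group so far
    for (file, &size) in sizes.iter().enumerate() {
        let mut start = 0;
        while start < size {
            if filled == per_group {
                // Current group is full: start a new one.
                groups.push(Vec::new());
                filled = 0;
            }
            let take = (per_group - filled).min(size - start);
            groups.last_mut().unwrap().push(Slice { file, start, end: start + take });
            start += take;
            filled += take;
        }
    }
    Some(groups)
}

fn main() {
    let files = ["date=2025-03-01", "date=2025-03-02", "date=2025-03-03", "date=2025-03-04"];
    // (a) Four whole files into four groups: one file per group.
    println!("{:?}", group_whole_files(&files, 4));
    // (b) The same four files, assumed 1,000 bytes each, with an assumed 10 MiB threshold.
    println!("{:?}", repartition_by_size(&[1_000; 4], 4, 10 * 1024 * 1024)); // None
}
```

Under this model, (a) yields 4 groups for 4 files no matter how small they are, while (b) declines to change anything, which is consistent with the mismatch described above.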
GitHub link:
https://github.com/apache/datafusion/discussions/18924#discussioncomment-15076347