GitHub user Samyak2 added a comment to the discussion: `repartitioned` method 
doesn't work for changing partitioning of a subtree after planning

You're right! In this example, it is due to the file sizes. But in my actual 
use case, it's getting skipped due to the range check. I chose this dataset for 
the example because it's easily available in the same repo. Let me try finding 
a better example that hits the range check.

But even this example shows one problem. With `target_partitions` set to 4 at 
planning time, I get 4 file groups, as expected:
```rust
use datafusion::prelude::SessionConfig;

let config = SessionConfig::new().with_target_partitions(4);
```

```
plan: CoalescePartitionsExec: fetch=100
  AggregateExec: mode=FinalPartitioned, gby=[id@0 as id], aggr=[min(Int64(0))]
    CoalesceBatchesExec: target_batch_size=8192
      RepartitionExec: partitioning=Hash([id@0], 4), input_partitions=4
        AggregateExec: mode=Partial, gby=[id@0 as id], aggr=[min(Int64(0))]
          DataSourceExec: file_groups={4 groups: [
            [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-01/j5fUeSDQo22oPyPU.parquet],
            [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-02/j5fUeSDQo22oPyPU.parquet],
            [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-03/j5fUeSDQo22oPyPU.parquet],
            [<path>/datafusion/core/tests/data/test_statistics_per_partition/date=2025-03-04/j5fUeSDQo22oPyPU.parquet]]},
            projection=[id], file_type=parquet
```

I would have expected setting `target_partitions` to have the same behaviour as 
calling `repartitioned`. That doesn't seem to be the case.

GitHub link: https://github.com/apache/datafusion/discussions/18924#discussioncomment-15076347
