Andy Grove created ARROW-10583:
----------------------------------
Summary: [Rust] [DataFusion] Implement "coalesce partitions"
operator
Key: ARROW-10583
URL: https://issues.apache.org/jira/browse/ARROW-10583
Project: Apache Arrow
Issue Type: New Feature
Components: Rust, Rust - DataFusion
Reporter: Andy Grove
The coalesce partitions operator simply reduces the number of partitions to the
specified amount.
The target partition count must be >=1
If the target partition count is >= the number of input partitions then this is
a no-op and can be optimized out of the plan.
The simplest implementation would be to assign one or more input partitions to
each output partition. This works well where the number of input partitions is
divisible by the number of output partitions e.g. going from 64 input
partitions to 8 output partitions. In other cases, the resulting partitions may
have data skew e.g. going from 3 partitions to 2. It would be possible to do
the partitioning at the row level but that would add a lot of overhead and the
"repartition" operator should be used for that case.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)