[ https://issues.apache.org/jira/browse/ARROW-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Lamb updated ARROW-10583:
--------------------------------
Component/s: (was: Rust)
> [Rust] [DataFusion] Implement "coalesce partitions" operator
> ------------------------------------------------------------
>
> Key: ARROW-10583
> URL: https://issues.apache.org/jira/browse/ARROW-10583
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
>
> The coalesce partitions operator reduces the number of partitions to the
> specified target count.
> The target partition count must be >= 1.
> If the target partition count is >= the number of input partitions, the
> operator is a no-op and can be optimized out of the plan.
> The simplest implementation would assign one or more whole input partitions
> to each output partition. This works well when the number of input
> partitions is divisible by the number of output partitions, e.g. going from
> 64 input partitions to 8 output partitions. In other cases the resulting
> partitions may have data skew, e.g. going from 3 partitions to 2, where one
> output partition receives two inputs and the other receives one. It would be
> possible to partition at the row level instead, but that would add a lot of
> overhead, and the "repartition" operator should be used for that case.
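
The assignment of whole input partitions to output partitions described above can be sketched in plain Rust. This is only an illustration under stated assumptions: `coalesce_assignment` is a hypothetical helper, not part of DataFusion, and the contiguous grouping rule (`i * target / input`) is one possible choice among several (round-robin would work as well).

```rust
/// Hypothetical helper (not a DataFusion API): returns, for each output
/// partition, the indices of the input partitions assigned to it.
fn coalesce_assignment(input_partitions: usize, target_partitions: usize) -> Vec<Vec<usize>> {
    // The issue requires a target partition count of at least 1.
    assert!(target_partitions >= 1, "target partition count must be >= 1");

    // No-op case: the target is not smaller than the input, so every input
    // partition is passed through unchanged.
    if target_partitions >= input_partitions {
        return (0..input_partitions).map(|i| vec![i]).collect();
    }

    // Assumed grouping rule: contiguous runs of input partitions are mapped
    // to the same output partition using integer division.
    let mut groups = vec![Vec::new(); target_partitions];
    for i in 0..input_partitions {
        groups[i * target_partitions / input_partitions].push(i);
    }
    groups
}
```

Under this rule, `coalesce_assignment(64, 8)` yields 8 groups of 8 input partitions each, while `coalesce_assignment(3, 2)` yields one group of two inputs and one group of a single input, which is the skew mentioned in the description.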