[ 
https://issues.apache.org/jira/browse/ARROW-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-10583:
--------------------------------
    Component/s:     (was: Rust)

> [Rust] [DataFusion] Implement "coalesce partitions" operator
> ------------------------------------------------------------
>
>                 Key: ARROW-10583
>                 URL: https://issues.apache.org/jira/browse/ARROW-10583
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> The coalesce partitions operator reduces the number of partitions to the
> specified target count.
> The target partition count must be >= 1.
> If the target partition count is >= the number of input partitions, then the
> operator is a no-op and can be optimized out of the plan.
> The simplest implementation would be to assign one or more whole input
> partitions to each output partition. This works well when the number of
> input partitions is divisible by the number of output partitions, e.g. going
> from 64 input partitions to 8 output partitions. In other cases, the
> resulting partitions may have data skew, e.g. going from 3 partitions to 2,
> where one output receives two input partitions and the other only one. It
> would be possible to do the partitioning at the row level, but that would add
> a lot of overhead, and the "repartition" operator should be used for that case.
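
The partition-level assignment described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the function
name coalesce_assignment and the contiguous-range split are hypothetical and
are not taken from DataFusion; the issue itself does not prescribe how input
partitions are grouped.

```rust
/// Hypothetical helper (not a DataFusion API): for each output partition,
/// returns the indices of the input partitions it would read.
fn coalesce_assignment(num_input: usize, target: usize) -> Result<Vec<Vec<usize>>, String> {
    // The target partition count must be >= 1.
    if target < 1 {
        return Err("target partition count must be >= 1".to_string());
    }
    // No-op case: the target is >= the input count, so the partitioning is
    // left unchanged (a planner could drop the operator entirely).
    if target >= num_input {
        return Ok((0..num_input).map(|i| vec![i]).collect());
    }
    // Assumed strategy: contiguous split, where output `o` reads the inputs
    // in the half-open range [o * n / t, (o + 1) * n / t).
    Ok((0..target)
        .map(|o| ((o * num_input / target)..((o + 1) * num_input / target)).collect())
        .collect())
}
```

With this sketch, going from 64 inputs to 8 outputs gives every output exactly
8 input partitions, while going from 3 to 2 gives one output a single input
partition and the other two, which is the skew mentioned in the description.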



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
