Andy Grove created ARROW-10583:
----------------------------------

             Summary: [Rust] [DataFusion] Implement "coalesce partitions" 
operator
                 Key: ARROW-10583
                 URL: https://issues.apache.org/jira/browse/ARROW-10583
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust, Rust - DataFusion
            Reporter: Andy Grove


The coalesce partitions operator simply reduces the number of partitions to the 
specified amount.

The target partition count must be >=1

If the target partition count is >= the number of input partitions then this is 
a no-op and can be optimized out of the plan.

The simplest implementation would be to assign one or more input partitions to 
each output partition. This works well where the number of input partitions is 
divisible by the number of output partitions e.g. going from 64 input 
partitions to 8 output partitions. In other cases, the resulting partitions may 
have data skew e.g. going from 3 partitions to 2. It would be possible to do 
the partitioning at the row level but that would add a lot of overhead and the 
"repartition" operator should be used for that case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to