Dandandan commented on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946377551
In Spark, `repartition` is implemented by calling `coalesce` with the parameter `shuffle=true`. I think it might be cleaner to keep `Repartition` and `CoalescePartitions` separate; otherwise you get two implementations in the same code without much sharing. For implementing `CoalescePartitionsExec` we just need a scheme that combines partitions within `execute` (e.g. when reducing the number of partitions from 8 to 4, we can return partitions 0,1 for `execute(0)`, 2,3 for `execute(1)`, etc.). For Ballista, we have to know (explicitly or implicitly) which partitions live on which node to avoid shuffles.
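
To make the index mapping concrete, here is a rough sketch of how an output partition could be mapped to a contiguous range of input partitions. This is only an illustration, not DataFusion code: `input_partitions_for` and its parameters are hypothetical names, and it assumes that contiguous grouping is acceptable and that the output count never exceeds the input count.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of DataFusion): returns the contiguous range
/// of input partitions that output partition `output_idx` would combine when
/// going from `n_in` input partitions down to `n_out` output partitions.
fn input_partitions_for(output_idx: usize, n_in: usize, n_out: usize) -> Range<usize> {
    // Assumption: coalescing only ever reduces (or keeps) the partition count.
    assert!(n_out > 0 && n_out <= n_in, "expected 0 < n_out <= n_in");
    assert!(output_idx < n_out, "output partition index out of range");

    // Spread the inputs as evenly as possible: every output gets `base`
    // inputs, and the first `rem` outputs get one extra.
    let base = n_in / n_out;
    let rem = n_in % n_out;
    let start = output_idx * base + output_idx.min(rem);
    let len = base + usize::from(output_idx < rem);
    start..start + len
}
```

With 8 inputs and 4 outputs, `input_partitions_for(0, 8, 4)` gives `0..2` and `input_partitions_for(1, 8, 4)` gives `2..4`, matching the example above. An `execute(i)` built on this would then combine the streams of the inputs in that range. For Ballista the grouping would presumably need to follow node locality rather than plain index order.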
