[GitHub] [arrow-datafusion] Dandandan commented on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

GitBox Mon, 18 Oct 2021 22:25:30 -0700


Dandandan commented on pull request #1143:
URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946377551



   In Spark, `repartition` is implemented by calling `coalesce` with the 
parameter `shuffle=true`.
   I think it might be cleaner to keep `Repartition` and 
`CoalescePartitions` separate; otherwise you get two implementations in the 
same code that share very little.
   
   To implement `CoalescePartitionsExec` we just need a scheme that 
combines partitions within `execute` (e.g. when reducing the number of 
partitions from 8 to 4 we can return partitions 0,1 for `execute(0)`, 2,3 for 
`execute(1)`, etc.).
   For Ballista, we have to know (explicitly or implicitly) which partitions 
live on which node in order to avoid shuffles.
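
   A minimal sketch of the partition-combining scheme described above, 
assuming contiguous input partitions are grouped per output partition. The 
helper name `input_partitions_for` and its parameters are hypothetical and not 
part of DataFusion; a real `execute` would merge the streams of the returned 
input partitions rather than just compute their indices.

```rust
use std::ops::Range;

/// Hypothetical helper (not a DataFusion API): returns the contiguous range
/// of input partitions that output partition `output_idx` would read when
/// coalescing `num_inputs` partitions down to `num_outputs`.
fn input_partitions_for(
    output_idx: usize,
    num_inputs: usize,
    num_outputs: usize,
) -> Range<usize> {
    // Coalescing only reduces (or keeps) the partition count.
    assert!(num_outputs > 0 && num_outputs <= num_inputs);
    assert!(output_idx < num_outputs);

    // Integer arithmetic spreads any remainder across the output partitions,
    // so every input partition is assigned to exactly one output partition.
    let start = output_idx * num_inputs / num_outputs;
    let end = (output_idx + 1) * num_inputs / num_outputs;
    start..end
}
```

   With 8 inputs and 4 outputs this yields `0..2` for output 0 and `2..4` for 
output 1, matching the example above. Keeping the groups contiguous is only one 
possible choice; for Ballista the grouping would presumably need to follow 
partition locality instead.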
   


