[ 
https://issues.apache.org/jira/browse/SPARK-32925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200023#comment-17200023
 ] 

qingwu.fu commented on SPARK-32925:
-----------------------------------

Should data be sent to the remote shuffle service directly, bypassing the sort 
and spill of data on the local node? Gathering all data that belongs to the 
same partition onto the same node can take the place of sorting on the local 
node.

 

> Support push-based shuffle in multiple deployment environments
> --------------------------------------------------------------
>
>                 Key: SPARK-32925
>                 URL: https://issues.apache.org/jira/browse/SPARK-32925
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Min Shen
>            Priority: Major
>
> Create this ticket outside of SPARK-30602, since this is outside of the scope 
> of the immediate deliverables in that SPIP. Want to use this ticket to 
> discuss more about how to further improve push-based shuffle in different 
> environments.
> The tasks created under SPARK-30602 would enable push-based shuffle on YARN 
> in a compute/storage colocated cluster. However, other deployment 
> environments are becoming more popular. We have seen two while discussing 
> the idea of push-based shuffle with other community members:
>  * Spark on K8S in a compute/storage colocated cluster. Because K8S limits 
> concurrent reads and writes on a mounted volume, multiple executor pods on 
> the same node in a K8S cluster cannot concurrently access the same mounted 
> disk volume. This creates different requirements for supporting the 
> external shuffle service as well as push-based shuffle.
>  * Spark on a compute/storage disaggregated cluster. Such a setup is more 
> typical in cloud environments, where the compute cluster has little or no 
> local storage, and the intermediate shuffle data needs to be stored in a 
> remote disaggregated storage cluster.
> Want to use this ticket to discuss ways to support push-based shuffle in 
> these different deployment environments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
