[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions

Bo Zhang (Jira) Wed, 02 Aug 2023 02:38:05 -0700


     [ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bo Zhang updated SPARK-44635:
-----------------------------
    Description: 
Spark's decommission feature supports migration of shuffle data. However, the 
shuffle data fetcher only looks up the block location (`BlockManagerId`) when it 
is initialized. If a shuffle read task runs long enough for the source executor 
to be decommissioned in the meantime, fetches from the stale location fail.

 

To mitigate this, shuffle data fetchers should look up the updated locations 
after a decommission and fetch from there instead.

  was: Spark's decommission feature supports migration of shuffle data. However, 
the shuffle data fetcher only looks at the location (`BlockManagerId`) when it 
is initialized. This can lead to shuffle fetch failures when the shuffle read 
tasks are long.


> Handle shuffle fetch failures in decommissions
> ----------------------------------------------
>
>                 Key: SPARK-44635
>                 URL: https://issues.apache.org/jira/browse/SPARK-44635
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> Spark's decommission feature supports migration of shuffle data. However, the 
> shuffle data fetcher only looks up the block location (`BlockManagerId`) when 
> it is initialized. If a shuffle read task runs long enough for the source 
> executor to be decommissioned in the meantime, fetches from the stale location 
> fail.
>  
> To mitigate this, shuffle data fetchers should look up the updated locations 
> after a decommission and fetch from there instead.
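
The proposed mitigation can be illustrated with a minimal Scala sketch. It is 
not Spark's implementation, and the ticket does not prescribe one. `Location`, 
`FetchFailed`, `LocationTracker`, `RefreshingFetcher`, and `maxRefreshes` are all 
hypothetical names chosen for illustration; `Location` merely stands in for the 
role `BlockManagerId` plays in the description. The idea shown is that, after a 
failed fetch, the reader asks for the block's current location and retries only 
if that location has changed.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class Location(executorId: String, host: String, port: Int)

final class FetchFailed(val location: Location)
  extends Exception(s"fetch failed from $location")

// Holds the latest known location of each shuffle block.
// A migration during decommission would update it via `migrate`.
final class LocationTracker(initial: Map[String, Location]) {
  private var locations: Map[String, Location] = initial

  def current(blockId: String): Option[Location] =
    synchronized { locations.get(blockId) }

  def migrate(blockId: String, to: Location): Unit =
    synchronized { locations += (blockId -> to) }
}

object RefreshingFetcher {
  // Fetches `blockId`, starting from the location captured at initialization.
  // On failure it re-resolves the location and retries only if the block has
  // moved; otherwise the original failure is rethrown.
  def fetch(
      blockId: String,
      initial: Location,
      tracker: LocationTracker,
      fetchFrom: (Location, String) => Array[Byte],
      maxRefreshes: Int = 3): Array[Byte] = {
    @annotation.tailrec
    def loop(loc: Location, refreshesLeft: Int): Array[Byte] = {
      val attempt: Either[FetchFailed, Array[Byte]] =
        try Right(fetchFrom(loc, blockId))
        catch { case e: FetchFailed => Left(e) }

      attempt match {
        case Right(bytes) => bytes
        case Left(e) =>
          tracker.current(blockId) match {
            case Some(updated) if updated != loc && refreshesLeft > 0 =>
              loop(updated, refreshesLeft - 1)
            case _ => throw e
          }
      }
    }
    loop(initial, maxRefreshes)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val old = Location("exec-1", "host-a", 7337)
    val moved = Location("exec-2", "host-b", 7337)
    val tracker = new LocationTracker(Map("shuffle_0_0_0" -> old))

    // The block is migrated after the reader captured `old`.
    tracker.migrate("shuffle_0_0_0", moved)

    // Simulated transport: the decommissioned executor no longer serves data.
    val bytes = RefreshingFetcher.fetch("shuffle_0_0_0", old, tracker,
      (loc, _) => if (loc == old) throw new FetchFailed(loc) else Array[Byte](1, 2, 3))
    println(bytes.mkString(","))
  }
}
```

Bounding the number of refreshes keeps a reader from chasing a block that is 
migrated repeatedly. When the tracker reports no new location, the original 
failure surfaces unchanged, so existing failure handling would still apply.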



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

