[ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bo Zhang updated SPARK-44635:
-----------------------------
    Description:
Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the location (`BlockManagerId`) when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.

To mitigate this, shuffle data fetchers should be able to look up the updated locations after decommissions and fetch from there instead.

  was:
Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the location (`BlockManagerId`) when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.

> Handle shuffle fetch failures in decommissions
> ----------------------------------------------
>
>                 Key: SPARK-44635
>                 URL: https://issues.apache.org/jira/browse/SPARK-44635
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Bo Zhang
>            Priority: Major
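
The proposed mitigation can be illustrated with a minimal, self-contained sketch. This is a toy model, not Spark code: `LocationTracker`, `Cluster`, `FetchFailed`, and `fetch_with_relocation` are hypothetical names introduced only for illustration, and the `BlockManagerId` class here is a simplified stand-in for Spark's class of the same name. It assumes that a fetcher holding a stale location can ask some registry for the block's current location after a failure and retry there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockManagerId:
    """Simplified stand-in for Spark's BlockManagerId (not the real class)."""
    executor_id: str
    host: str
    port: int


class FetchFailed(Exception):
    """Raised when a block cannot be read from the given location."""


class LocationTracker:
    """Hypothetical registry mapping shuffle block ids to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, block_id, location):
        self._locations[block_id] = location

    def lookup(self, block_id):
        return self._locations[block_id]


class Cluster:
    """Hypothetical in-memory cluster: location -> {block_id: data}."""

    def __init__(self, tracker):
        self.tracker = tracker
        self.storage = {}

    def write(self, location, block_id, data):
        self.storage.setdefault(location, {})[block_id] = data
        self.tracker.register(block_id, location)

    def decommission(self, old, new):
        # Migrate every block off `old` and publish the new locations.
        for block_id, data in self.storage.pop(old, {}).items():
            self.write(new, block_id, data)

    def read(self, location, block_id):
        try:
            return self.storage[location][block_id]
        except KeyError:
            raise FetchFailed(
                f"{block_id} not found at executor {location.executor_id}"
            ) from None


def fetch_with_relocation(cluster, block_id, initial_location, max_retries=1):
    """Fetch a block, refreshing its location after a failed attempt.

    With max_retries=0 this behaves like a fetcher that only trusts the
    location it was initialized with.
    """
    location = initial_location
    for attempt in range(max_retries + 1):
        try:
            return cluster.read(location, block_id)
        except FetchFailed:
            refreshed = cluster.tracker.lookup(block_id)
            # Give up if the location did not change or retries are used up.
            if refreshed == location or attempt == max_retries:
                raise
            location = refreshed
```

In this model, a fetcher that captured the location before `Cluster.decommission` ran fails with `max_retries=0` but succeeds with the default of one retry, because the second attempt reads from the migrated location. How the real fetcher would learn about updated locations is not specified in the issue description.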