[ https://issues.apache.org/jira/browse/SPARK-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Davidson closed SPARK-4238.
---------------------------------
    Resolution: Duplicate

> Perform network-level retry of shuffle file fetches
> ---------------------------------------------------
>
>                 Key: SPARK-4238
>                 URL: https://issues.apache.org/jira/browse/SPARK-4238
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Aaron Davidson
>            Assignee: Aaron Davidson
>            Priority: Critical
>
> During periods of high network (or GC) load, it is not uncommon that
> IOExceptions crop up around connection failures when fetching shuffle files.
> Unfortunately, when such a failure occurs, it is interpreted as an inability
> to fetch the files, which causes us to mark the executor as lost and
> recompute all of its shuffle outputs.
> We should allow retrying at the network level in the event of an IOException
> in order to avoid this circumstance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
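
The retry behaviour requested in the issue can be illustrated with a minimal sketch. This is not Spark's implementation: it is written in Python for brevity, and every name in it (`fetch_with_retry`, `FetchFailedError`, `make_flaky_fetch`, `max_retries`, `wait_seconds`) is hypothetical. Python's `OSError` stands in for a JVM `IOException`. The idea is that a transient I/O failure triggers another attempt, and only after the retry budget is spent does the caller see a fetch failure, which is the point where an executor might be treated as lost.

```python
import time


class FetchFailedError(Exception):
    """Raised only after every retry has failed (hypothetical name)."""


def fetch_with_retry(fetch, max_retries=3, wait_seconds=0.0, sleep=time.sleep):
    """Call fetch(); on OSError, retry up to max_retries more times.

    OSError plays the role of an IOException here. Any other exception
    type propagates immediately without a retry.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except OSError as err:
            if attempt >= max_retries:
                # Retry budget exhausted: surface a real fetch failure.
                raise FetchFailedError(
                    f"fetch failed after {attempt + 1} attempts"
                ) from err
            attempt += 1
            sleep(wait_seconds)  # fixed pause between attempts


def make_flaky_fetch(failures, payload=b"shuffle-block"):
    """Build a fetch function that fails `failures` times, then succeeds."""
    remaining = [failures]

    def fetch():
        if remaining[0] > 0:
            remaining[0] -= 1
            # ConnectionResetError is a subclass of OSError.
            raise ConnectionResetError("connection reset by peer")
        return payload

    return fetch
```

With `max_retries=3`, a fetch that fails twice and then succeeds returns its payload normally, while one that fails on all four attempts raises `FetchFailedError` with the last `OSError` attached as its cause. The fixed wait is an assumption made for simplicity; a backoff schedule would fit the same structure.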