Shivaram Venkataraman created SPARK-2563:
--------------------------------------------

Summary: Make number of connection retries configurable
Key: SPARK-2563
URL: https://issues.apache.org/jira/browse/SPARK-2563
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Shivaram Venkataraman
Priority: Minor

In a large EC2 cluster, I often see the first shuffle stage in a job fail due to connection timeout exceptions. We should make the number of retries before failing configurable to handle these cases.
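
For illustration only, here is a minimal sketch of what "configurable retries" could mean in practice. It is not Spark's code. The setting name `example.connection.maxRetries`, the default of 3, and the helper `connect_with_retries` are all hypothetical, since the issue does not name a property or a default.

```python
import time

# Hypothetical setting name and default, used only for illustration;
# the issue itself does not name a property or a default value.
RETRIES_KEY = "example.connection.maxRetries"
DEFAULT_RETRIES = 3


def connect_with_retries(connect, conf, wait_seconds=0.0):
    """Call connect() until it succeeds or the configured retries run out.

    `connect` is any zero-argument callable that raises TimeoutError on a
    connection timeout. `conf` is a plain dict of settings.
    """
    max_retries = int(conf.get(RETRIES_KEY, DEFAULT_RETRIES))
    attempt = 0
    while True:
        try:
            return connect()
        except TimeoutError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last timeout
            attempt += 1
            time.sleep(wait_seconds)  # optional pause before the next try
```

With something along these lines, a large cluster that sees transient timeouts could raise the limit through configuration (for example `{"example.connection.maxRetries": "10"}`) instead of relying on a hard-coded retry count.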