[ 
https://issues.apache.org/jira/browse/SPARK-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20640:
------------------------------------

    Assignee:     (was: Apache Spark)

> Make rpc timeout and retry for shuffle registration configurable
> ----------------------------------------------------------------
>
>                 Key: SPARK-20640
>                 URL: https://issues.apache.org/jira/browse/SPARK-20640
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.0.2
>            Reporter: Sital Kedia
>
> Currently the shuffle service registration timeout and retry count are 
> hardcoded (see 
> https://github.com/sitalkedia/spark/blob/master/network/shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java#L144
>  and 
> https://github.com/sitalkedia/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L197).
>  This works well for small workloads, but under heavy load, when the 
> shuffle service is busy transferring large amounts of data, we see significant 
> delays in its responses to registration requests. As a result, executors often 
> fail to register with the shuffle service, which eventually fails the 
> job. We need to make these two parameters configurable.
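
For illustration only, the request above amounts to reading the timeout and the attempt count from configuration instead of constants. The following is a minimal Python sketch of that idea, not the Spark implementation: the key names, the default values, the one-second pause and the register_with_retry helper are all hypothetical and were chosen just for this example.

```python
import time
from typing import Callable, Mapping

# Hypothetical configuration keys, invented for this illustration only.
TIMEOUT_KEY = "shuffle.registration.timeoutMs"
MAX_ATTEMPTS_KEY = "shuffle.registration.maxAttempts"


def register_with_retry(
    register: Callable[[float], None],
    conf: Mapping[str, str],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call register(timeout_seconds) until it succeeds or attempts run out.

    Returns the number of attempts used and re-raises the last error if
    every attempt fails.
    """
    # Placeholder defaults; a real deployment would pick its own values.
    timeout_s = int(conf.get(TIMEOUT_KEY, "5000")) / 1000.0
    max_attempts = int(conf.get(MAX_ATTEMPTS_KEY, "3"))
    if max_attempts < 1:
        raise ValueError("max attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            register(timeout_s)
            return attempt
        except OSError:  # TimeoutError is a subclass of OSError
            if attempt == max_attempts:
                raise
            sleep(1.0)  # fixed pause between attempts (an assumption)
    raise AssertionError("unreachable")
```

With a sketch like this, a heavily loaded cluster could raise both values through configuration (for example a 20000 ms timeout and 5 attempts) without a code change, which is the behaviour the issue asks for.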



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
