[jira] [Commented] (SPARK-24346) Executors are unable to fetch remote cache blocks

Mohamed Mehdi BEN AISSA (JIRA) Sun, 03 Mar 2019 06:03:19 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782739#comment-16782739
 ]


Mohamed Mehdi BEN AISSA commented on SPARK-24346:
-------------------------------------------------

Many thanks, [~kien_truong]!

Speculation can also work around the issue but, as you said, we still have to 
find its root cause.
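
For anyone who wants to try the speculation workaround mentioned above, here is a minimal sketch. It assumes the job is launched with spark-submit; `spark.speculation`, `spark.speculation.quantile` and `spark.speculation.multiplier` are standard Spark configuration keys, while the values shown and the helper `to_submit_args` are illustrative assumptions, not settings taken from this issue.

```python
# Illustrative values only; tune them for your own workload.
SPECULATION_CONF = {
    "spark.speculation": "true",            # re-launch slow tasks on other executors
    "spark.speculation.quantile": "0.75",   # fraction of tasks that must finish first
    "spark.speculation.multiplier": "1.5",  # how much slower than the median counts as slow
}


def to_submit_args(conf):
    """Hypothetical helper: render settings as spark-submit --conf flags."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(SPECULATION_CONF)))
```

This only hides the stalled fetches by re-running the affected tasks elsewhere; it does not explain why the ChunkFetchSuccess response is lost.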

 

> Executors are unable to fetch remote cache blocks
> -------------------------------------------------
>
>                 Key: SPARK-24346
>                 URL: https://issues.apache.org/jira/browse/SPARK-24346
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>         Environment: OS: CentOS 7.3
> Cluster: Hortonworks HDP 2.6.5 with Spark 2.3.0
>            Reporter: Truong Duc Kien
>            Priority: Major
>
> After we upgraded from Spark 2.2.1 to Spark 2.3.0, our Spark jobs took a 
> massive performance hit because executors became unable to fetch remote cache 
> blocks from each other. The scenario is:
> 1. An executor creates a connection and sends a ChunkFetchRequest message to 
> another executor.
> 2. This request arrives at the target executor, which sends back a 
> ChunkFetchSuccess response.
> 3. The ChunkFetchSuccess message never arrives.
> 4. The connection between these two executors is killed by the originating 
> executor after 120s of idleness. At the same time, the other executor reports 
> that it failed to send the ChunkFetchSuccess because the pipe is closed.
> This process repeats itself 3 times, delaying our jobs by 6 minutes, after 
> which the originating executor stops fetching and computes the block itself 
> so that the job can continue.
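
The 6-minute figure in the description follows directly from the two numbers it reports (3 attempts, each ending after 120s of idleness). A small sketch of that arithmetic, with hypothetical names and no claim about which Spark settings produce these numbers:

```python
# Both numbers come from the issue description above.
IDLE_TIMEOUT_SECONDS = 120  # idle time before the originating executor kills the connection
FETCH_ATTEMPTS = 3          # times the failed fetch repeats before the block is recomputed


def total_fetch_delay_seconds(attempts, idle_timeout_seconds):
    """Time spent waiting on fetches that never complete."""
    return attempts * idle_timeout_seconds


delay = total_fetch_delay_seconds(FETCH_ATTEMPTS, IDLE_TIMEOUT_SECONDS)
print(f"{delay} s = {delay / 60:.0f} min")  # prints "360 s = 6 min"
```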



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
