[ https://issues.apache.org/jira/browse/SPARK-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782723#comment-16782723 ]

Truong Duc Kien commented on SPARK-24346:
-----------------------------------------

We never found out the cause of this problem.

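However, it does not seem to happen if you persist the data to disk only instead of caching it in memory:
{code:java}
import org.apache.spark.storage.StorageLevel;

// df is an existing Dataset<Row> (DataFrame).
// Use disk-only persistence:
df.persist(StorageLevel.DISK_ONLY());
// instead of cache(), which for a Dataset means MEMORY_AND_DISK:
df.cache();
{code}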

For some jobs, we simply disabled caching altogether. Caching can actually slow
down some jobs because it reduces concurrency.

> Executors are unable to fetch remote cache blocks
> -------------------------------------------------
>
>                 Key: SPARK-24346
>                 URL: https://issues.apache.org/jira/browse/SPARK-24346
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>         Environment: OS: Centos 7.3
> Cluster: Hortonwork HDP 2.6.5 with Spark 2.3.0
>            Reporter: Truong Duc Kien
>            Priority: Major
>
> After we upgraded from Spark 2.2.1 to Spark 2.3.0, our Spark jobs took a
> massive performance hit because executors became unable to fetch remote cached
> blocks from each other. The scenario is:
> 1. An executor opens a connection and sends a ChunkFetchRequest message to
> another executor.
> 2. The request arrives at the target executor, which sends back a
> ChunkFetchSuccess response.
> 3. The ChunkFetchSuccess message never arrives.
> 4. The originating executor kills the connection between the two executors
> after 120s of idleness. At the same time, the target executor reports that it
> failed to send the ChunkFetchSuccess because the pipe is closed.
> This process repeats 3 times, delaying our jobs by 6 minutes; the originating
> executor then stops fetching, computes the block itself, and the job can
> continue.
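The sequence in the report (send a request, wait for a response that never arrives, give up after an idle timeout, retry, and finally recompute locally) can be modelled with a small, self-contained Java sketch. The class `RemoteFetchFallback`, the method `fetchOrRecompute`, and its parameters are hypothetical illustrations, not Spark's actual block-fetch code; the attempt count and timeout are passed in rather than read from Spark configuration. The worst-case delay before falling back is `maxAttempts × timeoutMs`; with the values in the report, 3 × 120 s = 360 s, which matches the 6 minutes observed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical model of "fetch remotely, retry on timeout, then recompute".
// This is NOT Spark's BlockManager implementation.
class RemoteFetchFallback {

    // Tries remoteFetch up to maxAttempts times, waiting at most timeoutMs
    // for each response, then falls back to recompute.
    static <T> T fetchOrRecompute(Supplier<CompletableFuture<T>> remoteFetch,
                                  Supplier<T> recompute,
                                  int maxAttempts,
                                  long timeoutMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Each attempt opens a new request and waits for its response.
                return remoteFetch.get().get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // No response within the timeout (like the lost ChunkFetchSuccess); retry.
            } catch (ExecutionException e) {
                // The fetch failed outright; retry.
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // All attempts exhausted: compute the block locally instead.
        return recompute.get();
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a response that never arrives.
        Supplier<CompletableFuture<String>> lostResponse = () -> new CompletableFuture<String>();
        long start = System.nanoTime();
        String block = fetchOrRecompute(lostResponse, () -> "recomputed", 3, 50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(block + " after " + elapsedMs + " ms");
    }
}
```

The demo uses a 50 ms timeout instead of 120 s so that it finishes quickly; it waits roughly 3 × 50 ms before returning the locally computed value.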



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
