Renxia Wang created SPARK-16830:
-----------------------------------

             Summary: Executors Keep Trying to Fetch Blocks from a Bad Host
                 Key: SPARK-16830
                 URL: https://issues.apache.org/jira/browse/SPARK-16830
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Streaming
    Affects Versions: 1.6.2
         Environment: EMR 4.7.2
            Reporter: Renxia Wang


When a host becomes unreachable, the driver removes the executors and block 
managers on that host because it stops receiving their heartbeats. However, 
executors on other hosts keep trying to fetch blocks from the bad host. 
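To make the expected behavior concrete, here is a toy Python model. It is not 
Spark code. A hypothetical LivenessTracker expires hosts whose last heartbeat 
is older than a timeout, and a hypothetical fetch_candidates() narrows a block's 
known locations down to the hosts the driver still considers alive. The class, 
the function, the host names, the block ID, and the 120-second timeout are all 
illustrative assumptions. The only point is that once a host has been expired, 
fetchers should stop choosing it, and the report says this does not happen.

```python
from dataclasses import dataclass, field


@dataclass
class LivenessTracker:
    """Hypothetical toy model of driver-side heartbeat tracking (not Spark's API)."""
    timeout_s: float
    last_seen: dict = field(default_factory=dict)  # host -> time of last heartbeat

    def heartbeat(self, host: str, now: float) -> None:
        # Record that `host` was heard from at time `now`.
        self.last_seen[host] = now

    def expire_dead_hosts(self, now: float) -> list:
        # Drop every host whose last heartbeat is older than timeout_s
        # and return the dropped hosts, in insertion order.
        dead = [h for h, t in self.last_seen.items() if now - t > self.timeout_s]
        for h in dead:
            del self.last_seen[h]
        return dead

    def is_alive(self, host: str) -> bool:
        return host in self.last_seen


def fetch_candidates(block_locations: dict, block_id: str,
                     tracker: LivenessTracker) -> list:
    # Hypothetical helper: keep only the locations of `block_id` that the
    # tracker still considers alive, so an expired host is never contacted.
    return [h for h in block_locations.get(block_id, []) if tracker.is_alive(h)]


if __name__ == "__main__":
    tracker = LivenessTracker(timeout_s=120.0)
    tracker.heartbeat("host-a", now=0.0)
    tracker.heartbeat("host-b", now=0.0)
    tracker.heartbeat("host-a", now=100.0)  # host-b goes silent after t=0
    print(tracker.expire_dead_hosts(now=150.0))  # ['host-b']
    locations = {"input-0-1": ["host-b", "host-a"]}  # made-up block ID
    print(fetch_candidates(locations, "input-0-1", tracker))  # ['host-a']
```

In the reported bug, executors behave as if this filtering step were missing. 
They keep picking the removed host even after the driver has dropped it.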

I am running a Spark Streaming job that consumes data from Kinesis. As a result 
of these repeated, failing block fetches, I started seeing 
ProvisionedThroughputExceededException on shards, SocketException from 
AmazonHttpClient (in calls to Kinesis), Kinesis ExpiredIteratorException, etc. 

This issue also exposes a potential memory leak. From the time the bad host 
became unreachable, the physical memory usage of the executors that kept trying 
to fetch blocks from it kept increasing until they hit the physical memory 
limit and were killed by YARN. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
