Josh Rosen commented on SPARK-16830:

Do you have stack traces from the failed block fetches? I'd like to see whether 
this may be fixed by a recent patch of mine, which helps avoid failures when 
all locations of non-shuffle blocks are lost / unavailable.
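In the meantime, a possible mitigation sketch (not a fix for the underlying bug): the retry behavior of the Netty block transfer service is controlled by the `spark.shuffle.io.*` keys, which — to my understanding — also govern non-shuffle remote block fetches. Bounding the retries and the network timeout can limit how long executors keep hammering a dead host. The values below are illustrative defaults-range numbers, not tuned recommendations:

```properties
# spark-defaults.conf (or pass via --conf on spark-submit)
# Cap how many times a remote block fetch is retried before failing fast
spark.shuffle.io.maxRetries   3
# Wait between retry attempts
spark.shuffle.io.retryWait    5s
# Overall network timeout; lower values surface dead-host errors sooner
spark.network.timeout         120s
```

Whether this helps here depends on whether the repeated fetches go through the retrying Netty client path; if the retries originate higher up (e.g. task re-scheduling), these settings will not stop them.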

> Executors Keep Trying to Fetch Blocks from a Bad Host
> -----------------------------------------------------
>                 Key: SPARK-16830
>                 URL: https://issues.apache.org/jira/browse/SPARK-16830
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Streaming
>    Affects Versions: 1.6.2
>         Environment: EMR 4.7.2
>            Reporter: Renxia Wang
> When a host becomes unreachable, the driver removes the executors and block 
> managers on that host because it no longer receives heartbeats. However, 
> executors on other hosts keep trying to fetch blocks from the bad 
> host. 
> I am running a Spark Streaming job that consumes data from Kinesis. As a 
> result of these block fetch retries and failures, I started seeing 
> ProvisionedThroughputExceededException on shards, AmazonHttpClient (to 
> Kinesis) SocketException, Kinesis ExpiredIteratorException, etc. 
> This issue also exposes a potential memory leak. From the time the bad host 
> became unreachable, the physical memory usage of executors that kept trying 
> to fetch blocks from it grew steadily until they hit the physical memory 
> limit and were killed by YARN. 
