Renxia Wang created SPARK-16830:
-----------------------------------
Summary: Executors Keep Trying to Fetch Blocks from a Bad Host
Key: SPARK-16830
URL: https://issues.apache.org/jira/browse/SPARK-16830
Project: Spark
Issue Type: Bug
Components: Spark Core, Streaming
Affects Versions: 1.6.2
Environment: EMR 4.7.2
Reporter: Renxia Wang
When a host becomes unreachable, the driver removes the executors and block
managers on that host because it no longer receives their heartbeats. However,
executors on other hosts keep trying to fetch blocks from the bad host.
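The mismatch described above can be illustrated with a toy model. This is a hypothetical Python sketch, not Spark's implementation. `Driver`, `heartbeat`, `expire`, `hosts_to_try`, the `refresh` flag and the 120-second timeout are all invented for illustration. The driver drops a host once its heartbeats stop. A fetcher working from a cached list of block locations still targets that host unless it checks the driver's current view of live hosts.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    """Hypothetical toy driver that tracks executor hosts by heartbeat."""
    timeout_s: float  # illustrative heartbeat timeout, not a real Spark setting
    last_heartbeat: dict = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this host.
        self.last_heartbeat[host] = now

    def expire(self, now: float) -> list:
        """Drop hosts whose last heartbeat is older than timeout_s; return them."""
        dead = [h for h, t in self.last_heartbeat.items() if now - t > self.timeout_s]
        for h in dead:
            del self.last_heartbeat[h]
        return dead

    def live_hosts(self) -> set:
        return set(self.last_heartbeat)


def hosts_to_try(cached_locations: list, driver: Driver, refresh: bool) -> list:
    """Hosts a fetcher would contact for one block (hypothetical helper).

    refresh=False: the fetcher trusts its cached location list, so it still
    contacts hosts the driver has already removed (the reported symptom).
    refresh=True: locations the driver no longer considers live are dropped.
    """
    if not refresh:
        return list(cached_locations)
    live = driver.live_hosts()
    return [h for h in cached_locations if h in live]


if __name__ == "__main__":
    driver = Driver(timeout_s=120.0)
    driver.heartbeat("host-a", now=0.0)
    driver.heartbeat("host-b", now=0.0)
    driver.heartbeat("host-a", now=100.0)  # host-b stops sending heartbeats
    print(driver.expire(now=130.0))  # ['host-b']
    cached = ["host-b", "host-a"]  # snapshot taken before host-b went down
    print(hosts_to_try(cached, driver, refresh=False))  # ['host-b', 'host-a']
    print(hosts_to_try(cached, driver, refresh=True))  # ['host-a']
```

In this model the driver's removal of a host does not reach a fetcher that already holds stale locations. That gap matches the behavior reported here. The model makes no claim about where the stale locations live in Spark itself.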
I am running a Spark Streaming job that consumes data from Kinesis. As a result
of these failing block-fetch retries, I started seeing
ProvisionedThroughputExceededException on shards, SocketException from
AmazonHttpClient (to Kinesis), Kinesis ExpiredIteratorException, etc.
This issue also exposes a potential memory leak. From the time the bad host
became unreachable, the physical memory usage of the executors that kept trying
to fetch blocks from it increased steadily until they hit the physical memory
limit and were killed by YARN.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)