hustfxj created SPARK-19831:
-------------------------------
Summary: Sending the heartbeat to the master may be blocked by other
RPC messages
Key: SPARK-19831
URL: https://issues.apache.org/jira/browse/SPARK-19831
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.2.0
Reporter: hustfxj
Cleaning up an application can take a long time on the worker. Because the
worker extends *ThreadSafeRpcEndpoint*, which processes its messages one at a
time, a slow cleanup blocks the worker from handling other RPC messages,
including the one that sends heartbeats to the master. The master then
concludes that the worker is dead. If the worker is running a driver, the
master schedules that driver again. I think this is a bug in Spark. I suggest
fixing it as follows:
1. Run the application cleanup in a separate asynchronous thread, such as
'cleanupThreadExecutor', so that it does not block other RPC messages such as
SendHeartbeat.
2. Do not send the heartbeat to the master through the RPC channel, because any
other RPC message may block that channel. Instead, send the heartbeat to the
master from an asynchronous timer thread.
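
The effect of both suggestions can be illustrated with a small standalone
sketch. This is not Spark code: the class HeartbeatSketch, the method
heartbeatsWhileCleaning and its parameters are hypothetical names, and a
single-threaded executor stands in (as an assumption) for the serial message
handling of a ThreadSafeRpcEndpoint. It counts how many heartbeats get out
while a cleanup is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Spark's Worker implementation.
class HeartbeatSketch {

    /**
     * Returns the number of heartbeats sent during observeMillis while a
     * cleanup is still in progress.
     *
     * offloadCleanup:  run the cleanup on its own thread (suggestion 1)
     *                  instead of on the serial message loop.
     * directHeartbeat: send the heartbeat straight from the timer thread
     *                  (suggestion 2) instead of posting it to the loop.
     */
    static int heartbeatsWhileCleaning(boolean offloadCleanup,
                                       boolean directHeartbeat,
                                       long observeMillis,
                                       long intervalMillis)
            throws InterruptedException {
        // Stand-in for the endpoint: one thread, messages handled in order.
        ExecutorService messageLoop = Executors.newSingleThreadExecutor();
        ExecutorService cleanupExecutor = Executors.newSingleThreadExecutor();
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        CountDownLatch cleanupMayFinish = new CountDownLatch(1);
        AtomicInteger sent = new AtomicInteger();
        try {
            // A "slow cleanup": occupies its thread until released below.
            Runnable cleanup = () -> {
                try {
                    cleanupMayFinish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            (offloadCleanup ? cleanupExecutor : messageLoop).execute(cleanup);

            // Counting a heartbeat stands in for sending it to the master.
            Runnable heartbeat = directHeartbeat
                    ? sent::incrementAndGet
                    : () -> messageLoop.execute(sent::incrementAndGet);
            timer.scheduleAtFixedRate(heartbeat, intervalMillis,
                    intervalMillis, TimeUnit.MILLISECONDS);

            Thread.sleep(observeMillis);
            return sent.get();
        } finally {
            cleanupMayFinish.countDown();
            timer.shutdownNow();
            messageLoop.shutdownNow();
            cleanupExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cleanup on message loop: "
                + heartbeatsWhileCleaning(false, false, 300, 20));
        System.out.println("cleanup offloaded:       "
                + heartbeatsWhileCleaning(true, false, 300, 20));
        System.out.println("heartbeat sent directly: "
                + heartbeatsWhileCleaning(false, true, 300, 20));
    }
}
```

With the cleanup on the message loop and heartbeats posted to the same loop,
every heartbeat queues behind the cleanup and none is sent until it finishes.
Either change on its own lets heartbeats continue during the cleanup.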