[
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
hustfxj updated SPARK-19831:
----------------------------
Summary: Sending the heartbeat master from worker maybe blocked by other
rpc messages (was: Sending the heartbeat to master maybe blocked by other rpc
messages)
> Sending the heartbeat master from worker maybe blocked by other rpc messages
> ------------------------------------------------------------------------------
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: hustfxj
>
> Cleaning up an application can take a long time on the worker. Because the
> worker extends *ThreadSafeRpcEndpoint*, which processes one message at a
> time, a slow cleanup blocks the worker from sending heartbeats to the master
> and from handling other rpc messages. The master then concludes that the
> worker is dead. If the worker is running a driver, the master schedules that
> driver again. I think this is a bug in Spark. I suggest the following fixes:
> 1. Run the application cleanup in a separate asynchronous thread such as
> 'cleanupThreadExecutor', so that it does not block other rpc messages such
> as SendHeartbeat.
> 2. Do not send the heartbeat to the master through the rpc channel, because
> any other rpc message may block that channel. Instead, send the heartbeat
> to the master from a separate asynchronous timer thread.
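
The blocking described above, and the effect of the first suggestion, can be
illustrated with a small simulation. This is only a sketch in Python, not
Spark code: SerialEndpoint, cleanup_gate, and the "ApplicationFinished" and
"Stop" message names are hypothetical stand-ins, and the single loop thread is
an assumed model of an endpoint that handles one message at a time.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialEndpoint:
    """Toy model of a thread-safe endpoint: one thread, one message at a time."""

    def __init__(self, offload_cleanup):
        self.offload_cleanup = offload_cleanup
        self.inbox = queue.Queue()
        self.cleanup_gate = threading.Event()    # set to let the slow cleanup finish
        self.heartbeat_sent = threading.Event()
        # Hypothetical counterpart of the 'cleanupThreadExecutor' named in the report.
        self.cleanup_executor = ThreadPoolExecutor(max_workers=1)
        self.loop = threading.Thread(target=self._run, daemon=True)
        self.loop.start()

    def send(self, message):
        self.inbox.put(message)

    def _cleanup(self):
        # Stands in for slow application cleanup; blocks until the gate opens.
        self.cleanup_gate.wait()

    def _run(self):
        while True:
            message = self.inbox.get()
            if message == "Stop":
                return
            if message == "ApplicationFinished":
                if self.offload_cleanup:
                    # Suggestion 1: hand the slow work to another thread.
                    self.cleanup_executor.submit(self._cleanup)
                else:
                    # Inline cleanup stalls every message queued behind it.
                    self._cleanup()
            elif message == "SendHeartbeat":
                self.heartbeat_sent.set()

    def stop(self):
        self.cleanup_gate.set()                  # release any pending cleanup
        self.send("Stop")
        self.loop.join()
        self.cleanup_executor.shutdown(wait=True)


def heartbeat_survives_cleanup(offload_cleanup, timeout=0.5):
    """Return True if the heartbeat is handled while cleanup is still running."""
    endpoint = SerialEndpoint(offload_cleanup)
    endpoint.send("ApplicationFinished")
    endpoint.send("SendHeartbeat")
    sent_in_time = endpoint.heartbeat_sent.wait(timeout)
    endpoint.stop()
    return sent_in_time
```

With offload_cleanup=False the heartbeat waits behind the cleanup and misses
the timeout; with offload_cleanup=True the loop thread stays free to handle
it. The second suggestion goes further by taking the heartbeat off the shared
message queue entirely, which this sketch does not model.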