[
https://issues.apache.org/jira/browse/FLINK-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517790#comment-16517790
]
Sihua Zhou commented on FLINK-9417:
-----------------------------------
Hi [~till.rohrmann], one thing comes to mind: if we send heartbeat requests
from the RPC endpoint's main thread, should we also check HEARTBEAT_INTERVAL
against a sane minimum value (currently it only needs to be greater than 0)?
If the user configures a very small value, e.g. 10 ms, the resource manager
and the job master will be kept busy doing little more than sending
heartbeats.
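A minimal sketch of the stricter check suggested above. The class `HeartbeatIntervalCheck`, the method `validateHeartbeatInterval`, and the 100 ms lower bound `MIN_HEARTBEAT_INTERVAL_MS` are hypothetical and not part of Flink; the interval is assumed to be given in milliseconds.

```java
/**
 * Hypothetical stricter validation of the heartbeat interval (milliseconds).
 * MIN_HEARTBEAT_INTERVAL_MS is an assumed threshold, not an existing Flink option.
 */
final class HeartbeatIntervalCheck {

    // Assumed lower bound; the actual value would still have to be agreed upon.
    static final long MIN_HEARTBEAT_INTERVAL_MS = 100L;

    static long validateHeartbeatInterval(long intervalMs) {
        // Existing rule as described in the comment: the interval only has to be positive.
        if (intervalMs <= 0) {
            throw new IllegalArgumentException(
                    "Heartbeat interval must be greater than 0 but was " + intervalMs + " ms.");
        }
        // Proposed sanity check: reject intervals so small that the main thread
        // of the resource manager / job master would mostly be sending heartbeats.
        if (intervalMs < MIN_HEARTBEAT_INTERVAL_MS) {
            throw new IllegalArgumentException(
                    "Heartbeat interval must be at least " + MIN_HEARTBEAT_INTERVAL_MS
                            + " ms but was " + intervalMs + " ms.");
        }
        return intervalMs;
    }

    public static void main(String[] args) {
        System.out.println(validateHeartbeatInterval(10_000L)); // prints 10000
        try {
            validateHeartbeatInterval(10L); // the "very small value" from the comment
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast at configuration time keeps the check cheap; where exactly the bound should lie is a separate question.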
> Send heartbeat requests from RPC endpoint's main thread
> -------------------------------------------------------
>
> Key: FLINK-9417
> URL: https://issues.apache.org/jira/browse/FLINK-9417
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Till Rohrmann
> Assignee: Sihua Zhou
> Priority: Major
>
> Currently, we use the {{RpcService#scheduledExecutor}} to send heartbeat
> requests to remote targets. The problem is that heartbeats from this endpoint
> are still sent even if its main thread is currently blocked. Because of this,
> the heartbeat responses cannot be processed and the remote target times out.
> The remote side, however, won't notice anything because it still receives the
> heartbeat requests.
> A solution to this problem would be to send the heartbeat requests to the
> remote target through the RPC endpoint's main thread. That way, the
> heartbeats would also be blocked if the main thread is blocked/busy.
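To illustrate the difference between the two strategies, here is a toy model using only `java.util.concurrent`. A single-thread executor stands in for the endpoint's main thread and a `ScheduledExecutorService` stands in for `RpcService#scheduledExecutor`. The names `HeartbeatSchedulingDemo` and `heartbeatsSentWhileBlocked` are hypothetical, and "sending" a heartbeat is just incrementing a counter. This is a sketch of the idea, not Flink's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy model of the two heartbeat scheduling strategies from FLINK-9417.
 * "mainThread" stands in for an RPC endpoint's single main thread and
 * "scheduler" for RpcService#scheduledExecutor; neither is Flink's API.
 */
final class HeartbeatSchedulingDemo {

    /** Counts heartbeats sent while the main thread is blocked for blockMillis. */
    static int heartbeatsSentWhileBlocked(boolean viaMainThread, long blockMillis, long intervalMillis) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger sent = new AtomicInteger();
        CountDownLatch blocked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the main thread, e.g. with a long-running RPC handler.
            mainThread.execute(() -> {
                blocked.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            blocked.await();

            // Stand-in for sending a heartbeat request to the remote target.
            Runnable sendHeartbeat = sent::incrementAndGet;
            scheduler.scheduleAtFixedRate(() -> {
                if (viaMainThread) {
                    // Proposed: enqueue on the main thread; nothing goes out while it is blocked.
                    mainThread.execute(sendHeartbeat);
                } else {
                    // Current: send directly from the scheduler's own thread.
                    sendHeartbeat.run();
                }
            }, 0, intervalMillis, TimeUnit.MILLISECONDS);

            Thread.sleep(blockMillis);
            return sent.get(); // read before the main thread is released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            scheduler.shutdownNow();
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sent from scheduler thread: " + heartbeatsSentWhileBlocked(false, 200, 20));
        System.out.println("sent via main thread:       " + heartbeatsSentWhileBlocked(true, 200, 20));
    }
}
```

While the main thread is occupied, the direct variant keeps sending heartbeats, whereas the main-thread variant sends none until the thread is released. That is the behaviour the issue asks for: a blocked endpoint stops heartbeating, so the remote side can detect the problem.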
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)