[ https://issues.apache.org/jira/browse/FLINK-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492729#comment-16492729 ]
ASF GitHub Bot commented on FLINK-9417:
---------------------------------------

GitHub user sihuazhou opened a pull request:

    https://github.com/apache/flink/pull/6088

    [FLINK-9417][Distributed Coordination] Send heartbeat requests from RPC endpoint's main thread

    ## What is the purpose of the change

    This PR sends heartbeat requests from the RPC endpoint's main thread, so that a blocked endpoint no longer reports fake liveness information.

    ## Brief change log

    - *Send heartbeat requests from RPC endpoint's main thread*

    ## Verifying this change

    This change is a trivial rework.

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): (no)
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
    - The serializers: (no)
    - The runtime per-record code paths (performance sensitive): (no)
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
    - The S3 file system connector: (no)

    ## Documentation

    - no

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sihuazhou/flink FLINK-9417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6088

----
commit 7fe857592d4932b5a77a9dbcb12fab53dd207be9
Author: sihuazhou <summerleafs@...>
Date:   2018-05-28T14:06:21Z

    Send heartbeat requests from RPC endpoint's main thread.
----

> Send heartbeat requests from RPC endpoint's main thread
> -------------------------------------------------------
>
>                 Key: FLINK-9417
>                 URL: https://issues.apache.org/jira/browse/FLINK-9417
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Till Rohrmann
>            Assignee: Sihua Zhou
>            Priority: Major
>
> Currently, we use the {{RpcService#scheduledExecutor}} to send heartbeat
> requests to remote targets. This has the problem that we still see heartbeats
> from this endpoint even if its main thread is currently blocked. Because the
> blocked main thread cannot process the heartbeat responses, the remote target
> times out. On the remote side, this won't be noticed because it still
> receives the heartbeat requests.
> A solution to this problem would be to send the heartbeat requests to the
> remote targets through the RPC endpoint's main thread. That way, the
> heartbeats would also be blocked if the main thread is blocked/busy.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
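
The difference described in the issue can be illustrated with a minimal sketch. It uses only plain `java.util.concurrent` executors and none of Flink's actual RPC or heartbeat classes. The class name `HeartbeatThreadingSketch`, the method `heartbeatsSentWhileBlocked`, and the timing values are hypothetical, and a single-threaded executor merely stands in for the endpoint's main thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a stand-in for an RPC endpoint, not Flink code. */
public class HeartbeatThreadingSketch {

    /**
     * Blocks the simulated main thread, lets a timer fire every 10 ms for
     * roughly 200 ms, and returns how many heartbeat requests were "sent"
     * during that window.
     *
     * @param viaMainThread if true, the timer only enqueues the send onto the
     *                      main thread; if false, the timer thread sends directly
     */
    public static int heartbeatsSentWhileBlocked(boolean viaMainThread) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger sent = new AtomicInteger();
        CountDownLatch blocked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the main thread until the latch is released.
            mainThread.execute(() -> {
                blocked.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            blocked.await();

            // "Sending" a heartbeat request is modeled as incrementing a counter.
            Runnable sendHeartbeat = sent::incrementAndGet;
            Runnable trigger;
            if (viaMainThread) {
                // The timer hands the send over to the (blocked) main thread.
                trigger = () -> mainThread.execute(sendHeartbeat);
            } else {
                // The timer thread sends on its own, regardless of the main thread.
                trigger = sendHeartbeat;
            }
            scheduler.scheduleAtFixedRate(trigger, 0, 10, TimeUnit.MILLISECONDS);

            Thread.sleep(200);
            return sent.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        } finally {
            release.countDown();
            scheduler.shutdownNow();
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sent directly from timer thread: "
                + heartbeatsSentWhileBlocked(false));
        System.out.println("sent through main thread:        "
                + heartbeatsSentWhileBlocked(true));
    }
}
```

In this sketch, the direct variant keeps sending while the main thread is stuck, which corresponds to the misleading liveness signal described in the issue. The variant that goes through the main thread sends nothing until the main thread is free again, so the remote side would also notice the stall.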