[ https://issues.apache.org/jira/browse/ASTERIXDB-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363524#comment-16363524 ]
ASF subversion and git services commented on ASTERIXDB-2284:
------------------------------------------------------------
Commit bf74a319dbdfa3fea3007d3286f14a77fecac178 in asterixdb's branch
refs/heads/master from [~mhubail]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=bf74a31 ]
[ASTERIXDB-2284][CLUS] Ensure Node Failure on Heartbeat Miss
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
- Request that a node which exceeded its allowed heartbeat
misses shut down, to ensure that it actually fails.
- Ensure thread safety of lastHeartbeatNanoTime in
NodeControllerState.
Change-Id: I121f85fd858484377a9d888d18c3069c239f00fc
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2390
Sonar-Qube: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Contrib: Jenkins <[email protected]>
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Michael Blow <[email protected]>
> Ensure Node Failure on Heartbeat Misses
> ---------------------------------------
>
> Key: ASTERIXDB-2284
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2284
> Project: Apache AsterixDB
> Issue Type: Improvement
> Reporter: Murtadha Hubail
> Assignee: Murtadha Hubail
> Priority: Major
>
> Currently, an NC can exceed the allowed period for sending its heartbeat
> (e.g. due to a garbage collection pause) and still stay up, which leaves the
> cluster state unusable forever. The proposal is to ensure that a node
> considered failed has really failed by asking it to shut down. If the
> shutdown succeeds, the NC will be restarted and the cluster state will
> become active again when the NC rejoins.
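
For illustration only, the behaviour described above could be sketched roughly as follows. This is not the code from the commit: the class HeartbeatMonitorSketch, its methods, and the shutdownRequester callback are hypothetical names, and only the field name lastHeartbeatNanoTime is taken from the commit message. The sketch assumes a node counts as failed once the time since its last heartbeat exceeds heartbeat period x max misses, and it uses a volatile field as one possible way to make the timestamp safe to read from a different thread than the one that writes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch of heartbeat-miss handling; not the AsterixDB implementation. */
public class HeartbeatMonitorSketch {

    /** Per-node state. Only the field name lastHeartbeatNanoTime comes from the commit message. */
    static final class NodeState {
        // volatile: the sweeping thread must see the value last written by the heartbeat thread
        private volatile long lastHeartbeatNanoTime;

        NodeState(long nowNanos) {
            this.lastHeartbeatNanoTime = nowNanos;
        }

        void notifyHeartbeat(long nowNanos) {
            lastHeartbeatNanoTime = nowNanos;
        }

        long nanosSinceLastHeartbeat(long nowNanos) {
            return nowNanos - lastHeartbeatNanoTime;
        }
    }

    private final Map<String, NodeState> nodes = new ConcurrentHashMap<>();
    private final long heartbeatPeriodNanos;
    private final int maxMisses;
    // Assumed callback that asks a node to shut down (e.g. by sending it a message)
    private final Consumer<String> shutdownRequester;

    public HeartbeatMonitorSketch(long heartbeatPeriodNanos, int maxMisses,
            Consumer<String> shutdownRequester) {
        this.heartbeatPeriodNanos = heartbeatPeriodNanos;
        this.maxMisses = maxMisses;
        this.shutdownRequester = shutdownRequester;
    }

    public void register(String nodeId, long nowNanos) {
        nodes.put(nodeId, new NodeState(nowNanos));
    }

    public void heartbeat(String nodeId, long nowNanos) {
        NodeState state = nodes.get(nodeId);
        if (state != null) {
            state.notifyHeartbeat(nowNanos);
        }
    }

    /** Marks nodes that exceeded the allowed silence as failed and asks each one to shut down. */
    public List<String> sweep(long nowNanos) {
        long limit = heartbeatPeriodNanos * maxMisses;
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, NodeState> entry : nodes.entrySet()) {
            if (entry.getValue().nanosSinceLastHeartbeat(nowNanos) > limit) {
                failed.add(entry.getKey());
            }
        }
        for (String nodeId : failed) {
            nodes.remove(nodeId);
            // The node may still be alive (e.g. after a long GC pause), so make the failure real
            shutdownRequester.accept(nodeId);
        }
        return failed;
    }
}
```

In real use the caller would pass System.nanoTime() as nowNanos; the time is taken as a parameter here only so the sketch can be exercised deterministically.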
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)