Murtadha Hubail created ASTERIXDB-2284:

             Summary: Ensure Node Failure on Heartbeat Misses
                 Key: ASTERIXDB-2284
             Project: Apache AsterixDB
          Issue Type: Improvement
            Reporter: Murtadha Hubail
            Assignee: Murtadha Hubail

Currently, an NC can exceed the allowed period for sending its heartbeat
(e.g. due to a garbage collection pause) and yet continue to stay up. This
leaves the cluster state unusable indefinitely. The proposal is to ensure
that a node considered failed has really failed by asking it to shut down.
If the shutdown succeeds, the NC will be restarted and the cluster state
will become active again when the NC rejoins.
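
The proposed behaviour can be illustrated with a minimal sketch. This is not
AsterixDB code: HeartbeatMonitor, request_shutdown, max_miss_seconds and the
rule that the cluster counts as active only when no node is marked failed are
all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitoring-side tracker of NC heartbeats (illustration only)."""

    max_miss_seconds: float
    # Hypothetical callback that asks a node to shut down.
    request_shutdown: Callable[[str], object]
    clock: Callable[[], float] = time.monotonic
    last_seen: Dict[str, float] = field(default_factory=dict)
    failed: Set[str] = field(default_factory=set)

    def join(self, node_id: str) -> None:
        """A (re)started node joins: clear its failed mark and start tracking it."""
        self.failed.discard(node_id)
        self.last_seen[node_id] = self.clock()

    def heartbeat(self, node_id: str) -> bool:
        """Record a heartbeat; reject it if the node was already declared failed."""
        if node_id in self.failed:
            # A late heartbeat (e.g. after a long GC pause) does not revive the
            # node; it has been asked to shut down and must rejoin.
            return False
        self.last_seen[node_id] = self.clock()
        return True

    def sweep(self) -> List[str]:
        """Mark overdue nodes as failed and ask each of them to shut down."""
        now = self.clock()
        newly_failed = []
        for node_id, seen in list(self.last_seen.items()):
            if now - seen > self.max_miss_seconds:
                del self.last_seen[node_id]
                self.failed.add(node_id)
                # Make sure the "failed" node really goes down.
                self.request_shutdown(node_id)
                newly_failed.append(node_id)
        return newly_failed

    def cluster_active(self) -> bool:
        """Assumption of this sketch: active means no node is marked failed."""
        return not self.failed
```

In this sketch a node that misses its deadline is asked to shut down instead
of being left running, so a late heartbeat from it is ignored and the cluster
becomes active again only after the node rejoins.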