[jira] [Updated] (HDDS-4580) Datanode can be stuck in leader not ready state after restart

ASF GitHub Bot (Jira) Fri, 11 Dec 2020 06:58:34 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HDDS-4580:
---------------------------------
    Labels: pull-request-available  (was: )

> Datanode can be stuck in leader not ready state after restart
> -------------------------------------------------------------
>
>                 Key: HDDS-4580
>                 URL: https://issues.apache.org/jira/browse/HDDS-4580
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Major
>              Labels: pull-request-available
>
> On restart, transactions are reapplied for an existing Ratis pipeline. 
> While processing the future, ContainerStateMachine#applyTransaction can throw 
> a NullPointerException, which completes the future exceptionally. 
> {code:java}
>       future.thenApply(r -> {
>         if (trx.getServerRole() == RaftPeerRole.LEADER) {
>           long startTime = (long) trx.getStateMachineContext();
>           metrics.incPipelineLatency(cmdType,
>               Time.monotonicNowNanos() - startTime);
>         }
> {code}
> In the above code snippet, trx.getStateMachineContext() is null during 
> restart, so casting it to long throws a NullPointerException. This fails the 
> future itself without updating applyTransactionCompletionMap. As a result, 
> lastAppliedIndex is not updated for the server, and the server is stuck in 
> the leader-not-ready state.
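
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Ozone code: the class ApplyTransactionSketch, the methods unguarded, guarded and recordLatency, and the map COMPLETION_MAP are all hypothetical stand-ins, and the null check shown is only one possible guard, which may differ from the fix in the actual pull request.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the pattern in the report, not ContainerStateMachine. */
public class ApplyTransactionSketch {

  // Assumed stand-in for applyTransactionCompletionMap: log index -> term.
  public static final Map<Long, Long> COMPLETION_MAP = new ConcurrentHashMap<>();

  /** Mirrors the reported pattern: a null context is unboxed inside thenApply. */
  public static CompletableFuture<String> unguarded(long index, Object context) {
    return CompletableFuture.completedFuture("reply").thenApply(r -> {
      long startTime = (long) context;      // throws NullPointerException when context is null
      recordLatency(System.nanoTime() - startTime);
      COMPLETION_MAP.put(index, 1L);        // skipped when the line above throws
      return r;
    });
  }

  /** One possible guard: skip the latency metric when no start time was recorded. */
  public static CompletableFuture<String> guarded(long index, Object context) {
    return CompletableFuture.completedFuture("reply").thenApply(r -> {
      if (context != null) {
        long startTime = (long) context;
        recordLatency(System.nanoTime() - startTime);
      }
      COMPLETION_MAP.put(index, 1L);        // always reached, so the index is recorded
      return r;
    });
  }

  // Placeholder for a metrics call; intentionally does nothing in this sketch.
  static void recordLatency(long nanos) {
  }

  public static void main(String[] args) {
    CompletableFuture<String> bad = unguarded(1L, null);
    System.out.println("unguarded failed: " + bad.isCompletedExceptionally());
    System.out.println("index 1 recorded: " + COMPLETION_MAP.containsKey(1L));

    CompletableFuture<String> ok = guarded(2L, null);
    System.out.println("guarded failed: " + ok.isCompletedExceptionally());
    System.out.println("index 2 recorded: " + COMPLETION_MAP.containsKey(2L));
  }
}
```

With a null context, the unguarded future completes exceptionally and index 1 never reaches the map, while the guarded future completes normally and records index 2. This is the same shape as the reported bug: the exception is swallowed into the future, so the bookkeeping that follows the metrics call never runs.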



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

