Stephen O'Donnell created HDDS-5330:
---------------------------------------

             Summary: Datanode commands are not always invalidated when the SCM 
leader switches 
                 Key: HDDS-5330
                 URL: https://issues.apache.org/jira/browse/HDDS-5330
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Stephen O'Donnell


A datanode should only process commands from the SCM which is the leader.

In StateContext.getNextCommand(), there is logic to update the current leader 
SCM term for each command seen on the DN. It picks the command, and then updates 
the term based on the term stored in the command:

{code}
updateTermOfLeaderSCM(command);
if (command.getTerm() == termOfLeaderSCM.get()) {
  return command;
}
{code}

There are a few problems here:

1) If there are commands in the queue with a newer term, then the term stored 
in the DN will not be updated until all the older commands queued ahead of them 
have been processed. Therefore the SCM leader switch can already have happened 
while stale commands continue to be processed.

2) While there is a single command queue, there are further sub-queues. For 
example, DeleteContainerCommandHandler places the commands into an executor 
queue. The same applies to ReplicateContainerCommandHandler. These queues could 
be quite large, and hence stale commands could still be processed after a 
leader switch.
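
To illustrate problem (1), here is a minimal sketch of a queue that only 
updates the term at dequeue time. It is not the Ozone code: the class 
DequeueTermSketch, its Command type and the drain() method are hypothetical, 
and it assumes the term update keeps the maximum term seen so far.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class DequeueTermSketch {

  /** Hypothetical stand-in for an SCM command; only the term matters here. */
  static final class Command {
    final String name;
    final long term;

    Command(String name, long term) {
      this.name = name;
      this.term = term;
    }
  }

  /** Drains the queue, updating the known term only as commands are dequeued. */
  static List<String> drain(Queue<Command> queue) {
    long termOfLeaderSCM = 0;
    List<String> processed = new ArrayList<>();
    Command command;
    while ((command = queue.poll()) != null) {
      // Assumption: the update keeps the highest term seen so far.
      termOfLeaderSCM = Math.max(termOfLeaderSCM, command.term);
      if (command.term == termOfLeaderSCM) {
        processed.add(command.name);
      }
    }
    return processed;
  }

  public static void main(String[] args) {
    Queue<Command> queue = new ArrayDeque<>();
    queue.add(new Command("delete-1", 1));
    queue.add(new Command("delete-2", 1));
    // A command from the new leader is already queued behind the old ones.
    queue.add(new Command("replicate-3", 2));

    // prints [delete-1, delete-2, replicate-3]
    System.out.println(drain(queue));
  }
}
```

Both term 1 commands are processed even though a term 2 command was already 
in the queue, because the DN only learns about term 2 when it reaches that 
command.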

For (1), I believe the term should be updated when the commands are enqueued, 
not dequeued. That would ensure new commands update the term and invalidate the 
old commands immediately.
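
A minimal sketch of the idea for (1), not a patch against StateContext: the 
class EnqueueTermSketch and its nested Command and CommandQueue types are 
hypothetical names used only for illustration. The term is raised in 
addCommand(), so older commands already in the queue fail the term check when 
they are dequeued.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

final class EnqueueTermSketch {

  /** Hypothetical stand-in for an SCM command; only the term matters here. */
  static final class Command {
    private final String name;
    private final long term;

    Command(String name, long term) {
      this.name = name;
      this.term = term;
    }

    String getName() { return name; }
    long getTerm() { return term; }
  }

  /** Hypothetical queue that records the newest leader term at enqueue time. */
  static final class CommandQueue {
    private final Queue<Command> queue = new ArrayDeque<>();
    private final AtomicLong termOfLeaderSCM = new AtomicLong(0);

    /** Raise the known term as soon as a command arrives, then queue it. */
    synchronized void addCommand(Command command) {
      termOfLeaderSCM.accumulateAndGet(command.getTerm(), Math::max);
      queue.add(command);
    }

    /** Return the next command of the current term, discarding stale ones. */
    synchronized Command getNextCommand() {
      Command command;
      while ((command = queue.poll()) != null) {
        if (command.getTerm() == termOfLeaderSCM.get()) {
          return command;
        }
        // Older term: issued by a previous leader, so it is dropped.
      }
      return null;
    }
  }

  public static void main(String[] args) {
    CommandQueue queue = new CommandQueue();
    queue.addCommand(new Command("delete-1", 1));
    queue.addCommand(new Command("delete-2", 1));
    queue.addCommand(new Command("replicate-3", 2));

    // The two term 1 commands are skipped; prints replicate-3
    System.out.println(queue.getNextCommand().getName());
  }
}
```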

For (2), we should check the DN term prior to executing the command and drop 
the command if it is no longer valid.
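
A minimal sketch of the idea for (2), assuming the handler can read the DN's 
current leader term when its task finally runs. The class 
TermCheckedHandlerSketch and its Handler type are hypothetical and do not 
mirror the real command handlers. The demo uses a plain queue as the executor 
so that the leader switch happens deterministically while a task is waiting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

final class TermCheckedHandlerSketch {

  /** Hypothetical handler that defers its work to an executor sub-queue. */
  static final class Handler {
    private final Executor executor;
    private final AtomicLong termOfLeaderSCM;
    // Plain list is enough here because the demo runs tasks on one thread.
    private final List<String> executed = new ArrayList<>();

    Handler(Executor executor, AtomicLong termOfLeaderSCM) {
      this.executor = executor;
      this.termOfLeaderSCM = termOfLeaderSCM;
    }

    void handle(String commandName, long commandTerm) {
      executor.execute(() -> {
        // Check at execution time, not submission time: the leader may
        // have changed while this task sat in the sub-queue.
        if (commandTerm != termOfLeaderSCM.get()) {
          return; // stale command, drop it
        }
        executed.add(commandName);
      });
    }

    List<String> getExecuted() { return executed; }
  }

  public static void main(String[] args) {
    Queue<Runnable> pending = new ArrayDeque<>();
    AtomicLong term = new AtomicLong(1);
    // Tasks are only queued here; they run when the loop below drains them.
    Handler handler = new Handler(pending::add, term);

    handler.handle("delete-container-1", 1);
    term.set(2); // leader switch while the first task is still waiting
    handler.handle("delete-container-2", 2);

    Runnable task;
    while ((task = pending.poll()) != null) {
      task.run();
    }

    // prints [delete-container-2]
    System.out.println(handler.getExecuted());
  }
}
```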



