[ https://issues.apache.org/jira/browse/HDDS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-7608.
------------------------------------
    Resolution: Implemented

Resolving, since all sub-tasks are done. Please feel free to reopen and add a
new sub-task if necessary.

> Ensure queued commands with old SCM term are not processed
> ----------------------------------------------------------
>
>                 Key: HDDS-7608
>                 URL: https://issues.apache.org/jira/browse/HDDS-7608
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> With SCM HA, every command sent to a datanode includes the SCM "term". If a 
> new SCM leader is elected due to a failover or restart, the term increases.
> In general, once a datanode notices that the term has changed, it should not 
> process any commands queued from an old term. This matters most for commands 
> like DeleteContainer: the new leader may schedule the deletion of a different 
> replica, and then both deletes complete, removing more replicas than intended.
> The DN learns a new term by inspecting the term in each command. When it 
> dequeues a command for processing and finds that the command carries a 
> greater term, it updates its term to the new value. Any subsequent commands 
> carrying the old term are then dropped.
> There are a few problems here:
> 1) If the DN does not receive any more commands for some reason (perhaps 
> unlikely), it never learns the new term and so never drops any queued 
> commands. Perhaps the term should be included in all heartbeat responses 
> rather than relying on the one in the commands?
> 2) The term is only updated when the first command with the new term reaches 
> the head of the queue. This means all commands before it will still get 
> processed as normal. Perhaps we should update the term when the command is 
> added to the queue, or update based on a field in the heartbeat.
> 3) Replicate and delete replica commands (and perhaps others) are taken from 
> the main queue and added to sub-queues, where they may stay for some time. 
> While they wait in a sub-queue, their term is never checked again, but it should be.
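A minimal Java sketch of how the three proposed fixes could fit together, assuming a datanode with one main command queue and a single sub-queue for replicate and delete commands. All names here (CommandQueues, TermTracker, Command, CommandType, onHeartbeatResponse, drainMainQueue, drainSubQueue) are hypothetical illustrations, not Ozone's actual datanode classes, and fix 1 assumes the heartbeat response carries the leader's term, which is what the issue proposes rather than confirmed existing behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical command types: only "goes to the sub-queue or not" matters here.
enum CommandType { CLOSE_CONTAINER, REPLICATE_CONTAINER, DELETE_CONTAINER }

// Hypothetical command carrying the SCM term it was issued under.
record Command(CommandType type, long term) { }

// Tracks the highest SCM term this datanode has seen; it only ever moves forward.
final class TermTracker {
  private final AtomicLong highestTerm = new AtomicLong(0);

  void observe(long term) {
    highestTerm.accumulateAndGet(term, Math::max);
  }

  // A command is stale if it was issued under an older term than the newest seen.
  boolean isStale(Command cmd) {
    return cmd.term() < highestTerm.get();
  }
}

final class CommandQueues {
  private final TermTracker tracker = new TermTracker();
  private final Deque<Command> mainQueue = new ArrayDeque<>();
  private final Deque<Command> subQueue = new ArrayDeque<>();

  // Fix 1 (assumed): learn the leader's term from every heartbeat response,
  // even when no new commands arrive.
  void onHeartbeatResponse(long leaderTerm) {
    tracker.observe(leaderTerm);
  }

  // Fix 2: update the term when a command is queued, so older commands
  // already waiting ahead of it are recognised as stale.
  void enqueue(Command cmd) {
    tracker.observe(cmd.term());
    mainQueue.addLast(cmd);
  }

  // Drains the main queue: stale commands are dropped, replicate/delete
  // commands move to the sub-queue, and the rest count as executed.
  List<Command> drainMainQueue() {
    List<Command> executed = new ArrayList<>();
    Command cmd;
    while ((cmd = mainQueue.pollFirst()) != null) {
      if (tracker.isStale(cmd)) {
        continue; // issued by an old SCM leader: drop it
      }
      if (cmd.type() == CommandType.CLOSE_CONTAINER) {
        executed.add(cmd);
      } else {
        subQueue.addLast(cmd);
      }
    }
    return executed;
  }

  // Fix 3: check the term again when a command leaves the sub-queue,
  // since the term may have advanced while it was waiting.
  List<Command> drainSubQueue() {
    List<Command> executed = new ArrayList<>();
    Command cmd;
    while ((cmd = subQueue.pollFirst()) != null) {
      if (!tracker.isStale(cmd)) {
        executed.add(cmd);
      }
    }
    return executed;
  }
}
```

For example, after enqueue(new Command(CommandType.DELETE_CONTAINER, 2)) and drainMainQueue(), the delete waits in the sub-queue; if onHeartbeatResponse(3) arrives before drainSubQueue(), the delete is dropped instead of executed. The term uses AtomicLong with max so it can never move backwards if heartbeats are handled on a different thread; the queues themselves are plain ArrayDeques, so the sketch assumes a single thread drains them.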



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
