[
https://issues.apache.org/jira/browse/HDDS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai resolved HDDS-7608.
------------------------------------
Resolution: Implemented
Resolving, since all sub-tasks are done. Please feel free to reopen and add a
new sub-task if necessary.
> Ensure queued commands with old SCM term are not processed
> ----------------------------------------------------------
>
> Key: HDDS-7608
> URL: https://issues.apache.org/jira/browse/HDDS-7608
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Stephen O'Donnell
> Assignee: Attila Doroszlai
> Priority: Major
>
> With SCM HA, every command sent to a datanode includes the SCM "term". If a
> new SCM leader is elected due to a failover or restart, the term increases.
> In general, any commands queued on a datanode from an old term should not be
> processed by the datanode once it notices the term has changed. This matters
> most for commands like DeleteContainer: the new leader may schedule the delete
> of a different replica, and then both deletes would complete.
> The DN learns of a new term only by inspecting the term in each command. If it
> dequeues a command for processing and finds that the command carries a greater
> term, it updates its term to the new value. Any subsequent commands with the
> old term are then dropped.
> There are a few problems here:
> 1) If the DN does not receive any more commands for some reason (unlikely
> perhaps), then it will never learn the new term and so will not drop any
> queued commands from the old term. Perhaps the term should be included in all
> heartbeat responses rather than relying on the one in the commands?
> 2) The term is only updated when the first command with the new term reaches
> the head of the queue. This means all commands before it will still get
> processed as normal. Perhaps we should update the term when the command is
> added to the queue, or update based on a field in the heartbeat.
> 3) Replicate and delete replica commands (and perhaps others) are taken from
> the main queue and added to sub-queues where they may stay for some time. If
> they are in a sub-queue, the term is never checked again, and it should be.
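To make problem 2 concrete, here is a minimal Java sketch of a datanode command queue that drops commands from an older SCM term. `Command` and `TermGuardedQueue` are hypothetical names, not Ozone classes. The `updateTermOnEnqueue` flag exists only to contrast the two behaviors: with it off, the term is learned only when a newer-term command reaches the head of the queue (as described above); with it on, the term is learned as soon as the command is added.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for an SCM command: only the fields needed here.
final class Command {
    final String name;
    final long term;

    Command(String name, long term) {
        this.name = name;
        this.term = term;
    }
}

// Hypothetical sketch, not the actual Ozone implementation.
class TermGuardedQueue {
    private final Deque<Command> queue = new ArrayDeque<>();
    private final boolean updateTermOnEnqueue;
    private long currentTerm = 0;

    TermGuardedQueue(boolean updateTermOnEnqueue) {
        this.updateTermOnEnqueue = updateTermOnEnqueue;
    }

    synchronized void offer(Command cmd) {
        if (updateTermOnEnqueue) {
            // Proposed behavior (problem 2): learn the new term on arrival,
            // before older commands still in the queue get executed.
            currentTerm = Math.max(currentTerm, cmd.term);
        }
        queue.addLast(cmd);
    }

    // Returns the next command to execute, or null when the queue is empty.
    // Commands from an older term are dropped.
    synchronized Command poll() {
        Command cmd;
        while ((cmd = queue.pollFirst()) != null) {
            if (cmd.term > currentTerm) {
                // Term learned when a newer-term command reaches the head.
                currentTerm = cmd.term;
            }
            if (cmd.term >= currentTerm) {
                return cmd;
            }
            // cmd.term < currentTerm: stale, skip it.
        }
        return null;
    }

    // Executes (here: records) every command that survives the term check.
    List<String> drain() {
        List<String> executed = new ArrayList<>();
        for (Command c = poll(); c != null; c = poll()) {
            executed.add(c.name);
        }
        return executed;
    }
}
```

With two commands from term 1 queued ahead of one from term 2, the dequeue-time variant executes all three, while the enqueue-time variant executes only the term-2 command.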
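Problems 1 and 3 can be sketched the same way, again as hypothetical Java rather than Ozone's actual code. `TermTracker` keeps the highest term seen from any source; feeding it from heartbeat responses assumes those responses would carry the leader's term, as problem 1 proposes. `TermCheckedSubQueue` re-checks that term immediately before running each command, so a replicate or delete-replica command that waited in a sub-queue across a failover is discarded (problem 3).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the highest SCM term the datanode has seen.
final class TermTracker {
    private final AtomicLong term = new AtomicLong();

    // Call for every command and, assuming problem 1's proposal, for every
    // heartbeat response that carries the leader's term.
    void observe(long seenTerm) {
        term.accumulateAndGet(seenTerm, Math::max);
    }

    boolean isStale(long commandTerm) {
        return commandTerm < term.get();
    }
}

// Hypothetical command type with only the fields needed here.
final class QueuedCommand {
    final String name;
    final long term;

    QueuedCommand(String name, long term) {
        this.name = name;
        this.term = term;
    }
}

// Hypothetical sub-queue for slow commands (replication, replica deletion).
final class TermCheckedSubQueue {
    private final Queue<QueuedCommand> pending = new ArrayDeque<>();
    private final TermTracker tracker;

    TermCheckedSubQueue(TermTracker tracker) {
        this.tracker = tracker;
    }

    synchronized void add(QueuedCommand cmd) {
        pending.add(cmd);
    }

    // Runs everything queued, re-checking the term at execution time rather
    // than only when the command left the main queue. Returns what ran.
    synchronized List<String> runPending() {
        List<String> executed = new ArrayList<>();
        QueuedCommand cmd;
        while ((cmd = pending.poll()) != null) {
            if (tracker.isStale(cmd.term)) {
                continue; // a newer leader exists: discard the command
            }
            executed.add(cmd.name);
        }
        return executed;
    }
}
```

Sharing one tracker means a term learned from a heartbeat, the main queue, or any sub-queue immediately invalidates older commands everywhere.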