[
https://issues.apache.org/jira/browse/HDDS-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aravindan Vijayan updated HDDS-4227:
------------------------------------
Target Version/s: 1.2.0
> Implement a "prepareForUpgrade" step that applies all committed transactions
> onto the OM state machine.
> -------------------------------------------------------------------------------------------------------
>
> Key: HDDS-4227
> URL: https://issues.apache.org/jira/browse/HDDS-4227
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Manager
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> *Why is this needed?*
> Through HDDS-4143, we have a generic factory to handle multiple versions of
> apply transaction implementations based on layout version. Hence, this
> factory can be used to handle versioned requests across layout versions,
> whenever both versions need to exist in the code (for example, for
> HDDS-2939).
> However, it has been noticed that the OM Ratis requests are still undergoing
> a lot of minor changes (HDDS-4007, HDDS-3903), and in these cases it will
> become hard to maintain 2 versions of the code just to support clean
> upgrades.
> Hence, the plan is to build a pre-upgrade utility (client API) that makes
> sure that an OM instance has no "un-applied" transactions in its Raft log.
> Invoking this client API makes sure that the upgrade starts with a clean
> state. Of course, this is needed only in an HA setup. In a non-HA setup,
> it can either be skipped or, when invoked, will be a no-op (non-Ratis) or
> cause no harm (single-node Ratis).
> *How does it work?*
> Before updating the software bits, our goal is to bring every OM to the
> latest state with respect to apply transaction (AT). The reason is to make
> sure that the same version of the code executes the AT step on all 3 OMs.
> At a high level, the flow is as follows.
> * Before upgrade, *stop* the OMs.
> * Start OMs with a special flag --prepareUpgrade (similar to --init: a
> special mode in which the ephemeral OM instance does some work and then
> stops).
> * When OM is started with the --prepareUpgrade flag, it does not start the
> RPC server, so no new requests can get in.
> * In this state, we give every OM time to apply transactions up to the last txn.
> * We know that at least 2 OMs would have gotten the last client request
> transaction committed into their log. Hence, those 2 OMs are expected to
> apply transaction to that index faster.
> * At every OM, the Raft log will be purged after this wait period (so that
> replay does not happen), and a Ratis snapshot is taken at the last txn.
> * Even if there is a lagger OM that is unable to get to the last applied txn
> index, its logs will be purged after the wait time expires.
> * Now when OMs are started with the newer version, all the OMs will start
> using the new code.
> * The lagger OM will get the new Ratis snapshot since there are no logs to
> replay from.
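
The flow described above can be illustrated with a small toy simulation. This is only a sketch and not Ozone or Ratis code: the `ToyOM` class, its fields, the `max_steps` wait budget, and the `restart_with_new_version` helper are all hypothetical names made up for illustration. It assumes the lagger is an OM whose log is missing the last entries, and that on restart it receives the snapshot from whichever peer has the highest snapshot index.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyOM:
    """Hypothetical stand-in for one Ozone Manager (not a real OM class)."""
    name: str
    log: List[str] = field(default_factory=list)    # committed Raft entries
    state: Dict[str, int] = field(default_factory=dict)
    last_applied: int = 0                            # number of entries applied
    snapshot_index: int = 0

    def apply_next(self) -> bool:
        """Apply one committed entry to the state machine, if any remain."""
        if self.last_applied >= len(self.log):
            return False
        key = self.log[self.last_applied]
        self.state[key] = self.state.get(key, 0) + 1
        self.last_applied += 1
        return True

    def prepare_for_upgrade(self, max_steps: int) -> None:
        """Apply what the wait budget allows, then snapshot and purge the log."""
        for _ in range(max_steps):
            if not self.apply_next():
                break
        self.snapshot_index = self.last_applied
        self.log.clear()  # nothing is left for the new code to replay

    def install_snapshot_from(self, peer: "ToyOM") -> None:
        """Catch up by copying a peer's snapshot instead of replaying logs."""
        self.state = dict(peer.state)
        self.last_applied = peer.last_applied
        self.snapshot_index = peer.snapshot_index


def restart_with_new_version(oms: List[ToyOM]) -> ToyOM:
    """Assumed behavior: laggers install the most advanced peer's snapshot."""
    most_advanced = max(oms, key=lambda om: om.snapshot_index)
    for om in oms:
        if om.snapshot_index < most_advanced.snapshot_index:
            om.install_snapshot_from(most_advanced)
    return most_advanced


txns = ["createVolume", "createBucket", "createKey", "createKey", "commitKey"]
om1 = ToyOM("om1", log=list(txns))
om2 = ToyOM("om2", log=list(txns))
om3 = ToyOM("om3", log=list(txns[:3]))  # lagger: never received the last two

for om in (om1, om2, om3):
    om.prepare_for_upgrade(max_steps=10)

restart_with_new_version([om1, om2, om3])
```

In this toy run, two OMs apply every entry during the prepare step, the lagger stops short, every log is purged, and the lagger ends up with the same state as its peers only through the snapshot it installs after the restart.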