errose28 commented on code in PR #9664: URL: https://github.com/apache/ozone/pull/9664#discussion_r2920779201
##########
hadoop-hdds/docs/content/design/zdu-design.md:
##########
@@ -0,0 +1,535 @@
+---
+jira: HDDS-3331
+authors:
+- Stephen O'Donnell
+- Ethan Rose
+- Istvan Fajth
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Zero Downtime Upgrade (ZDU)
+
+## The Goal
+
+The goal of Zero Downtime Upgrade (ZDU) is to allow the software running an existing Ozone cluster to be upgraded while the cluster remains operational. There should be no gaps in service and the upgrade should be transparent to applications using the cluster.
+
+Ozone is already designed to be fault tolerant, so the rolling restart of SCM, OM and Datanodes is already possible without impacting users of the cluster. The challenge with ZDU is therefore related to wire and disk compatibility, as different components within the cluster can be running different software versions concurrently. This design will focus on how we solve the wire and disk compatibility issues.
+
+## Component Upgrade Order
+
+To simplify reasoning about components of different types running in different versions, we should reduce the number of possible version combinations allowed as much as possible. Clients are considered external to the Ozone cluster, therefore we cannot control their version. However, we already have a framework to handle client/server cross compatibility, so rolling upgrade only needs to focus on compatibility of internal components. For internal Ozone components, we can define and enforce an order that the components must be upgraded in. Consider the following Ozone service diagram:
+
+
+
+Here the arrows represent client to server interactions between components, with the arrow pointing from the client to the server. The red arrow is external clients interacting with Ozone. The shield means that the client needs to see a consistent API surface despite leader changes in mixed version clusters so that APIs do not seem to disappear and reappear based on the node serving the request. The orange lines represent client to server interactions for internal Ozone components. For components connected by this internal line, **we can control the order that they are upgraded such that the server is always newer and handles all compatibility issues**. This greatly reduces the matrix of possible versions we may see within Ozone and mostly eliminates the need for internal Ozone components to be aware of each other’s versions, as long as servers remain backwards compatible. This order is:
+
+1. Upgrade all SCMs to the new version
+2. Upgrade Recon to the new version
+3. Upgrade all Datanodes to the new version
+4. Upgrade all OMs to the new version
+5. Upgrade all S3 gateways to the new version

Review Comment:
This should not cause an issue, because the apparent versions of the components will remain the same in the Ratis ring even as the software is updated. That means the components with a newer software version will still write data in a way that older components that are bootstrapping can understand (and vice versa). Check out the table around line 107 and the appendix to see how the apparent version moves in lockstep for a Ratis ring. Finalization to move the apparent version forward can be done from a Ratis snapshot because the version is written to the DB as well as the version file. This is already handled in the current upgrade flow because finalization is an online operation.
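
To make the distinction between software version and apparent version concrete, here is a minimal sketch of the behavior described in the comment above. It is an illustration only: `Node`, `bootstrap_from_snapshot`, and `finalize` are hypothetical names, the integer versions are made up, and none of this is Ozone's or Ratis's actual API. It assumes that a node always writes in its apparent version's format, that a snapshot carries the apparent version (because it is stored in the DB), and that finalization is only allowed once every ring member runs software that supports the target version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical ring member; not an Ozone class."""
    name: str
    software_version: int  # version of the installed binaries
    apparent_version: int  # version the node behaves as on wire and disk

    def upgrade_software(self, new_version: int) -> None:
        # Replacing the binaries alone does not change the apparent version.
        self.software_version = new_version

    def write_format(self) -> int:
        # Data is always written in the apparent version's format.
        return self.apparent_version


def bootstrap_from_snapshot(name: str, software_version: int, leader: Node) -> Node:
    # Assumption: the snapshot includes the apparent version, so a
    # bootstrapping node adopts the ring's apparent version from it.
    if software_version < leader.apparent_version:
        raise RuntimeError("software too old to read the ring's data format")
    return Node(name, software_version, leader.apparent_version)


def finalize(ring: List[Node], target: int) -> None:
    # The apparent version moves forward for the whole ring at once,
    # and only when every member's software supports the target.
    if any(n.software_version < target for n in ring):
        raise RuntimeError("cannot finalize: some nodes still run older software")
    for n in ring:
        n.apparent_version = target
```

In this model, a ring with two of three members already on newer software still writes in the old format, so a member with old software can bootstrap from a snapshot. After the last member is upgraded, `finalize` moves all apparent versions forward together.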
