jojochuang commented on code in PR #286:
URL: https://github.com/apache/ozone-site/pull/286#discussion_r2725956146

##########
docs/05-administrator-guide/03-operations/04-disk-replacement/01-ozone-manager.md:
##########
@@ -4,4 +4,101 @@ sidebar_label: Ozone Manager
 
 # Replacing Ozone Manager Disks
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+**Audience:** Cluster Administrators
+
+**Prerequisites:** Familiarity with Ozone cluster administration and Linux system administration.
+
+---
+
+## 1. Overview
+
+When a disk containing the Ozone Manager (OM) metadata directory fails, proper recovery procedures are critical to maintain cluster availability and prevent data loss. This document provides comprehensive, step-by-step guidance for safely replacing failed disks on OM nodes, with distinct procedures for standalone and High-Availability (HA) configurations. Following these procedures correctly ensures minimal downtime and maintains the integrity of your Ozone cluster's metadata.
+
+- **Purpose**
+This guide provides the steps required to safely replace a failed disk on an Ozone Manager (OM) node.
+
+- **Impact of OM Disk Failure**
+The OM disk is critical as it stores the RocksDB database containing the entire object store namespace (volumes, buckets, keys) and block locations. A failure of this disk can lead to metadata loss if not handled correctly.
+
+- **Crucial Distinction: HA vs. Non-HA**
+The recovery procedure depends entirely on whether your OM is a single, standalone instance or part of a High-Availability (HA) Ratis-based quorum. The HA procedure is significantly safer and results in no cluster downtime. Running a standalone OM is not recommended for production environments.
+
+---
+
+## 2. Pre-flight Checks
+
+Before starting, the administrator should:
+
+1. **Identify the Failed Disk:** Use system tools (`dmesg`, `smartctl`, etc.) to confirm which disk has failed and its mount point.
+
+2. **Identify OM Directories:** Check your `ozone-site.xml` to confirm which Ozone directories are on the failed disk. The most important ones are:
+   - `ozone.om.db.dirs`: The primary OM metadata database (RocksDB). This directory stores the entire object store namespace.
+   - `ozone.om.ratis.storage.dir`: The Ratis storage directory (if configured on a separate disk). This directory stores Ratis metadata like logs. If not explicitly configured, it falls back to `ozone.metadata.dirs`. For production environments, it is recommended to configure this on a separate, fast disk (preferably SSD) for better performance.
+
+3. **Prepare the Replacement Disk:** Have a new, healthy disk physically installed, formatted, and mounted on the system at the same path as the failed disk. Ensure it has the correct ownership and permissions for the user that runs the OM process. The default permissions for OM metadata directories are **750** (configurable via `ozone.om.db.dirs.permissions`).
+
+---
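
For illustration, the pre-flight checks above can be scripted roughly as follows. This is a minimal sketch only: the device name `sdb`, mount point `/data/om1`, config path `/etc/hadoop/conf/ozone-site.xml`, and service user `om` are assumptions that must be adapted to the actual environment.

```bash
#!/usr/bin/env bash
# Minimal pre-flight sketch (run as root). All values below are assumptions.
FAILED_DEVICE=sdb                            # kernel name of the failing device
OM_MOUNT=/data/om1                           # mount point of the failed/replacement disk
OZONE_SITE=/etc/hadoop/conf/ozone-site.xml   # location of ozone-site.xml on this node
OM_USER=om                                   # user that runs the OM process

# 1. Confirm the disk failure from kernel logs and SMART health status.
dmesg | grep -i "$FAILED_DEVICE"
smartctl -H "/dev/$FAILED_DEVICE"

# 2. Check which OM directories are configured (values may also come from
#    defaults such as ozone.metadata.dirs, so confirm against your deployment).
grep -A1 -E "ozone.om.db.dirs|ozone.om.ratis.storage.dir|ozone.metadata.dirs" "$OZONE_SITE"

# 3. After the replacement disk is mounted at the same path, apply the
#    ownership and the default 750 permissions expected for OM metadata dirs.
chown -R "$OM_USER":"$OM_USER" "$OM_MOUNT"
chmod 750 "$OM_MOUNT"
```

The `grep` is only a quick check against the local `ozone-site.xml`; centrally managed configurations should be inspected through whatever tool owns them.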
+
+## 3. Procedure for a Standalone (Non-HA) Ozone Manager
+
+This is a high-risk, manual disaster recovery process that will require cluster downtime.
+
+1. **STOP THE ENTIRE CLUSTER:** Shut down all clients, Datanodes, SCM, and the Ozone Manager to prevent any further state changes.
+
+2. **Attempt Data Recovery:** If the failed disk is still partially readable, make a best-effort attempt to copy the contents of the `ozone.om.db.dirs` directory to a safe, temporary location.
+
+3. **If Recovery Fails, Restore from Backup:** If the OM database files are unrecoverable, you must restore from your most recent backup. This document does not cover the backup process itself, but it is the only path to recovery in this scenario.
+
+4. **Replace and Configure Disk:** Physically replace the hardware and ensure the new, empty disk is mounted at the correct path defined in `ozone.om.db.dirs`.
+
+5. **Restore Metadata:** Copy the recovered data **(from Step 2)** or the restored backup data **(from Step 3)** to the `ozone.om.db.dirs` path on the new disk.
+
+6. **Restart and Verify:**
+   - Start the SCM and Ozone Manager services.
+   - Once the OM is running, start the Datanodes.
+   - Run `ozone sh volume list` and other basic commands to verify that the namespace is intact and the cluster is operational.
+
+---
+
+## 4. Procedure for an HA (Ratis-based) Ozone Manager
+
+This procedure is much safer, leverages the built-in redundancy of the OM HA cluster, and does not require full cluster downtime.
+
+### Bootstrap Procedure
+
+1. **STOP THE FAILED OM INSTANCE:** On the node with the failed disk, stop only the Ozone Manager process. The other OMs will continue operating, and one of them will remain the leader, serving client requests.
+
+2. **Replace and Configure Disk:** Physically replace the hardware. Mount the new, empty disk at the path defined in `ozone.om.db.dirs` and ensure it has the correct ownership and permissions. If `ozone.om.ratis.storage.dir` was also on the failed disk, ensure it is properly configured on the new disk as well.
+
+3. **Verify Configuration:** Before proceeding, ensure that all existing OMs have their `ozone-site.xml` configuration files updated with the configuration details of the OM being recovered (nodeId, address, port, etc.). The bootstrap process will verify this by checking all OMs' on-disk configurations. If an existing OM does not have updated configs, it can crash when bootstrap is initiated.
+
+4. **RE-INITIALIZE THE OM:**
+   - This is the key step. Since the local database is gone, the OM needs to be "reborn" by getting a complete copy of the latest state from the current OM leader.
+   - Simply starting the OM process on the repaired node with an empty DB directory will trigger this process automatically. The OM process is designed

Review Comment:
   I think this step requires running the OM with the `--bootstrap` parameter, e.g. `ozone om --bootstrap`.
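
To make the reviewer's point concrete, step 4 with the `--bootstrap` flag applied might look roughly like the sketch below. The systemd unit name `ozone-om`, the mount point `/data/om1`, the service user `om`, and the OM service id `omservice` are assumptions; how the OM process is started and stopped depends entirely on the deployment.

```bash
# On the repaired OM node, after the new disk is mounted and ozone-site.xml on
# ALL OMs contains this node's details (see step 3 above). All names below are
# assumptions for illustration only.
OM_MOUNT=/data/om1
OM_USER=om

# Ensure the OM process is stopped (assumes a systemd unit named ozone-om).
systemctl stop ozone-om

# The empty DB and Ratis directories on the new disk must be owned by the OM
# service user, with the expected 750 permissions.
chown -R "$OM_USER":"$OM_USER" "$OM_MOUNT"
chmod 750 "$OM_MOUNT"

# Re-initialize the OM from the current leader, as suggested in the review
# comment (run as the OM service user).
ozone om --bootstrap

# Verify the node rejoined the quorum and the namespace is reachable
# (option names vary by version; check `ozone admin om roles --help`).
ozone admin om roles --service-id=omservice
ozone sh volume list
```

Whether the bootstrapped OM keeps running afterwards or must be started again through the usual service scripts should be confirmed against the OM HA documentation for the release in use.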


##########
docs/05-administrator-guide/03-operations/04-disk-replacement/01-ozone-manager.md:
##########
@@ -4,4 +4,101 @@ sidebar_label: Ozone Manager
 
 # Replacing Ozone Manager Disks
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+**Audience:** Cluster Administrators
+
+**Prerequisites:** Familiarity with Ozone cluster administration and Linux system administration.
+
+---
+
+## 1. Overview
+
+When a disk containing the Ozone Manager (OM) metadata directory fails, proper recovery procedures are critical to maintain cluster availability and prevent data loss. This document provides comprehensive, step-by-step guidance for safely replacing failed disks on OM nodes, with distinct procedures for standalone and High-Availability (HA) configurations. Following these procedures correctly ensures minimal downtime and maintains the integrity of your Ozone cluster's metadata.
+
+- **Purpose**
+This guide provides the steps required to safely replace a failed disk on an Ozone Manager (OM) node.
+
+- **Impact of OM Disk Failure**
+The OM disk is critical as it stores the RocksDB database containing the entire object store namespace (volumes, buckets, keys) and block locations. A failure of this disk can lead to metadata loss if not handled correctly.
+
+- **Crucial Distinction: HA vs. Non-HA**
+The recovery procedure depends entirely on whether your OM is a single, standalone instance or part of a High-Availability (HA) Ratis-based quorum. The HA procedure is significantly safer and results in no cluster downtime. Running a standalone OM is not recommended for production environments.
+
+---
+
+## 2. Pre-flight Checks

Review Comment:
   The following steps assume the Ozone configuration files are intact. If the configuration files are corrupt, they need to be restored from a backup (if one is available).
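
A quick way to guard against the situation described in this comment is to sanity-check and back up the configuration directory before and after the disk work. A rough sketch, assuming the configuration lives in `/etc/hadoop/conf` and `xmllint` is available:

```bash
# Sketch: verify ozone-site.xml is readable and well-formed XML, and keep a
# timestamped copy on a disk other than the OM metadata disks. Paths are
# assumptions and should be adapted to the deployment.
CONF_DIR=/etc/hadoop/conf            # assumed Ozone configuration directory
BACKUP_DIR=/var/backups/ozone-conf   # assumed backup location on a separate disk

# Well-formedness check (xmllint ships with libxml2).
xmllint --noout "$CONF_DIR/ozone-site.xml" && echo "ozone-site.xml parses cleanly"

# Keep a dated copy of the whole configuration directory.
mkdir -p "$BACKUP_DIR"
cp -a "$CONF_DIR" "$BACKUP_DIR/conf-$(date +%Y%m%d-%H%M%S)"
```

This does not replace a proper backup strategy for the OM database itself; it only protects the configuration files that the recovery steps depend on.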
