jojochuang commented on code in PR #288:
URL: https://github.com/apache/ozone-site/pull/288#discussion_r2730476440
##########
docs/05-administrator-guide/03-operations/04-disk-replacement/04-recon.md:
##########
@@ -4,4 +4,131 @@ sidebar_label: Recon
 
 # Replacing Recon Disks
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+**Audience:** Cluster Administrators
+
+**Prerequisites:** Familiarity with Ozone services and Linux system administration.
+
+---
+
+## 1. Overview
+
+- **Purpose:**
+  This guide provides the straightforward procedure for replacing a failed disk on an Ozone Recon node.
+
+- **Role of Recon:**
+  Recon is an auxiliary service that provides insights, visualization, and management for an Ozone cluster. It maintains a copy of metadata from the Ozone Manager (OM) and Storage Container Manager (SCM) to build its own database for analysis.
+
+- **Impact of Recon Disk Failure:**
+  A failure of the Recon disk will cause the Recon service to stop functioning. However, because Recon is not in the critical path for data I/O, this failure has **no impact on the core operations of your Ozone cluster**. Client reads and writes will continue normally. All data on the Recon disk can be fully rebuilt from OM and SCM.
+
+:::note
+Unlike critical services like OM or SCM, a Recon disk failure does **not impact core cluster operations**. Client reads and writes continue normally because Recon is not in the data I/O path. All data stored on the Recon disk can be fully rebuilt from the active OM and SCM services, making disk replacement a straightforward, low-risk procedure that can be performed without cluster downtime.
+:::
+
+When a Recon disk fails, the service will stop functioning, but upon restart with empty directories, Recon automatically detects the missing databases and initiates a complete rebuild by downloading fresh snapshots from the OM leader and syncing with SCM. This automatic recovery process ensures that all Recon databases—including the OM snapshot database, SCM snapshot database (if enabled), and Recon's own aggregated analysis databases—are fully reconstructed without manual intervention.
+
+### Recon Database Directories
+
+Recon uses several database directories that may be affected by disk failure:
+
+- **`ozone.recon.db.dir`**: Stores Recon's primary RocksDB database, which contains aggregated data and analysis results (ContainerKey and ContainerKeyCount tables). This directory also typically contains the SQL database (default Derby) used for storing GlobalStats, FileCountBySize, ReconTaskStatus, ContainerHistory, and UnhealthyContainers tables.
+- **`ozone.recon.om.db.dir`**: Stores the copy of the OM database snapshot that Recon uses as its source of truth for the namespace.
+- **`ozone.recon.scm.db.dirs`**: Stores the copy of the SCM database snapshot (if SCM snapshot is enabled via `ozone.recon.scm.snapshot.enabled`). This contains information about Datanodes, pipelines, and containers.

Review Comment:
   this property does exist, but is not mentioned in ozone-default.xml, and not mentioned here: https://ozone-site-v2.staged.apache.org/docs/core-concepts/architecture/recon

##########
docs/05-administrator-guide/03-operations/04-disk-replacement/04-recon.md:
##########
@@ -4,4 +4,131 @@ sidebar_label: Recon
 
 # Replacing Recon Disks
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+**Audience:** Cluster Administrators
+
+**Prerequisites:** Familiarity with Ozone services and Linux system administration.
+
+---
+
+## 1. Overview
+
+- **Purpose:**
+  This guide provides the straightforward procedure for replacing a failed disk on an Ozone Recon node.
+
+- **Role of Recon:**
+  Recon is an auxiliary service that provides insights, visualization, and management for an Ozone cluster. It maintains a copy of metadata from the Ozone Manager (OM) and Storage Container Manager (SCM) to build its own database for analysis.
+

Review Comment:
   ```suggestion
   ```
   consider removing this paragraph

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
