Re: [PR] HDDS-14504. [Website v2] [Docs] [Administrator Guide] Replacing Recon Disks [ozone-site]

via GitHub Tue, 27 Jan 2026 00:52:18 -0800


jojochuang commented on code in PR #288:
URL: https://github.com/apache/ozone-site/pull/288#discussion_r2730922257



##########
docs/05-administrator-guide/03-operations/04-disk-replacement/04-recon.md:
##########
@@ -4,4 +4,131 @@ sidebar_label: Recon
 
 # Replacing Recon Disks
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+**Audience:** Cluster Administrators
+
+**Prerequisites:** Familiarity with Ozone services and Linux system administration.
+
+---
+
+## 1. Overview
+
+- **Purpose:**
+  This guide provides the straightforward procedure for replacing a failed disk on an Ozone Recon node.
+
+- **Role of Recon:**
+  Recon is an auxiliary service that provides insights, visualization, and management for an Ozone cluster. It maintains a copy of metadata from the Ozone Manager (OM) and Storage Container Manager (SCM) to build its own database for analysis.
+
+- **Impact of Recon Disk Failure:**
+  A failure of the Recon disk will cause the Recon service to stop functioning. However, because Recon is not in the critical path for data I/O, this failure has **no impact on the core operations of your Ozone cluster**. Client reads and writes will continue normally. All data on the Recon disk can be fully rebuilt from OM and SCM.
+
+:::note
+Unlike critical services like OM or SCM, a Recon disk failure does **not impact core cluster operations**. Client reads and writes continue normally because Recon is not in the data I/O path. All data stored on the Recon disk can be fully rebuilt from the active OM and SCM services, making disk replacement a straightforward, low-risk procedure that can be performed without cluster downtime.
+:::
+
+When a Recon disk fails, the service will stop functioning, but upon restart with empty directories, Recon automatically detects the missing databases and initiates a complete rebuild by downloading fresh snapshots from the OM leader and syncing with SCM. This automatic recovery process ensures that all Recon databases—including the OM snapshot database, SCM snapshot database (if enabled), and Recon's own aggregated analysis databases—are fully reconstructed without manual intervention.
+
+### Recon Database Directories
+
+Recon uses several database directories that may be affected by disk failure:
+
+- **`ozone.recon.db.dir`**: Stores Recon's primary RocksDB database, which contains aggregated data and analysis results (ContainerKey and ContainerKeyCount tables). This directory also typically contains the SQL database (default Derby) used for storing GlobalStats, FileCountBySize, ReconTaskStatus, ContainerHistory, and UnhealthyContainers tables.
+- **`ozone.recon.om.db.dir`**: Stores the copy of the OM database snapshot that Recon uses as its source of truth for the namespace.
+- **`ozone.recon.scm.db.dirs`**: Stores the copy of the SCM database snapshot (if SCM snapshot is enabled via `ozone.recon.scm.snapshot.enabled`). This contains information about Datanodes, pipelines, and containers.

Review Comment:
   That's okay let's just focus on this disk replacement for Recon page.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
