priyeshkaratha commented on code in PR #365:
URL: https://github.com/apache/ozone-site/pull/365#discussion_r2973459065
##########
docs/05-administrator-guide/03-operations/09-observability/02-recon/03-recon-capacity-distribution.md:
##########
@@ -0,0 +1,136 @@
+---
+sidebar_label: Cluster Capacity User Guide
+---
+# Cluster Capacity User Guide
+
+This page is the central place for understanding storage distribution across the Ozone cluster.
+It moves from a high-level physical view to logical service usage, and down to individual node diagnostics.
+Use this guide to understand exactly where your storage capacity is going.
+
+## Dashboard Layout Overview
+
+The Cluster Capacity page is organized logically from top to bottom, increasing in granularity:
+
+1. Header & Controls: Global settings and refresh rates.
+2. Cluster Summary: The total physical disk view.
+3. Service Summary: The logical state of Ozone data (Open, Committed, Pending Deletion).
+4. Pending Deletion & Datanode Insights: Deep dives into data deletion life cycles and individual node performance.
+
+---
+
+## Cluster (Physical Capacity)
+
+The **Cluster** widget provides a high-level summary of the total physical storage managed by Ozone Datanodes. It helps you distinguish between space used by Ozone and space taken by other processes on the underlying hardware.
+
+
+
+### Metric Definitions
+
+- **Total Capacity (2.2 TB)**
+  The combined capacity of all configured storage directories across all live Datanodes in the cluster.
+
+- **Ozone Used Space (437.3 GB)**
+  Physical space currently occupied by replicated Ozone blocks.
+  > Note: This accounts for the replication factor (e.g., a 100 GB key with 3x replication uses 300 GB of physical space).
+
+- **Other Used Space (482.5 GB)**
+  Space on the disks that is occupied by non-Ozone files. This includes OS files, system logs, temp directories, or other Hadoop services running on the same hardware.
+
+- **Container Pre-allocated (0 B)**
+  Space reserved for open containers that have been allocated to clients but have not yet been written to. This ensures space is available when needed.
+
+- **Remaining Space (1.3 TB)**
+  The actual amount of unused physical disk space available for new Ozone data or other files.
+
+> 💡 **Administrator Tip:**
+> Monitor **Other Used Space**. If this value is consistently high, it may indicate that non-Ozone processes are competing for disk space, which could lead to capacity issues for your Ozone data.
+
+---
+
+## Service (Logical Capacity)
+
+The **Service** widget transitions from the physical view to the logical view. It breaks down the **Ozone Used Space** based on the state of the data keys within the Ozone architecture.
+
+
+
+### Ozone Used Space Breakdown
+
+- **Total (437.3 GB)**
+  The sum of all Ozone data currently tracked in the system across all states. This matches the physical Ozone Used Space.
+
+- **Open Keys (2.9 GB)**
+  Data in keys that are currently being written to by clients or have not yet been committed to the system. This data is temporary.
+
+- **Committed Keys (429.5 GB)**
+  Finalized and immutable data that is successfully stored and accessible by users.
+
+- **Pending Deletion (3.8 GB)**
+  Data from keys that have been logically deleted by a user but have not yet been physically scrubbed from the Datanodes. This is the combined total size of data pending deletion across OM, SCM, and Datanodes. This space will eventually be reclaimed.
+
+> 💡 **Administrator Tip:**
+> A high and persistent **Pending Deletion** value might indicate that the automated deletion process is lagging. This guide explains how to investigate that lifecycle in the next section.
+
+---
+
+## Pending Deletion Lifecycle
+
+This widget provides transparency into the multi-stage process of data deletion in Ozone. It tracks how deleted blocks move from the Ozone Manager through the Storage Container Manager to final removal on Datanodes.
+
+
+
+### Tracking the Stages
+
+- **Ozone Manager (OM) (0 B)**
+  Keys or directories deleted by the client but whose underlying blocks have not yet been processed by SCM.
+
+- **Storage Container Manager (SCM) (3.8 GB)**
+  Blocks that SCM has identified as ready for deletion and is actively trying to command Datanodes to remove.
+
+- **Datanodes (0 B)**
+  Blocks that are queued on the individual Datanodes waiting for physical disk deletion.
+
+> 💡 **Diagnostic Tip:**
+> If SCM shows 1 TB pending deletion but the Datanodes stage shows 0 B, SCM may be having trouble communicating deletion commands to the nodes.
+
+---
+
+## Datanode Insights
+
+The **Datanodes** section moves from the cluster level to individual node performance. This is crucial for identifying imbalances, failing disks, or nodes that are filling up faster than others.
+
+
+
+### Using the Datanode Inspector
+
+- **Download Insights**
+  Download a snapshot report of all Datanode storage distribution in CSV format.

Review Comment:
Since the field names are self-explanatory, adding separate fields to describe each section may be unnecessary. This information is already available in Swagger, and the field names are also explained within their respective sections.
