priyeshkaratha commented on code in PR #8907:
URL: https://github.com/apache/ozone/pull/8907#discussion_r2336062367


##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -0,0 +1,148 @@
+---
+title: Storage Capacity Distribution Dashboard
+summary: Proposal for introducing a comprehensive storage distribution dashboard in Recon for enhanced cluster monitoring and debugging capabilities.
+date: 2025-08-05
+jira: HDDS-13177
+status: Under Review
+author: Priyesh Karatha
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+  either express or implied. See the License for the specific
+  language governing permissions and limitations under the License.
+-->
+
+# Abstract
+
+Ozone currently lacks a unified interface to monitor and analyze storage distribution across its cluster components. This makes it difficult to:
+
+- Understand data distribution across the cluster
+- Debug storage reclamation issues
+- Monitor pending deletion progress
+- Analyze utilization patterns
+- Identify potential bottlenecks and imbalances
+
+This proposal introduces a comprehensive **Storage Capacity Distribution Dashboard** in Recon to address these challenges.
+
+---
+
+# Key Features
+
+## 1. Storage Distribution Analysis
+
+Detailed breakdown of storage usage across components:
+
+- **Global Used Space**
+- **Global Namespace Space**
+- **Open Keys and Open Files**: Data held in open keys and files
+- **Committed Keys**: Space used by committed key-value pairs
+- **Component-wise Distribution**: Metrics segregated by OM, SCM, and DataNodes
+
+## 2. Deletion Progress Monitoring
+
+Track pending deletions at various stages:
+
+- **OM Pending Deletions**: Keys marked for deletion at OM
+- **SCM Pending Deletions**: Container-level deletions managed by SCM
+- **DataNode Pending Deletions**: Block-level deletion metrics on each DataNode
+
+## 3. Cluster Overview Metrics
+
+Summarized cluster statistics:
+
+- Total capacity and used space
+- Free space distribution across components
+
+---
+
+# Implementation Approaches
+
+## Approach 1: Recon-based Implementation
+
+Leverage the existing Recon service to build the dashboard with centralized and efficient data collection.
+
+### Benefits
+
+- **Unified Data Source**: All metrics aggregated centrally in Recon
+- **Performance Optimization**: Incremental sync reduces load
+- **Reduced Overhead**: Avoids redundant calculations across services
+- **Code Reusability**: Built on top of existing Recon infrastructure and endpoints

Review Comment:
   We don't want to show historical data here. The requirement is to show a
snapshot of the storage system at a particular point in time, so that customers
can see how storage is distributed. We decided to use Recon because customers
already rely on it for storage distribution, and it does not currently give
them a complete view.
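
To illustrate the "snapshot, not history" point, here is a minimal sketch, not the actual Recon implementation. The class names, field names, and component keys (`StorageSnapshot`, `SnapshotHolder`, `"OM"`, `"SCM"`, `"DN"`) are hypothetical and chosen only for illustration. The idea is that each refresh replaces the previous view, so only the latest point-in-time state is ever served.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StorageSnapshot:
    """Hypothetical point-in-time view of storage distribution."""
    taken_at_epoch_ms: int
    # Bytes pending deletion per component; the keys are illustrative only.
    pending_deletion_bytes: Dict[str, int]

    def total_pending_deletion(self) -> int:
        # Sum pending-deletion bytes across all components.
        return sum(self.pending_deletion_bytes.values())


class SnapshotHolder:
    """Keeps only the most recent snapshot; no history is retained."""

    def __init__(self) -> None:
        self._latest: Optional[StorageSnapshot] = None

    def publish(self, snapshot: StorageSnapshot) -> None:
        # Overwrite the previous snapshot instead of appending to a series.
        self._latest = snapshot

    def latest(self) -> Optional[StorageSnapshot]:
        return self._latest
```

A dashboard endpoint built this way would only ever read `latest()`. Trend or time-series views would need separate storage, which is outside the stated requirement.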



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

