errose28 commented on code in PR #8907:
URL: https://github.com/apache/ozone/pull/8907#discussion_r2437252623
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -132,6 +132,19 @@ Additionally, Recon already possesses a comprehensive physical and logical capac
A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
+A new feature layout is added to check compatibility between components. Following is an example for handling compatibility between OM and SCM
+
+| OM Version | SCM Version | Compatibility Handling |
+|------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| **Old OM** | **New SCM** | SCM receives old proto. In `ServerSideTranslatorPB`, it checks if the new block list is empty; if so, it decodes using the old structure. |
+| **New OM** | **Old SCM** | OM will be using `getScmInfo()` to fetch SCM metadata layout version. If the feature is not finalized, OM sends the old structure. |
+| **New OM** | **New SCM** | Fully upgraded. OM sends size-aware transactions and SCM processes them accordingly. |
+| **Old OM** | **Old SCM** | Legacy setup. Both components use old proto structure. |
Review Comment:
We currently don't support rolling upgrades, so OM/SCM cross compatibility
does not need to be handled like this. `ComponentVersion`s are currently used
for client/server interactions and `LayoutFeature`s are used to keep disk state
readable to an older version when downgrading.
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -132,6 +132,19 @@ Additionally, Recon already possesses a comprehensive physical and logical capac
A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
Review Comment:
Upgrade actions are usually for reformatting persisted data. If this is just a
"feature flag" type use case, the layout feature itself is enough. The code can
check if the layout feature is finalized before executing.
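A minimal sketch of that pattern, assuming a version manager that exposes an `isAllowed`-style check against the finalized metadata layout version. `DemoLayoutFeature`, `DemoLayoutVersionManager` and `BlockSizeTracking` are hypothetical stand-ins, not Ozone's classes, and the version numbers are made up:

```java
/** Hypothetical layout features; the numbers are illustrative only. */
enum DemoLayoutFeature {
  INITIAL_VERSION(0),
  DATA_DISTRIBUTION(1);

  private final int layoutVersion;

  DemoLayoutFeature(int layoutVersion) {
    this.layoutVersion = layoutVersion;
  }

  int layoutVersion() {
    return layoutVersion;
  }
}

/** Hypothetical stand-in for a component's layout version manager. */
final class DemoLayoutVersionManager {
  private int metadataLayoutVersion;

  DemoLayoutVersionManager(int metadataLayoutVersion) {
    this.metadataLayoutVersion = metadataLayoutVersion;
  }

  /** A feature is usable once the finalized metadata layout version has reached it. */
  boolean isAllowed(DemoLayoutFeature feature) {
    return metadataLayoutVersion >= feature.layoutVersion();
  }

  /** Finalization moves the metadata layout version up to the latest feature. */
  void finalizeUpgrade() {
    DemoLayoutFeature[] features = DemoLayoutFeature.values();
    metadataLayoutVersion = features[features.length - 1].layoutVersion();
  }
}

/** Hypothetical caller: gates the new behaviour at the point of use, with no upgrade action. */
final class BlockSizeTracking {
  private final DemoLayoutVersionManager versionManager;

  BlockSizeTracking(DemoLayoutVersionManager versionManager) {
    this.versionManager = versionManager;
  }

  boolean shouldTrackBlockSizes() {
    // Before finalization the old code path runs unchanged.
    return versionManager.isAllowed(DemoLayoutFeature.DATA_DISTRIBUTION);
  }
}
```

Because the check runs where the new behaviour is used, nothing has to happen at finalization time unless existing on-disk data must actually be rewritten.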
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -132,6 +132,19 @@ Additionally, Recon already possesses a comprehensive physical and logical capac
A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
+A new feature layout is added to check compatibility between components. Following is an example for handling compatibility between OM and SCM
+
+| OM Version | SCM Version | Compatibility Handling |
+|------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| **Old OM** | **New SCM** | SCM receives old proto. In `ServerSideTranslatorPB`, it checks if the new block list is empty; if so, it decodes using the old structure. |
+| **New OM** | **Old SCM** | OM will be using `getScmInfo()` to fetch SCM metadata layout version. If the feature is not finalized, OM sends the old structure. |
+| **New OM** | **New SCM** | Fully upgraded. OM sends size-aware transactions and SCM processes them accordingly. |
+| **Old OM** | **Old SCM** | Legacy setup. Both components use old proto structure. |
+
+Also in SCM, while upgrading for an existing Ozone cluster, all existing block deletion transactions prior to DATA_DISTRIBUTION finalized will be ignored to update DeletedBlocksTransactionSummary when it's removed from SCM DB.
+DeletedBlocksTransactionSummary only counts the transaction after DATA_DISTRIBUTION is finalized.
+Also in DN side, newly added metadata will be persisted only if the feature is finalized.
Review Comment:
Again, why does this metadata affect downgrade such that a layout feature
is required? Usually old versions do not care about additional persisted fields
which they do not read.
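To illustrate the point with a toy format (a `java.util.Properties` blob standing in for container metadata; `ExtraFieldCompat` and the field names are hypothetical): a newer writer adds a field, and an older reader that only looks up the keys it knows keeps working. A layout feature would only be needed if older code misread or rejected the new state.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrates forward-compatible persistence: readers ignore fields they do not know. */
final class ExtraFieldCompat {

  /** "New" writer: persists the existing field plus an additional one. */
  static String writeNew(long blockCount, long pendingDeletionBytes) {
    Properties props = new Properties();
    props.setProperty("blockCount", Long.toString(blockCount));
    props.setProperty("pendingDeletionBytes", Long.toString(pendingDeletionBytes)); // new field
    StringWriter out = new StringWriter();
    try {
      props.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  /** "Old" reader: looks up only the key it was written against. */
  static long readOldBlockCount(String persisted) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(persisted));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // pendingDeletionBytes is present but never consulted, so it cannot break this reader.
    return Long.parseLong(props.getProperty("blockCount"));
  }
}
```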
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -132,6 +132,19 @@ Additionally, Recon already possesses a comprehensive physical and logical capac
A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
+A new feature layout is added to check compatibility between components. Following is an example for handling compatibility between OM and SCM
Review Comment:
Layout features are contained within a component to handle compatibility
between the software and persisted disk state. `ComponentVersion`s are used to
track cross-component compatibility over the network.
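A minimal sketch of the network-side mechanism, assuming the peer advertises its version in each request. `DemoClientVersion` and `DemoRequestHandler` are hypothetical, not Ozone's `ComponentVersion` implementations; the point is that the receiver decides per request and nothing on disk is consulted:

```java
/** Hypothetical component versions; real Ozone enums and values differ. */
enum DemoClientVersion {
  DEFAULT_VERSION(0),
  BLOCK_SIZE_IN_DELETE_REQUEST(1);

  private final int protoValue;

  DemoClientVersion(int protoValue) {
    this.protoValue = protoValue;
  }

  int toProtoValue() {
    return protoValue;
  }
}

/** Hypothetical server-side handler reacting to the version sent over the wire. */
final class DemoRequestHandler {
  static final long UNKNOWN_SIZE = -1L;

  /**
   * Returns the block size to use for accounting. Older peers never set the
   * size field, so their requests are treated as "size unknown".
   */
  static long blockSizeFor(int peerVersion, long sizeField) {
    if (peerVersion >= DemoClientVersion.BLOCK_SIZE_IN_DELETE_REQUEST.toProtoValue()) {
      return sizeField;
    }
    return UNKNOWN_SIZE;
  }
}
```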
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -0,0 +1,201 @@
+---
+title: Storage Capacity Distribution Dashboard
+summary: Proposal for introducing a comprehensive storage distribution dashboard in Recon.
+date: 2025-08-05
+jira: HDDS-13177
+status: Under Review
+author: Priyesh Karatha
+---
+
+<!--
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+ either express or implied. See the License for the specific
+ language governing permissions and limitations under the License.
+-->
+
+# Abstract
+
+Ozone currently lacks a unified interface to monitor and analyze storage distribution across its cluster components. This makes it difficult to:
+
+- Understand data distribution across the cluster
+- Debug storage reclamation issues
+- Monitor pending deletion progress
+- Analyze utilization patterns
+- Identify potential bottlenecks and imbalances
+
+This proposal introduces a comprehensive **Storage Capacity Distribution Dashboard** in Recon to address these challenges.
+
+---
+
+# Key Features
+
+## 1. Storage Distribution Analysis
+
+Detailed breakdown of storage usage across components:
+
+- **Global Used Space**: Represents the actual physical storage consumed on the DataNodes.
+- **Global Namespace Space**: Logical size of the namespace, calculated as the sum of pendingDirectorySize + pendingKeySize + totalOpenKeySize + totalCommittedSize, multiplied by the replication factor.
+- **Open Keys and Open Files**: Space occupied by data in open keys and files that are not yet finalized.
+- **Committed Keys**: Space used by fully committed key–value pairs.
+- **Component-wise Distribution**: Breakdown of metrics across OM, SCM, and DataNodes.
+
+## 2. Deletion Progress Monitoring
+
+Track pending deletions at various stages:
+
+- **OM Pending Deletions**: Keys marked for deletion at OM
+- **SCM Pending Deletions**: Container-level deletions managed by SCM
+- **DataNode Pending Deletions**: Block-level deletion metrics on each DataNode
+
+## 3. Cluster Overview Metrics
+
+Summarized cluster statistics:
+
+- Total capacity and used space
+- Free space distribution across components
+
+---
+
+# Implementation Approaches
+
+## Approach 1: Recon-based Implementation
+
+Leverage the existing Recon service to build the dashboard with centralized and efficient data collection.
+Recon currently maintains synchronization with the OM database and constructs the NSSummary tree, providing established calculation logic for metrics such as openKeysBytes and committedBytes.
+Additionally, Recon already possesses a comprehensive physical and logical capacity breakdown through its OM DB insights component.
+
+### Benefits
+
+- **Unified Data Source**: All metrics aggregated centrally in Recon
+- **Performance Optimization**: Incremental sync reduces the load
+- **Reduced Overhead**: Avoids redundant calculations across services
+- **Code Reusability**: Built on top of existing Recon infrastructure and endpoints
+
+
+
+### Component-wise Enhancements
+
+#### **DataNodes (DN)**
+
+- **Current State**: DNs expose storage metrics in their reports
+- **Enhancement**:
+ - Add `pending deletion byte counters` in container metadata
+ - Calculate total pending per DN from container metadata and publish metrics
+- **Responsibilities**:
+ - Report actual and pending deletion usage per container
+
+#### **Storage Container Manager (SCM)**
+
+- **Current Gap**: No block size tracking in the block deletion process
+- **Enhancement**:
+ - Track block sizes when OM issues a deletion request
+ - Send a deletion command to DN along with block size and replication factor
+
+ ```
+ OM → SCM: block deletion request + block size
+ SCM → DN: delete command + block size + replication factor
+ ```
+
+- **Responsibilities**:
+ - Serve as the metadata bridge between logical keys and physical blocks
+
+#### **Ozone Manager (OM)**
+
+- **Enhancement**:
+ - Compute block sizes during deletion
+- **Responsibilities**:
+ - Expose logical storage metrics: committed keys, open keys, and namespace usage. (All calculations will be performed in Recon using the synchronized OM database.)
+
+
+#### **Recon**
+
+- **Enhancement**:
+ - Add a new dashboard aggregating:
+ - Logical metrics from OM
+ - Deletion progress from SCM
+ - Container-level metadata from DNs
+- **Data Sources**:
+ - OM DB (via Insight Sync)
+ - SCM Client API
+ - DN BlockDeletingService metrics (collected by Recon scraping JMX metrics from the DNs)
+
+#### **Upgrade Path for Data Distribution Feature**
+
+A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
+This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
+
+A new feature layout is added to check compatibility between components. Following is an example for handling compatibility between OM and SCM
+
+| OM Version | SCM Version | Compatibility Handling |
+|------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| **Old OM** | **New SCM** | SCM receives old proto. In `ServerSideTranslatorPB`, it checks if the new block list is empty; if so, it decodes using the old structure. |
+| **New OM** | **Old SCM** | OM will be using `getScmInfo()` to fetch SCM metadata layout version. If the feature is not finalized, OM sends the old structure. |
+| **New OM** | **New SCM** | Fully upgraded. OM sends size-aware transactions and SCM processes them accordingly. |
+| **Old OM** | **Old SCM** | Legacy setup. Both components use old proto structure. |
+
+Also in SCM, while upgrading for an existing Ozone cluster, all existing block deletion transactions prior to DATA_DISTRIBUTION finalized will be ignored to update DeletedBlocksTransactionSummary when it's removed from SCM DB.
+DeletedBlocksTransactionSummary only counts the transaction after DATA_DISTRIBUTION is finalized.
+Also in DN side, newly added metadata will be persisted only if the feature is finalized.
+
+---
+## Approach 2: CLI-based (Not Proceeding)
+
+A CLI-based approach was evaluated to compute detailed usage and pending deletion breakdown by analyzing offline OM and SCM database checkpoints and querying DataNodes.
+While it offers precise, up-to-date results and independence from Recon, it introduces significant operational overhead.
+
+This approach requires generating and processing large metadata snapshots, which can take hours in large-scale clusters.
+Given its complexity, dependency on manual execution, and high resource consumption, we have chosen not to proceed with the CLI-based solution and instead focus on enhancing Recon for better usability and integration.
+
+## Metrics Exposure and Time-Series Tracking
+
+While Recon provides a point-in-time view of pending deletions and storage distribution, it is equally important to track these metrics over time to understand trends and validate reclamation progress.
+To enable this, components should expose metrics that can be scraped by Prometheus and visualized in Grafana.
+
+### Component-wise Metrics
+
+- **Ozone Manager (OM)**
+ - Recon already has methods to calculate the following information.
+ - open key used space
+ - committed key used space
+ - containerPreAllocated space
+ - pending deletion
+ - In every OM db syncing, we can update these metrics values.
+
+- **Storage Container Manager (SCM)**
+ - The DeletedBlockLogStateManager is enhanced to aggregate these block sizes in-memory, providing a DeletedBlocksTransactionSummary that includes total transaction count, total block count, total block size, and total replicated block size for pending deletions.
+ - This summary is rebuilt from persisted transaction data upon SCM startup or leader election.
Review Comment:
Does this mean we have to iterate the entire block deletion table on startup
or leader change? That is probably not going to work given the large deletion
backlogs we have seen in live clusters.
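One way to avoid the full scan, sketched under assumptions: keep the four totals listed in the doc as running counters, adjust them as transactions are added and removed, and persist them so startup or leader change reloads a single record. `DeletedBlocksSummaryTracker` and the `Map` standing in for a persisted record are hypothetical. A real version would need each counter update committed atomically with the transaction write (e.g. in the same DB batch), and the replicated size here assumes block size times replication factor, which would not hold for EC.

```java
import java.util.Map;

/** Running totals for pending block deletions, persisted so no table scan is needed. */
final class DeletedBlocksSummaryTracker {
  private static final String TX_COUNT = "txCount";
  private static final String BLOCK_COUNT = "blockCount";
  private static final String BLOCK_SIZE = "blockSize";
  private static final String REPLICATED_SIZE = "replicatedBlockSize";

  // Hypothetical stand-in for one persisted summary record in the SCM DB.
  private final Map<String, Long> store;
  private long txCount;
  private long blockCount;
  private long blockSize;
  private long replicatedBlockSize;

  DeletedBlocksSummaryTracker(Map<String, Long> store) {
    this.store = store;
    // Startup or leader change: reload four counters, independent of backlog size.
    txCount = store.getOrDefault(TX_COUNT, 0L);
    blockCount = store.getOrDefault(BLOCK_COUNT, 0L);
    blockSize = store.getOrDefault(BLOCK_SIZE, 0L);
    replicatedBlockSize = store.getOrDefault(REPLICATED_SIZE, 0L);
  }

  void onTransactionAdded(long[] blockSizes, int replicationFactor) {
    apply(1, blockSizes, replicationFactor);
  }

  void onTransactionRemoved(long[] blockSizes, int replicationFactor) {
    apply(-1, blockSizes, replicationFactor);
  }

  private void apply(int sign, long[] blockSizes, int replicationFactor) {
    long bytes = 0;
    for (long size : blockSizes) {
      bytes += size;
    }
    txCount += sign;
    blockCount += (long) sign * blockSizes.length;
    blockSize += sign * bytes;
    replicatedBlockSize += sign * bytes * replicationFactor;
    persist();
  }

  private void persist() {
    store.put(TX_COUNT, txCount);
    store.put(BLOCK_COUNT, blockCount);
    store.put(BLOCK_SIZE, blockSize);
    store.put(REPLICATED_SIZE, replicatedBlockSize);
  }

  long totalTransactionCount() { return txCount; }
  long totalBlockCount() { return blockCount; }
  long totalBlockSize() { return blockSize; }
  long totalReplicatedBlockSize() { return replicatedBlockSize; }
}
```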
##########
hadoop-hdds/docs/content/design/storage-distribution.md:
##########
@@ -132,6 +132,19 @@ Additionally, Recon already possesses a comprehensive physical and logical capac
A new SCM upgrade action (ScmOnFinalizeActionForDataDistribution) is introduced.
This action is part of the finalization process for the DATA_DISTRIBUTION layout feature, which enables the new block size tracking capabilities.
+A new feature layout is added to check compatibility between components. Following is an example for handling compatibility between OM and SCM
+
+| OM Version | SCM Version | Compatibility Handling |
+|------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| **Old OM** | **New SCM** | SCM receives old proto. In `ServerSideTranslatorPB`, it checks if the new block list is empty; if so, it decodes using the old structure. |
+| **New OM** | **Old SCM** | OM will be using `getScmInfo()` to fetch SCM metadata layout version. If the feature is not finalized, OM sends the old structure. |
+| **New OM** | **New SCM** | Fully upgraded. OM sends size-aware transactions and SCM processes them accordingly. |
+| **Old OM** | **Old SCM** | Legacy setup. Both components use old proto structure. |
+
+Also in SCM, while upgrading for an existing Ozone cluster, all existing block deletion transactions prior to DATA_DISTRIBUTION finalized will be ignored to update DeletedBlocksTransactionSummary when it's removed from SCM DB.
Review Comment:
I don't understand what this means. How do we know whether a block delete
showed up before or after the feature was finalized? And why does that affect
downgrades?
Also `DeletedBlocksTransactionSummary` isn't introduced until later in the
doc, so sections probably need to be moved around for continuity.
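For illustration only, one way the before/after question could be made decidable without a layout feature: each transaction records whether it carries size information, and the summary only adds or subtracts transactions that do. `DemoDeleteTx` and `SizeAwareSummary` are hypothetical, and using an empty `OptionalLong` to mark transactions written without sizes is an assumption, not what the PR implements.

```java
import java.util.OptionalLong;

/** Hypothetical deletion transaction; totalSize is empty if it was written without size info. */
final class DemoDeleteTx {
  final long txId;
  final OptionalLong totalSize;

  DemoDeleteTx(long txId, OptionalLong totalSize) {
    this.txId = txId;
    this.totalSize = totalSize;
  }
}

/** Counts only size-aware transactions, so add and remove stay symmetric. */
final class SizeAwareSummary {
  private long txCount;
  private long totalBytes;

  void onAdd(DemoDeleteTx tx) {
    if (tx.totalSize.isPresent()) {
      txCount++;
      totalBytes += tx.totalSize.getAsLong();
    }
  }

  void onRemove(DemoDeleteTx tx) {
    // A transaction without size info was never counted, so removing it subtracts nothing.
    if (tx.totalSize.isPresent()) {
      txCount--;
      totalBytes -= tx.totalSize.getAsLong();
    }
  }

  long txCount() { return txCount; }
  long totalBytes() { return totalBytes; }
}
```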
--
This is an automated message from the Apache Git Service.