[jira] [Comment Edited] (HDDS-15610) SCM: Pending deletion block size metrics go negative causing corrupted Recon capacity display

Priyesh K (Jira) Wed, 24 Jun 2026 01:36:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090808#comment-18090808
 ]


Priyesh K edited comment on HDDS-15610 at 6/24/26 8:35 AM:
-----------------------------------------------------------

This doesn't look like an issue with leader change, since only the size has
become negative. If it were a leader change issue, the transaction count and
the number of blocks would also be negative. So I am closing this Jira, since
this case is not valid.
Also, a DB iteration would have a performance impact.


was (Author: JIRAUSER308991):
This doesn't look like an issue with leader change, since only the size has
become negative. If it were a leader change issue, the transaction count and
the number of blocks would also be negative. So I am closing this Jira, since
this case is not valid.

> SCM: Pending deletion block size metrics go negative causing corrupted Recon 
> capacity display
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDDS-15610
>                 URL: https://issues.apache.org/jira/browse/HDDS-15610
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM, SCM HA
>            Reporter: Priyesh K
>            Assignee: Priyesh K
>            Priority: Major
>              Labels: pull-request-available
>
> Problem: During long-running key deletion, {{ozone admin scm deletedBlocksTxn 
> summary}} reports negative {{totalBlockSize}} and 
> {{totalBlockReplicatedSize}} (e.g., {{-1996386304}} bytes). Recon reads this 
> metric for cluster capacity, resulting in corrupted UI values like
> {{-631 MB}}.
> Root Cause: A two-release deployment gap exists between when 
> {{STORAGE_SPACE_DISTRIBUTION}} size fields were added to the 
> {{DeletedBlocksTransaction}} proto (in {{constructNewTransaction}}) and
> when the summary accounting code was added to
> {{addTransactions}}/{{removeTransactions}}. A leader running the
> older release wrote TXs with size fields into the deletedBlocks CF but never
> wrote a summary to {{statefulConfigTable}}. When a leader running the
> newer release took over, {{initDataDistributionData()}} found no persisted 
> summary and left all counters at 0. {{getTransactions()}} then populated 
> {{txSizeMap}} for those size-carrying TXs. As datanodes committed them, 
> {{descDeletedBlocksSummary()}} decremented from 0, driving 
> {{totalBlocksSize}} negative. These negative values were Raft-replicated to 
> all followers and reloaded on every restart, making the corruption 
> self-perpetuating (confirmed in logs: {{2026-06-16 02:44 — totalBlocksSize 
> -1996386304}} loaded at startup).
> Fix:
>  * Clamp decrements at 0 in {{descDeletedBlocksSummary()}} to prevent
> negative values from being persisted.
>  * In {{initDataDistributionData()}}, trust the persisted summary only
> when both size fields are {{> 0}}. Otherwise, fall back to a one-time
> deletedBlocks CF scan to recompute the correct totals.
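
The two fix items in the description can be illustrated with a minimal,
self-contained sketch. This is not the Ozone code: the class
PendingDeletionSummarySketch, its nested Summary and Tx types, and the
methods clampedSubtract and init are hypothetical stand-ins for the
persisted summary, a size-carrying transaction,
descDeletedBlocksSummary() and initDataDistributionData(). An in-memory
list stands in for the deletedBlocks CF scan.

```java
import java.util.List;

// Hypothetical sketch only; names and types are not taken from Ozone.
class PendingDeletionSummarySketch {

  // Stand-in for the summary persisted in statefulConfigTable.
  static final class Summary {
    final long totalBlockSize;
    final long totalBlockReplicatedSize;

    Summary(long totalBlockSize, long totalBlockReplicatedSize) {
      this.totalBlockSize = totalBlockSize;
      this.totalBlockReplicatedSize = totalBlockReplicatedSize;
    }
  }

  // Stand-in for one size-carrying deleted-blocks transaction.
  static final class Tx {
    final long blockSize;
    final long replicatedSize;

    Tx(long blockSize, long replicatedSize) {
      this.blockSize = blockSize;
      this.replicatedSize = replicatedSize;
    }
  }

  // Fix item 1: a decrement never takes a counter below zero.
  static long clampedSubtract(long current, long delta) {
    return Math.max(0L, current - delta);
  }

  // Fix item 2: trust the persisted summary only when both size
  // fields are > 0; otherwise recompute the totals from the pending
  // transactions (a list here, a CF scan in the description).
  static Summary init(Summary persisted, List<Tx> pendingTxs) {
    if (persisted != null
        && persisted.totalBlockSize > 0
        && persisted.totalBlockReplicatedSize > 0) {
      return persisted;
    }
    long size = 0L;
    long replicated = 0L;
    for (Tx tx : pendingTxs) {
      size += tx.blockSize;
      replicated += tx.replicatedSize;
    }
    return new Summary(size, replicated);
  }
}
```

In this sketch, a counter that starts at 0 stays at 0 when a transaction
is committed, and a persisted negative value such as -1996386304 is
discarded in favour of recomputed totals. The recompute walks every
pending transaction, which is the cost the comment above refers to as the
performance impact of a DB iteration.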



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
