Kohei Sugihara created HDDS-14845:
-------------------------------------

             Summary: SCMNodeStat cannot be initialized when a negative scmUsed 
value is passed
                 Key: HDDS-14845
                 URL: https://issues.apache.org/jira/browse/HDDS-14845
             Project: Apache Ozone
          Issue Type: Bug
         Environment: Apache Ozone 2.1.0 with HA enabled; long-running datanodes with V2 & V3 containers; some corrupted containers.
            Reporter: Kohei Sugihara


h2. Overview

After upgrading to 2.1.0, we experienced a slow file upload issue. Digging into it, we confirmed that the SCMNodeStat object could not be initialized because a datanode reported a negative usage value. This is likely an edge case: we have only seen it on datanodes that have been running since Apache Ozone 1.2.0, so most recently provisioned clusters should not be affected. Still, we are sharing our investigation and workarounds.
h2. Configuration
 * Apache Ozone 2.1.0 (with DUOptimized, which is the default since 2.1.0 per https://issues.apache.org/jira/browse/HDDS-12924)
 * The capacity-aware pipeline choosing policy is enabled
 * Datanodes may have old containers (e.g. V2 schema) or corrupted containers

h2. Workaround 1:

We saw the following stack trace referencing SCMNodeStat, which caused retries during the upload sequence.
{code:java}
2026-03-02 13:49:06,818 | ERROR | SCMAudit | user=om/HOST@REALM | ip=IP_ADDR | op=ALLOCATE_BLOCK {"replication":"RATIS/THREE","owner":"omservice","size":"268435456","num":"1","client":""} | ret=FAILURE | java.lang.IllegalArgumentException: used space cannot be negative.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
        at org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.<init>(SCMNodeStat.java:46)
{code}
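The failure comes from a precondition check in the SCMNodeStat constructor. The following is a minimal standalone sketch (a hypothetical class, not Ozone's actual SCMNodeStat code) of the same guard, showing why any negative usage report from a datanode aborts construction:

```java
// Minimal sketch (hypothetical class, not Ozone's actual SCMNodeStat):
// the constructor rejects any negative "used" value with the same
// IllegalArgumentException seen in the audit log above.
public class NodeStatSketch {

    private final long capacity;
    private final long used;

    public NodeStatSketch(long capacity, long used) {
        if (used < 0) {
            // Mirrors Guava's Preconditions.checkArgument(used >= 0, ...)
            throw new IllegalArgumentException("used space cannot be negative.");
        }
        this.capacity = capacity;
        this.used = used;
    }

    public long getUsed() {
        return used;
    }

    public static void main(String[] args) {
        try {
            // A negative usage report, as in the datanode logs below, aborts construction.
            new NodeStatSketch(1_000_000L, -877_788_034_370L);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Because the exception is thrown in the constructor, every SCM code path that builds a node stat for the affected datanode fails the same way, which explains why the error surfaces in several unrelated commands.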
Our cluster was configured with the capacity-aware pipeline choosing policy; switching back to the random pipeline choosing policy temporarily resolved the slow uploads. The capacity-aware setting we had enabled was:
{code:xml}
    <property>
        <name>hdds.scm.pipeline.choose.policy.impl</name>
        
<value>org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.CapacityPipelineChoosePolicy</value>
    </property> {code}
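For reference, this is the configuration we switched to as the workaround (assuming the random policy's fully qualified class name in the same package; please verify it against your Ozone build):
{code:xml}
    <property>
        <name>hdds.scm.pipeline.choose.policy.impl</name>
        <value>org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.RandomPipelineChoosePolicy</value>
    </property> {code}
Note this only avoids the symptom: the random policy does not read SCMNodeStat, so the negative usage value is simply never consulted during pipeline selection.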
h2. Workaround 2:

We also investigated why a negative value was passed to SCMNodeStat, and found the following logs on the datanode.
{code:java}
2026-03-02 18:03:35,455 [c3e35eb9-a8be-463b-a22f-58b6e6e0767d-DataNodeDiskCheckerThread-60] WARN org.apache.hadoop.ozone.container.common.volume.HddsVolume: Volume /data/01/ozone/hdds reserved usages is higher than actual allocated reserved space.
2026-03-02 18:03:36,176 [DiskUsage-/data/00/ozone-] INFO org.apache.hadoop.hdds.fs.DUOptimized: Disk metaPath du usages 36864, container data usages -877788034370
2026-03-02 18:03:36,178 [DiskUsage-/data/07/ozone-] INFO org.apache.hadoop.hdds.fs.DUOptimized: Disk metaPath du usages 36864, container data usages -1388241327952
2026-03-02 18:03:36,196 [DiskUsage-/data/05/ozone-]
... {code}
As our workaround, we switched back to the previous DU implementation. Our cluster has some containers in old schema versions as well as some corrupted containers, which may be why DUOptimized returned a wrong value.
{code:xml}
    <property>
        <name>hdds.datanode.du.factory.classname</name>
        <value>org.apache.hadoop.hdds.fs.DUFactory</value>
    </property> {code}
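Our understanding, shown here as a simplified sketch (hypothetical code, not Ozone's actual DUOptimized implementation), is that the optimized strategy derives "container data usages" from per-container metadata instead of scanning the disk; if corrupted or old-schema containers report garbage used-bytes values, the sum itself can go negative and is then forwarded to SCM as scmUsed:

```java
import java.util.List;

// Simplified sketch (hypothetical, not Ozone's actual DUOptimized code):
// container-data usage is derived by summing per-container used-bytes from
// metadata. If corrupted containers report garbage values, the total can go
// negative, and that negative total ends up in the datanode's usage report.
public class ContainerUsageSketch {

    static long containerDataUsage(List<Long> reportedUsedBytes) {
        long total = 0L;
        for (long usedBytes : reportedUsedBytes) {
            total += usedBytes; // corrupted entries may be negative or absurd
        }
        return total;
    }

    public static void main(String[] args) {
        // Two healthy containers plus one with corrupted metadata.
        List<Long> reports =
            List.of(5_000_000_000L, 3_000_000_000L, -877_788_034_370L);
        System.out.println("container data usages " + containerDataUsage(reports));
        // A defensive lower bound, if one wanted to clamp at the source:
        System.out.println("clamped " + Math.max(0L, containerDataUsage(reports)));
    }
}
```

The original DUFactory implementation measures actual on-disk usage, so it cannot produce a negative number even when container metadata is corrupted, which is consistent with this workaround being effective.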
h2. Impact

In addition to slowing uploads under the capacity-aware pipeline choosing policy, a negative value in SCMNodeStat causes the following errors:
h3. 1: {{ozone admin usageinfo}} command fails without a helpful error message

While debugging this, we also noticed the {{ozone admin datanode}} command got stuck and then failed. After raising the log level to DEBUG, we saw the following retries:
{code:java}
% ozone --loglevel DEBUG admin datanode usageinfo --json --hostname DN_HOST
...
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3198)
, while invoking $Proxy20.submitRequest over nodeId=scm-2,nodeAddress=HOST after 6 failover attempts. Trying to failover after sleeping for 2000ms. Current retry count: 6.
2026-03-02 15:31:28,709 [main] DEBUG proxy.SCMContainerLocationFailoverProxyProvider: Failing over to next proxy. scm-2
2026-03-02 15:31:28,710 [IPC Parameter Sending Thread for HOST] DEBUG ipc.Client: IPC Client (1324262781) connection to HOST from USER@REALMIPC Client (1324262781) connection to HOST from USER@REALM sending #0 org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.submitRequest
2026-03-02 15:31:28,712 [IPC Client (1324262781) connection to HOST from USER] DEBUG ipc.Client: IPC Client (1324262781) connection to HOST from USER@REALM got value #0
2026-03-02 15:31:28,713 [main] DEBUG proxy.SCMContainerLocationFailoverProxyProvider: RetryProxy: SCM Server scm-2: RemoteException: used space cannot be negative.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
        at org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.<init>(SCMNodeStat.java:46)
        at org.apache.hadoop.hdds.scm.node.SCMNodeManager.getNodeStatInternal(SCMNodeManager.java:1064)
        at org.apache.hadoop.hdds.scm.node.SCMNodeManager.getUsageInfo(SCMNodeManager.java:1021)
        at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getUsageInfoFromDatanodeDetails(SCMClientProtocolServer.java:1359)
        at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getDatanodeUsageInfo(SCMClientProtocolServer.java:1338)
        at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getDatanodeUsageInfo(StorageContainerLocationProtocolServerSideTranslatorPB.java:1296)
        at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:673)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
        at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:236)
        at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1228)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3198)
...{code}
When HA is enabled, the Ozone client fails over to the other SCM nodes in turn and receives a not-leader exception from each, so it retries indefinitely on the client side. The logs above are just an excerpt from that long retry loop.
h3. 2: Container Balancer fails unexpectedly

Container Balancer also uses the SCMNodeStat object, so it fails as well. This is an error log excerpt from the SCM log file.
{code:java}
2026-03-02 17:06:10,168 [scm-1-ContainerBalancerTask-2] ERROR org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask: Container Balancer is stopped abnormally,
java.lang.IllegalArgumentException: used space cannot be negative.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
        at org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.<init>(SCMNodeStat.java:46)
        at org.apache.hadoop.hdds.scm.node.SCMNodeManager.getNodeStatInternal(SCMNodeManager.java:1064)
        at org.apache.hadoop.hdds.scm.node.SCMNodeManager.getUsageInfo(SCMNodeManager.java:1021)
        at org.apache.hadoop.hdds.scm.node.SCMNodeManager.getMostOrLeastUsedDatanodes(SCMNodeManager.java:998)
        at org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask.initializeIteration(ContainerBalancerTask.java:462)
        at org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask.balance(ContainerBalancerTask.java:260)
        at org.apache.hadoop.hdds.scm.container.balancer.ContainerBalancerTask.run(ContainerBalancerTask.java:190)
        at java.base/java.lang.Thread.run(Thread.java:840) {code}
 


