Ethan Rose created HDDS-10758:
---------------------------------
Summary: Reduce verbosity of SCM replication manager logs when no nodes are available
Key: HDDS-10758
URL: https://issues.apache.org/jira/browse/HDDS-10758
Project: Apache Ozone
Issue Type: Improvement
Components: SCM
Affects Versions: 1.4.0
Reporter: Ethan Rose
If an EC container is under-replicated but no nodes are available to service
reconstruction, the leader SCM logs the following for each such container
(#1030 in this case) on every replication manager run:
{code}
2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
    at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
    at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
    at java.base/java.lang.Thread.run(Thread.java:834)
2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
    at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
    at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}
That is two stack traces for a single error, and at this volume useful entries
can quickly roll off the leader's logs. We should remove the stack traces and
reduce the output to one log message per container.
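
As a rough illustration of the proposed behaviour (not the actual Ozone code),
the sketch below catches the placement failure once per container and records
only the exception message, with no stack trace. All names in it
(QuietReplicationLogging, PlacementException, createReconstructionCommands,
formatFailure, processAll) are hypothetical stand-ins, and it collects the
lines in a list rather than calling the real logger so that it stays
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class QuietReplicationLogging {

  /** Hypothetical stand-in for SCMException; not the real Ozone class. */
  public static class PlacementException extends Exception {
    public PlacementException(String message) {
      super(message);
    }
  }

  /**
   * Hypothetical stand-in for the EC under-replication handler. It always
   * fails here, to simulate a cluster with no nodes available.
   */
  public static void createReconstructionCommands(long containerId)
      throws PlacementException {
    throw new PlacementException("No healthy node found to allocate container.");
  }

  /** Builds a single-line message from the exception message only. */
  public static String formatFailure(long containerId, Exception e) {
    return "Unable to process under-replicated container #" + containerId
        + ": " + e.getMessage();
  }

  /**
   * Processes each container and returns one message per failure. The
   * exception is handled in exactly one place, so it is reported once
   * instead of once by the handler and again by the processor.
   */
  public static List<String> processAll(long[] containerIds) {
    List<String> messages = new ArrayList<>();
    for (long id : containerIds) {
      try {
        createReconstructionCommands(id);
      } catch (PlacementException e) {
        // Message only: the stack trace is deliberately not recorded.
        messages.add(formatFailure(id, e));
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    for (String message : processAll(new long[] {1030L, 1031L})) {
      System.out.println("WARN " + message);
    }
  }
}
```

In the real code base the equivalent change would presumably mean logging the
exception message rather than passing the exception object to the logger, and
doing so in only one of the two classes that currently log it.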