[ 
https://issues.apache.org/jira/browse/HDDS-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840999#comment-17840999
 ] 

Ethan Rose commented on HDDS-10758:
-----------------------------------

[~siddhant] is this something you could take a look at?

> Reduce verbosity of SCM replication manager logs when no nodes are available
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-10758
>                 URL: https://issues.apache.org/jira/browse/HDDS-10758
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 1.4.0
>            Reporter: Ethan Rose
>            Priority: Major
>
> If there is an under-replicated EC container, but no nodes available to 
> service reconstruction, the leader SCM will log the following for each such 
> container (1030 in this case) on each replication manager run:
> {code}
> 2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
>       at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
>       at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
>       at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
>       at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
>       at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
>       at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> This logs two stack traces for a single error, and at this volume it can
> quickly cause the leader's logs to roll over. We should remove the stack
> traces and reduce this to one log message per container per replication
> manager run.
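
To illustrate the proposed behavior, here is a minimal, self-contained sketch. It is not Ozone code: the class QuietUnderReplicationLogging, the NoHealthyNodeException stand-in for SCMException, and both method signatures are hypothetical, and log lines are collected in a list rather than sent to a real logger. It only shows the idea of catching the failure once at the outermost caller and logging the exception message without its stack trace.

```java
import java.util.ArrayList;
import java.util.List;

class QuietUnderReplicationLogging {

  /** Hypothetical stand-in for SCMException; not the real Ozone class. */
  static class NoHealthyNodeException extends Exception {
    NoHealthyNodeException(String message) {
      super(message);
    }
  }

  /** Hypothetical handler step that fails when no datanodes are available. */
  static void createReconstructionCommands(long containerId, int availableNodes)
      throws NoHealthyNodeException {
    if (availableNodes == 0) {
      // The inner layer only throws; it does not log, so nothing is duplicated.
      throw new NoHealthyNodeException(
          "No healthy node found to allocate container.");
    }
  }

  /** Simulates one replication manager run and returns the lines it would log. */
  static List<String> processAll(List<Long> containerIds, int availableNodes) {
    List<String> logLines = new ArrayList<>();
    for (long id : containerIds) {
      try {
        createReconstructionCommands(id, availableNodes);
      } catch (NoHealthyNodeException e) {
        // Single log site: the message only, no stack trace.
        logLines.add("WARN Unable to process under replicated container #"
            + id + ": " + e.getMessage());
      }
    }
    return logLines;
  }

  public static void main(String[] args) {
    // Two under-replicated containers, no nodes available.
    processAll(List.of(1030L, 1031L), 0).forEach(System.out::println);
  }
}
```

With this shape, each container contributes at most one line per run, and the full stack trace could still be logged at a lower level such as DEBUG if it is needed for troubleshooting.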



