[ https://issues.apache.org/jira/browse/HDDS-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-10758:
------------------------------
    Description: 
If there is an under-replicated EC container, but no nodes available to service 
reconstruction, the leader SCM will log the following for each such container 
(1030 in this case) on each replication manager run:

{code}
2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
        at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
        at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
        at java.base/java.lang.Thread.run(Thread.java:834)
2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
        at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
        at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
        at java.base/java.lang.Thread.run(Thread.java:834)
{code}

This logs two stack traces for a single error, which can quickly cause the 
leader's logs to roll over. We should remove the stack traces and reduce this 
to one log message per container per replication manager run.
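
As a rough illustration of the proposed behavior (not the actual Ozone code), 
the sketch below catches the failure in one place only and logs the exception 
message instead of the throwable, so each container produces a single line 
with no stack trace per run. Every name in it (QuietUnderReplicationLogging, 
NoHealthyNodeException, createReconstructionCommands, processAll) is 
hypothetical, and log lines are collected in a list rather than written 
through a real logger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
public class QuietUnderReplicationLogging {

  /** Stand-in for SCMException in this sketch. */
  static class NoHealthyNodeException extends Exception {
    NoHealthyNodeException(String message) {
      super(message);
    }
  }

  /** Stand-in for the EC handler: always fails, as when no nodes are available. */
  static void createReconstructionCommands(long containerId)
      throws NoHealthyNodeException {
    // The handler does not log here; it only throws, so the caller
    // is the single place where the failure gets reported.
    throw new NoHealthyNodeException(
        "No healthy node found to allocate container.");
  }

  /** Simulates one replication manager run and returns the lines it would log. */
  static List<String> processAll(List<Long> underReplicated) {
    List<String> logLines = new ArrayList<>();
    for (long id : underReplicated) {
      try {
        createReconstructionCommands(id);
      } catch (NoHealthyNodeException e) {
        // Use e.getMessage() rather than passing the throwable to the
        // logger, so the output is one line without a stack trace.
        logLines.add("WARN Unable to reconstruct under-replicated EC container #"
            + id + ": " + e.getMessage());
      }
    }
    return logLines;
  }

  public static void main(String[] args) {
    processAll(List.of(1030L, 1031L)).forEach(System.out::println);
  }
}
```

Whether the single message belongs in the handler or in the processor is a 
design choice this sketch does not settle; it only shows the log-once, 
message-only pattern.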

  was:
If there is an under-replicated EC container, but no nodes available to service 
reconstruction, the leader SCM will log the following for each such container 
(1030 in this case) on each replication manager run:

{code}
2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
        at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
        at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
        at java.base/java.lang.Thread.run(Thread.java:834)
2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
        at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
        at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
        at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
        at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
        at java.base/java.lang.Thread.run(Thread.java:834)
{code}

This is two stack traces for one error, and can quickly roll off the leader's 
logs. We should remove the stack traces and reduce this to one log message per 
container.


> Reduce verbosity of SCM replication manager logs when no nodes are available
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-10758
>                 URL: https://issues.apache.org/jira/browse/HDDS-10758
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 1.4.0
>            Reporter: Ethan Rose
>            Priority: Major
>
> If there is an under-replicated EC container, but no nodes available to 
> service reconstruction, the leader SCM will log the following for each such 
> container (1030 in this case) on each replication manager run:
> {code}
> 2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
>       at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
>       at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
>       at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
>       at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
>       at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
>       at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
>       at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
>       at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
>       at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> This logs two stack traces for a single error, which can quickly cause the 
> leader's logs to roll over. We should remove the stack traces and reduce 
> this to one log message per container per replication manager run.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
