Ethan Rose created HDDS-6236:
--------------------------------

             Summary: SCM receives reports of unknown containers 
                 Key: HDDS-6236
                 URL: https://issues.apache.org/jira/browse/HDDS-6236
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM
            Reporter: Ethan Rose


In two different Ozone clusters running SCM HA (may or may not be related to 
HA), we have noticed the following log messages in SCM leader and followers:

{code}

2022-01-19 12:53:24,021 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 1368 from datanode { ... }
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: ID #1368
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
        at java.base/java.util.Optional.orElseThrow(Optional.java:408)
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
        at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:94)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:165)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:133)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:48)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

{code}

This seems to affect only empty containers, since no data appears to be 
missing. Containers are supposed to remain in the SCM DB even after they have 
been deleted from the datanodes, so there appears to be a bug in the container 
persistence logic.
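
For illustration only, here is a minimal sketch of the failure mode suggested by 
the stack trace: a container lookup that goes through Optional.orElseThrow and 
fails when SCM has no record of the reported ID. The class 
UnknownContainerSketch, its in-memory map, and the method names below are 
hypothetical stand-ins, not the actual Ozone code; only the exception name and 
the log message wording are taken from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for SCM's container bookkeeping; not the Ozone code.
class UnknownContainerSketch {

  // Illustrative exception that mirrors the name seen in the stack trace.
  static class ContainerNotFoundException extends Exception {
    ContainerNotFoundException(long id) {
      super("ID #" + id);
    }
  }

  // In-memory map standing in for the container records SCM persists.
  private final Map<Long, String> containers = new HashMap<>();

  void addContainer(long id, String state) {
    containers.put(id, state);
  }

  // The lookup throws when no record exists for the given container ID.
  String getContainer(long id) throws ContainerNotFoundException {
    return Optional.ofNullable(containers.get(id))
        .orElseThrow(() -> new ContainerNotFoundException(id));
  }

  // Walks the reported IDs and returns the ones that have no record.
  List<Long> processReport(List<Long> reportedIds) {
    List<Long> unknown = new ArrayList<>();
    for (long id : reportedIds) {
      try {
        getContainer(id);
      } catch (ContainerNotFoundException e) {
        System.err.println(
            "Received container report for an unknown container " + id);
        unknown.add(id);
      }
    }
    return unknown;
  }

  public static void main(String[] args) {
    UnknownContainerSketch scm = new UnknownContainerSketch();
    scm.addContainer(1L, "CLOSED");
    // Container 1368 was never recorded, so it is reported as unknown.
    System.out.println(scm.processReport(Arrays.asList(1L, 1368L)));
  }
}
```

In this sketch a replica report is flagged as unknown whenever the record is 
absent, regardless of why it is absent, which is why a missing record for a 
deleted (empty) container would surface as the error shown above.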



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

