Mark Gui created HDDS-5726:
------------------------------

             Summary: Skip remove for already removed pipeline.
                 Key: HDDS-5726
                 URL: https://issues.apache.org/jira/browse/HDDS-5726
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Mark Gui
            Assignee: Mark Gui


Suspicious logs seen on the datanode side while running decommission:
{code:java}
[ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline request" *
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927 
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received SCM close pipeline request 
PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927 
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received SCM close pipeline request 
PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006 
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received SCM close pipeline request 
PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007 
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received SCM close pipeline request 
PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
{code}
The datanode receives duplicate close commands for the same pipeline, so one
close succeeds and the other fails.

I checked the SCM log and found that one command comes from
StartAdminOnNodeForStartDatanodeAdminHandler and the other from
PipelineReportForPipelineReportHandler.

Decommission has already closed the pipeline, so SCM no longer tracks it.
However, the datanode sent its pipeline report before it had processed that
close. When SCM handles the report it cannot find the pipeline, and it sends a
second close pipeline command to the datanode.
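
To make the race concrete, here is a minimal, simplified model of the SCM side.
It assumes that SCM forgets a pipeline once decommission closes it, and that a
report naming an unknown pipeline makes SCM send a close command, which is what
the SCM logs in this issue indicate. The class and method names
(ScmCloseRaceSketch, startDecommission, onPipelineReport) are hypothetical and
are not Ozone APIs.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the SCM-side race.
 * None of these names are real Ozone classes.
 */
public class ScmCloseRaceSketch {

  /** Pipelines SCM still knows about. */
  private final Set<String> scmPipelines = new HashSet<>();

  /** Close commands queued for the datanode, in send order. */
  private final List<String> closeCommands = new ArrayList<>();

  public void addPipeline(String pipelineId) {
    scmPipelines.add(pipelineId);
  }

  /** Decommission path (assumed): SCM closes the pipeline and forgets it. */
  public void startDecommission(String pipelineId) {
    scmPipelines.remove(pipelineId);
    closeCommands.add(pipelineId);
  }

  /** Pipeline report path (assumed): an unknown pipeline is told to close. */
  public void onPipelineReport(String reportedPipelineId) {
    if (!scmPipelines.contains(reportedPipelineId)) {
      // Mirrors "Reported pipeline ... is not found" in the SCM log.
      closeCommands.add(reportedPipelineId);
    }
  }

  public List<String> getCloseCommands() {
    return closeCommands;
  }

  public static void main(String[] args) {
    String pipeline = "c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1";
    ScmCloseRaceSketch scm = new ScmCloseRaceSketch();
    scm.addPipeline(pipeline);

    // The datanode builds its report while the pipeline is still open...
    List<String> staleReport = List.of(pipeline);

    // ...then decommission closes the pipeline on SCM (close command #1)...
    scm.startDecommission(pipeline);

    // ...and only afterwards does SCM process the stale report (close #2).
    staleReport.forEach(scm::onPipelineReport);

    // Prints the same pipeline ID twice.
    System.out.println(scm.getCloseCommands());
  }
}
{code}

Neither handler is wrong on its own; the duplicate comes purely from the
ordering of the stale report relative to the decommission close.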

{code:java}
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938 
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send 
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to 
datanode 62687ea7-7043-4b7c-889a-6a27b1586df9
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938 
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send 
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to 
datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938 
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send 
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to 
datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
{code}
{code:java}
2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] 
INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported 
pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found
2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] 
DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering 
[event=Datanode_Command] to executor/handler DatanodeCommandForSCMNodeManager: 
CommandForDatanode
{code}
So the datanode should check for duplicate close commands and skip removing a
pipeline that has already been removed.
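
A minimal sketch of such a datanode-side check, assuming the datanode keeps its
active pipelines in a concurrent map keyed by pipeline ID. IdempotentPipelineClose,
handleClosePipeline and teardown are hypothetical names for illustration, not the
actual Ozone close pipeline handler; the point is only that a close for a
pipeline that is already gone is logged and skipped instead of failing.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of an idempotent close handler on the datanode.
 * The class and method names are illustrative, not Ozone APIs.
 */
public class IdempotentPipelineClose {

  /** Active pipelines on this datanode, keyed by pipeline ID (assumed). */
  private final Map<String, String> activePipelines = new ConcurrentHashMap<>();

  /** Number of times the teardown actually ran. */
  private final AtomicInteger teardowns = new AtomicInteger();

  public void addPipeline(String pipelineId, String groupInfo) {
    activePipelines.put(pipelineId, groupInfo);
  }

  /**
   * Handles a close pipeline command. Returns true if the pipeline was
   * removed, false if it was already gone and the command was skipped.
   */
  public boolean handleClosePipeline(String pipelineId) {
    // remove() is atomic on ConcurrentHashMap: only one caller gets a
    // non-null value, so two duplicate commands cannot both tear down.
    String removed = activePipelines.remove(pipelineId);
    if (removed == null) {
      System.out.println("Pipeline " + pipelineId
          + " is already removed, skipping duplicate close command");
      return false;
    }
    teardown(pipelineId);
    return true;
  }

  private void teardown(String pipelineId) {
    // Placeholder for the real teardown work, which is outside this sketch.
    teardowns.incrementAndGet();
  }

  public int getTeardownCount() {
    return teardowns.get();
  }

  public static void main(String[] args) {
    IdempotentPipelineClose dn = new IdempotentPipelineClose();
    String pipeline = "98792470-c118-4462-8978-e4edf9b38ba3";
    dn.addPipeline(pipeline, "group-info");
    System.out.println(dn.handleClosePipeline(pipeline)); // true
    System.out.println(dn.handleClosePipeline(pipeline)); // false (skipped)
    System.out.println("teardowns = " + dn.getTeardownCount()); // teardowns = 1
  }
}
{code}

Using the atomic remove() as the existence check, rather than a separate
contains()/remove() pair, avoids a window in which two concurrent duplicate
commands could both pass the check.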

--
This message was sent by Atlassian Jira
(v8.3.4#803005)