[ 
https://issues.apache.org/jira/browse/HDDS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDDS-5726.
-------------------------------------
    Fix Version/s: 1.2.0
       Resolution: Fixed

> Skip remove for already removed pipeline.
> -----------------------------------------
>
>                 Key: HDDS-5726
>                 URL: https://issues.apache.org/jira/browse/HDDS-5726
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.0
>
>
> Suspicious logs seen while executing decommission on the datanode side:
>  
> {code:java}
> [ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline request" *
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
> {code}
> The datanode receives duplicate close commands for the same pipeline, so 
> the first close succeeds and the second one fails.
> The SCM log shows that one command comes from 
> StartAdminOnNodeForStartDatanodeAdminHandler and the other from 
> PipelineReportForPipelineReportHandler.
> Decommission has already closed the pipeline, but the pipeline report was 
> sent before that close happened. When the report arrives, SCM no longer 
> has such a pipeline on its side, so it delivers a second close pipeline 
> command to the datanode.
>  
>  
> {code:java}
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 62687ea7-7043-4b7c-889a-6a27b1586df9
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
> {code}
>  
> {code:java}
> 2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found
> 2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering [event=Datanode_Command] to executor/handler DatanodeCommandForSCMNodeManager: CommandForDatanode
> {code}
> So the datanode should check for duplicate close commands and skip the 
> removal when the pipeline has already been removed.
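
The datanode-side check described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual Ozone patch: the class ClosePipelineSketch, the Result enum, and the methods addPipeline and handleClose are hypothetical names, and an in-memory set stands in for the datanode's real pipeline state. The point is only that a close for an unknown pipeline is treated as a no-op instead of a failure.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the datanode's view of its pipelines.
// None of these names come from the Ozone code base.
class ClosePipelineSketch {

    /** Outcome of handling one close-pipeline command. */
    enum Result { CLOSED, SKIPPED_ALREADY_REMOVED }

    // Pipelines this datanode currently knows about (assumed in-memory state).
    private final Set<UUID> knownPipelines = ConcurrentHashMap.newKeySet();

    void addPipeline(UUID pipelineId) {
        knownPipelines.add(pipelineId);
    }

    Result handleClose(UUID pipelineId) {
        // remove() returns false when the id is absent, so a duplicate
        // command is detected and skipped instead of failing.
        if (knownPipelines.remove(pipelineId)) {
            // A real handler would tear down the pipeline here.
            return Result.CLOSED;
        }
        return Result.SKIPPED_ALREADY_REMOVED;
    }

    public static void main(String[] args) {
        ClosePipelineSketch datanode = new ClosePipelineSketch();
        UUID id = UUID.fromString("c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1");
        datanode.addPipeline(id);

        // First command, e.g. the one triggered by decommission.
        System.out.println(datanode.handleClose(id));
        // Second command, e.g. the one triggered by a stale pipeline report.
        System.out.println(datanode.handleClose(id));
    }
}
```

With this shape, the second of two identical commands returns SKIPPED_ALREADY_REMOVED, so the handler can log it at a low level rather than report a failed close.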



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
