[
https://issues.apache.org/jira/browse/HDDS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell resolved HDDS-5726.
-------------------------------------
Fix Version/s: 1.2.0
Resolution: Fixed
> Skip remove for already removed pipeline.
> -----------------------------------------
>
> Key: HDDS-5726
> URL: https://issues.apache.org/jira/browse/HDDS-5726
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.2.0
>
>
> Suspicious logs seen while executing decommission on the datanode side:
>
> {code:java}
> [ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline
> request" *
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927
> [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
> Received SCM close pipeline request
> PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927
> [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
> Received SCM close pipeline request
> PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006
> [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
> Received SCM close pipeline request
> PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
> ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007
> [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
> Received SCM close pipeline request
> PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
> {code}
> There are duplicate pipeline close commands received on a datanode. So it
> results in a succeeded pipeline close and a failed one.
> I checked the log of scm and found that one is from the
> StartAdminOnNodeForStartDatanodeAdminHandler and one from
> PipelineReportForPipelineReportHandler.
> Because the pipeline is already closed by decommission and the pipeline
> report is sent before it happens, so there is no such pipeline on the scm
> side, then scm delivers a second close pipeline command to datanode.
>
>
> {code:java}
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938
> [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
> pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
> datanode 62687ea7-7043-4b7c-889a-6a27b1586df9
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938
> [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
> pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
> datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc
> logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938
> [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
> pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
> datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
> {code}
>
> {code:java}
> 2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler]
> INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported
> pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found
> 2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler]
> DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering
> [event=Datanode_Command] to executor/handler
> DatanodeCommandForSCMNodeManager: CommandForDatanode
> {code}
> So we should check for possible duplicate close commands on the datanode side.
>
>
>
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]