Mark Gui created HDDS-5726:
------------------------------
Summary: Skip remove for already removed pipeline.
Key: HDDS-5726
URL: https://issues.apache.org/jira/browse/HDDS-5726
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Datanode
Reporter: Mark Gui
Assignee: Mark Gui
Suspicious logs seen on the datanode side while executing decommission:
{code:java}
[ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline request" *
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
Received SCM close pipeline request
PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
Received SCM close pipeline request
PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
Received SCM close pipeline request
PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007
[EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
Received SCM close pipeline request
PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
{code}
The datanode receives duplicate close commands for the same pipeline, so the
first close succeeds and the second one fails.
I checked the SCM log and found that one command comes from
StartAdminOnNodeForStartDatanodeAdminHandler and the other from
PipelineReportForPipelineReportHandler.
Decommission has already closed the pipeline, so it no longer exists on the SCM
side, but a pipeline report sent before the close took effect still lists it.
When SCM processes that report it cannot find the pipeline, so it delivers a
second close pipeline command to the datanode.
{code:java}
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
datanode 62687ea7-7043-4b7c-889a-6a27b1586df9
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938
[EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send
pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to
datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
{code}
{code:java}
2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler]
INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported
pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found
2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler]
DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering
[event=Datanode_Command] to executor/handler DatanodeCommandForSCMNodeManager:
CommandForDatanode
{code}
So we should check for possible duplicate close commands on the datanode side.
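As a rough illustration of the proposed check, here is a minimal sketch of an idempotent close on the datanode side. It is not the actual Ozone code: `PipelineCloseSketch`, `DatanodePipelines`, `add`, `isOpen`, and `handleClose` are hypothetical names, and the set of open pipelines is only a stand-in for whatever state the datanode really keeps. The idea is that a close command for a pipeline that is no longer present is skipped instead of being treated as a failure.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PipelineCloseSketch {

  /** Hypothetical stand-in for the datanode's view of its open pipelines. */
  static final class DatanodePipelines {
    private final Set<UUID> open = ConcurrentHashMap.newKeySet();

    void add(UUID pipelineId) {
      open.add(pipelineId);
    }

    boolean isOpen(UUID pipelineId) {
      return open.contains(pipelineId);
    }

    /**
     * Handles a close pipeline command.
     * Returns true if the pipeline was closed by this call, false if the
     * command was a duplicate and was skipped.
     */
    boolean handleClose(UUID pipelineId) {
      // remove() returns false when the id is absent, so the presence check
      // and the removal happen in one step and a duplicate cannot race in.
      if (!open.remove(pipelineId)) {
        System.out.println("Skip close for already removed pipeline "
            + pipelineId);
        return false;
      }
      System.out.println("Closed pipeline " + pipelineId);
      return true;
    }
  }

  public static void main(String[] args) {
    DatanodePipelines datanode = new DatanodePipelines();
    UUID id = UUID.fromString("c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1");
    datanode.add(id);

    // First command, e.g. the one triggered by decommission.
    datanode.handleClose(id);
    // Second command, e.g. the one triggered by the stale pipeline report.
    datanode.handleClose(id);
  }
}
```

With a check like this the second command becomes a no-op, which matches the issue summary ("Skip remove for already removed pipeline").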