Ivan Andika created HDDS-9959:
---------------------------------
Summary: Propagate close pipelines to other datanodes in the pipeline
Key: HDDS-9959
URL: https://issues.apache.org/jira/browse/HDDS-9959
Project: Apache Ozone
Issue Type: Improvement
Components: DN, Ozone Datanode
Reporter: Ivan Andika
Assignee: Ivan Andika
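In https://issues.apache.org/jira/browse/RATIS-1947, it was found that the same pipeline might be closed on its Datanodes hours apart. In the logs below, dn5, dn1 and dn8 handle the close command for pipeline 23e46782-6b48-4559-b3ac-0f95993cf0bc at 14:07, 15:22 and 16:57 respectively, almost three hours between the first and the last.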
# dn1
2023-11-29 15:22:59,477 [Command processor thread] INFO
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on
datanode 1669a7e6-fe3c-4f7e-8fcb-ec5d5027b0eb.
# dn5
2023-11-29 14:07:55,442 [Command processor thread] INFO
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on
datanode bd1e72ab-cfd5-4cc1-8fbf-6ec9d9654c98.
# dn8
2023-11-29 16:57:53,894 [Command processor thread] INFO
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on
datanode 4a23d1e8-d526-4a4d-8ed1-13ffbab3a5cc.
This might happen when some Datanodes have many commands queued in their commandQueue, so the close command is handled much earlier on some Datanodes than on others.
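To make the backlog effect concrete, here is a minimal sketch under stated assumptions: a hypothetical in-memory model (not Ozone code) in which a single command processor thread drains a Datanode's commandQueue in FIFO order, and each command takes a fixed, made-up amount of time. The class `CommandQueueBacklog`, its `Command` type, the command names and the costs are all illustrative.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical model (not Ozone code): a Datanode's commandQueue drained in FIFO
 * order by a single command processor thread.
 */
public class CommandQueueBacklog {

    /** A queued command with an assumed (invented) handling cost in seconds. */
    static final class Command {
        final String name;
        final long costSeconds;

        Command(String name, long costSeconds) {
            this.name = name;
            this.costSeconds = costSeconds;
        }
    }

    /**
     * Returns how many seconds after enqueueing the first command named {@code target}
     * starts being handled, or -1 if no such command is queued.
     */
    static long secondsUntilHandled(List<Command> backlog, String target) {
        Queue<Command> queue = new ArrayDeque<>(backlog);
        long elapsed = 0;
        while (!queue.isEmpty()) {
            Command cmd = queue.poll();
            if (cmd.name.equals(target)) {
                return elapsed; // reached only after everything queued ahead of it
            }
            elapsed += cmd.costSeconds; // single thread: commands run one after another
        }
        return -1;
    }

    public static void main(String[] args) {
        // The same close command, with different backlogs ahead of it (illustrative names and costs).
        List<Command> idleDn = List.of(new Command("closePipeline", 1));
        List<Command> busyDn = List.of(
                new Command("deleteBlocks", 1800),
                new Command("replicateContainer", 3600),
                new Command("closePipeline", 1));

        System.out.println("idle datanode: " + secondsUntilHandled(idleDn, "closePipeline") + " s");
        System.out.println("busy datanode: " + secondsUntilHandled(busyDn, "closePipeline") + " s");
    }
}
```

With these invented costs, the close command is reached immediately on the idle Datanode but 5400 s (1.5 hours) later on the busy one, even though both received it at the same time.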
Furthermore, the Ratis group remove operation is local to the Raft server and is not propagated to the other Raft peers in the same group.
Therefore, similar to CreatePipelineCommand, whenever a Datanode receives a close pipeline command, it should also propagate the group remove command to the other Datanodes (Raft peers) in the same pipeline.
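The proposed propagation can be sketched as follows, assuming a hypothetical in-memory model rather than the real Ozone or Ratis APIs: `ClosePipelinePropagation`, `removeGroupLocally` and `handleClosePipeline` are illustrative names, and the loop over peers stands in for whatever remote group-remove call an actual implementation would make.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model (not the Ozone or Ratis API): each datanode tracks
 * the Raft groups (pipelines) it is still a member of.
 */
public class ClosePipelinePropagation {

    /** datanode id -> ids of the Raft groups that datanode still serves. */
    final Map<String, Set<String>> groupsByDatanode = new HashMap<>();

    void createPipeline(String pipelineId, List<String> peers) {
        for (String dn : peers) {
            groupsByDatanode.computeIfAbsent(dn, k -> new HashSet<>()).add(pipelineId);
        }
    }

    /** Local-only removal, analogous to a group remove on a single Raft server. */
    boolean removeGroupLocally(String datanode, String pipelineId) {
        Set<String> groups = groupsByDatanode.get(datanode);
        return groups != null && groups.remove(pipelineId);
    }

    /**
     * Sketch of the proposed behaviour: the datanode handling the close command removes
     * the group locally, then asks every other peer to remove it as well.
     */
    void handleClosePipeline(String self, String pipelineId, List<String> peers) {
        removeGroupLocally(self, pipelineId);
        for (String peer : peers) {
            if (!peer.equals(self)) {
                // In a real cluster this would be a remote call that can fail; here it is a
                // direct call, and a peer that no longer has the group simply does nothing.
                removeGroupLocally(peer, pipelineId);
            }
        }
    }

    boolean isMember(String dn, String pipelineId) {
        return groupsByDatanode.getOrDefault(dn, Set.of()).contains(pipelineId);
    }

    public static void main(String[] args) {
        ClosePipelinePropagation cluster = new ClosePipelinePropagation();
        List<String> peers = List.of("dn1", "dn5", "dn8");
        cluster.createPipeline("p1", peers);

        // dn5 handles the close command first; without propagation, dn1 and dn8 would
        // keep the group until their own queued close commands are processed.
        cluster.handleClosePipeline("dn5", "p1", peers);

        for (String dn : peers) {
            System.out.println(dn + " still in p1: " + cluster.isMember(dn, "p1"));
        }
    }
}
```

Because removing an already-removed group is a no-op in this model, a peer that later processes its own queued close command does nothing harmful; a real implementation would likely need to tolerate the analogous "group not found" case on remote peers.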
--
This message was sent by Atlassian Jira
(v8.20.10#820010)