Glen Geng created HDDS-4511:
-------------------------------
Summary: ReplicationManager#isContainerUnderReplicated should
consider OPEN container
Key: HDDS-4511
URL: https://issues.apache.org/jira/browse/HDDS-4511
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Affects Versions: 1.1.0
Reporter: Glen Geng
This improvement was inspired by fixing TestDeleteWithSlowFollower in
the broken HDDS-2823.
In the test case TestDeleteWithSlowFollower, the following trace appears
in the log:
{code:java}
2020-11-24 19:32:13,551 [EventQueue-StaleNodeForStaleNodeHandler] INFO
node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode
132e6d1b-e472-449e-929e-5f42b87114c6{ip: 10.73.23.64, host: 10.73.23.64,
networkLocation: /default-rack, certSerialId: null} moved to stale state.
Finalizing its pipelines [PipelineID=6f0e173c-b5e2-4dc6-99e1-854aafdc8295,
PipelineID=c78bc2fb-dca1-4e09-ba71-dd824e2d4e73]
2020-11-24 19:32:13,552
[EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.SCMPipelineManager
(PipelineManagerV2Impl.java:closePipeline(389)) - Pipeline Pipeline[ Id:
6f0e173c-b5e2-4dc6-99e1-854aafdc8295, Nodes:
132e6d1b-e472-449e-929e-5f42b87114c6{ip: 10.73.23.64, host: 10.73.23.64,
networkLocation: /default-rack, certSerialId:
null}46a77559-9d5c-4a1d-bad7-e7eb7b9c32da{ip: 10.73.23.64, host: 10.73.23.64,
networkLocation: /default-rack, certSerialId:
null}524fea63-ad85-4a3a-bcfb-ac40dfe3d5e7{ip: 10.73.23.64, host: 10.73.23.64,
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN, leaderId:46a77559-9d5c-4a1d-bad7-e7eb7b9c32da,
CreationTimestamp2020-11-24T11:30:23.805Z] moved to CLOSED state
{code}
However, the test case is designed as follows:
{code:java}
// Make the stale, dead and server failure timeout higher so that a dead
// node is not detected at SCM as well as the pipeline close action
// never gets initiated early at Datanode in the test.
{code}
The test relies on ReplicationManager to close the OPEN container in SCM, so
that SCM won't hold the delete blocks command.
But the command disappears, because
ReplicationManager#isContainerUnderReplicated does not consider OPEN
containers; it only handles CLOSED and QUASI_CLOSED containers.
After talking with [~Sammi]: by design, we should avoid replicating containers
in the DELETING or DELETED state. ReplicationManager#isContainerUnderReplicated
should therefore consider OPEN containers as well.
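
The proposed behaviour could be sketched roughly as below. This is only an
illustration, not the actual ReplicationManager code: the LifeCycleState enum
is a local stand-in limited to the states named in this issue, and the
healthyReplicas and replicationFactor parameters are hypothetical
simplifications of whatever the real method inspects.

```java
import java.util.EnumSet;
import java.util.Set;

public class UnderReplicationSketch {

  /** Local stand-in for the container states mentioned in this issue. */
  enum LifeCycleState { OPEN, QUASI_CLOSED, CLOSED, DELETING, DELETED }

  /** States in which a container should not be replicated. */
  private static final Set<LifeCycleState> NOT_REPLICATED =
      EnumSet.of(LifeCycleState.DELETING, LifeCycleState.DELETED);

  /**
   * Hypothetical check: every state except DELETING and DELETED is
   * considered, so an OPEN container with too few replicas is reported
   * as under-replicated.
   */
  static boolean isContainerUnderReplicated(
      LifeCycleState state, int healthyReplicas, int replicationFactor) {
    if (NOT_REPLICATED.contains(state)) {
      return false;
    }
    return healthyReplicas < replicationFactor;
  }

  public static void main(String[] args) {
    // An OPEN container that lost one of its three replicas.
    System.out.println(
        isContainerUnderReplicated(LifeCycleState.OPEN, 2, 3));
    // A container being deleted is skipped regardless of replica count.
    System.out.println(
        isContainerUnderReplicated(LifeCycleState.DELETING, 1, 3));
  }
}
```

Excluding the two deleting states, rather than listing CLOSED and
QUASI_CLOSED explicitly, is one way to make OPEN containers eligible without
naming every state.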