Bharat Viswanadham created HDDS-5973:
----------------------------------------

             Summary: [SCM-HA] Sequence of steps during pipeline close need to 
be changed
                 Key: HDDS-5973
                 URL: https://issues.apache.org/jira/browse/HDDS-5973
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Bharat Viswanadham
            Assignee: Aswin Shakil Balasubramanian


Right now, when a datanode becomes stale or dead,
SCM closes the pipeline first and then closes its containers.
This can leave containers in the open state while their pipeline is in the
closed state. SCM used to crash when adding an open container to a closed
pipeline. In SCM HA, state is flushed to the DB only at the snapshot interval.
Suppose the pipeline close has been flushed to the DB. After that, there are
two possible cases:
1. The container close is in the Ratis log.
2. SCM stopped before the container close entered the Ratis log.

In the first case, the containers will be closed once the log is replayed.
(This is fixed as part of HDDS-5843, on the assumption that the log will be
replayed and SCM will eventually reach the correct state.)
In the second case, the containers are left open while the pipeline is in the
closed state. (A container might stay in the open state forever, and if it is
under-replicated, RM might not replicate it, because the container is not in
the closed state.)

The ordering during pipeline close should be changed as follows:
1. Close containers
2. Close pipelines
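
The effect of the ordering can be sketched with a small in-memory model. This
is only an illustration under stated assumptions: PipelineCloseSketch, its
methods, and the stepsPersisted parameter are hypothetical names and are not
Ozone's SCM classes, and "SCM stopped" is simplified to "only the first N
steps took effect".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; these are not Ozone's SCM classes.
class PipelineCloseSketch {
    enum State { OPEN, CLOSED }

    private final Map<String, State> pipelines = new HashMap<>();
    private final Map<Long, State> containers = new HashMap<>();
    private final Map<String, List<Long>> containersByPipeline = new HashMap<>();

    // Registers an open pipeline together with its open containers.
    void addPipeline(String pipelineId, long... containerIds) {
        pipelines.put(pipelineId, State.OPEN);
        List<Long> ids = new ArrayList<>();
        for (long id : containerIds) {
            containers.put(id, State.OPEN);
            ids.add(id);
        }
        containersByPipeline.put(pipelineId, ids);
    }

    // Runs the two close steps in the chosen order. A stepsPersisted value
    // below 2 simulates SCM stopping before the remaining step took effect.
    void closePipeline(String pipelineId, boolean containersFirst, int stepsPersisted) {
        for (int step = 0; step < stepsPersisted && step < 2; step++) {
            boolean closeContainersNow = (step == 0) == containersFirst;
            if (closeContainersNow) {
                for (long id : containersByPipeline.get(pipelineId)) {
                    containers.put(id, State.CLOSED);
                }
            } else {
                pipelines.put(pipelineId, State.CLOSED);
            }
        }
    }

    // Detects the problematic state from this issue: an open container
    // whose pipeline is already closed.
    boolean hasOpenContainerOnClosedPipeline() {
        for (Map.Entry<String, List<Long>> e : containersByPipeline.entrySet()) {
            if (pipelines.get(e.getKey()) != State.CLOSED) {
                continue;
            }
            for (long id : e.getValue()) {
                if (containers.get(id) == State.OPEN) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In this model, calling closePipeline("p1", true, 1) leaves closed containers on
a still-open pipeline, so the check returns false and the pipeline close can
simply be retried. Calling closePipeline("p1", false, 1) reproduces the current
ordering stopped halfway, and the check returns true.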



