[ 
https://issues.apache.org/jira/browse/HDDS-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904396#comment-17904396
 ] 

Nandakumar commented on HDDS-11844:
-----------------------------------

I took a closer look at the existing Pipeline Safemode rule. It only checks
whether at least one Datanode has reported for each Pipeline.

This check exists for a different reason. It is not there to make sure that
writes succeed after safemode exit, nor to retain/maintain all the old Pipelines.

The actual reason for the Pipeline Safemode rule is to track {{OPEN/CLOSING}}
Containers. The {{OPEN/CLOSING}} {{Containers}} are managed by
{{Pipelines}}; the SCM doesn't really manage the replicas for
{{OPEN/CLOSING}} Containers. It relies on the Ratis {{Pipeline}} to manage the
replicas. So to track/count the {{OPEN/CLOSING}} Containers, SCM needs to track
the {{Pipelines}}. This is what is done in the Pipeline Safemode rule.

The Container Safemode rule checks if we have at least one replica reported for 
{{CLOSED/QUASI_CLOSED}} containers.
So, to make sure that we have at least one replica reported for all the
containers {{(OPEN/CLOSING/QUASI_CLOSED/CLOSED)}}, we need to have both the
Pipeline and the Container Safemode rule.
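
As a rough illustration of that split (a minimal sketch, not the Ozone code: the class name, the {{Rule}} enum and {{coveredBy}} are hypothetical names made up here), each container state maps to the rule that accounts for it:

```java
public class SafemodeCoverageSketch {
    // Container lifecycle states named in this comment.
    enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

    // Hypothetical labels for the two safemode rules; not real Ozone class names.
    enum Rule { PIPELINE_RULE, CONTAINER_RULE }

    // Which rule accounts for containers in the given state, per the
    // explanation above: OPEN/CLOSING are tracked through their Pipelines,
    // QUASI_CLOSED/CLOSED through reported container replicas.
    static Rule coveredBy(State state) {
        switch (state) {
            case OPEN:
            case CLOSING:
                return Rule.PIPELINE_RULE;
            default:
                return Rule.CONTAINER_RULE;
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + coveredBy(s));
        }
    }
}
```

Every state is covered by exactly one rule, which is why dropping either rule would leave some containers unaccounted for at safemode exit.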

+Why can't we add OPEN Containers to the Container Safemode rule?+

There can be containers in {{OPEN/CLOSING}} state in SCM which were never
created by the client on the Datanodes. If we include Containers in
{{OPEN/CLOSING}} state in the Container Safemode rule, SCM might never come out
of Safemode (it would wait for OPEN container replicas that were never created
on the Datanodes). This is the reason why we are not considering these
{{Containers}} in the Container Safemode rule.

These containers are handled by tracking pipelines.
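
The problem can be reproduced with a small model (a simplified sketch under assumptions: {{satisfied}} and the {{includeOpen}} flag are hypothetical and only mimic the "at least one replica reported per tracked container" check, they are not the {{ContainerSafeModeRule}} implementation):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ContainerRuleSketch {
    enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

    // Returns true when every container the rule tracks has at least one
    // reported replica. With includeOpen == false only CLOSED/QUASI_CLOSED
    // containers are tracked, as described above.
    static boolean satisfied(Map<Long, State> scmContainers,
                             Set<Long> reportedContainerIds,
                             boolean includeOpen) {
        for (Map.Entry<Long, State> entry : scmContainers.entrySet()) {
            State state = entry.getValue();
            boolean tracked = includeOpen
                    || state == State.CLOSED
                    || state == State.QUASI_CLOSED;
            if (tracked && !reportedContainerIds.contains(entry.getKey())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Long, State> scm = new HashMap<>();
        scm.put(1L, State.CLOSED);
        scm.put(2L, State.QUASI_CLOSED);
        scm.put(3L, State.OPEN); // allocated in SCM, never created on a Datanode

        Set<Long> reported = new HashSet<>();
        reported.add(1L);
        reported.add(2L);

        // Excluding OPEN/CLOSING containers, the rule can be satisfied.
        System.out.println(satisfied(scm, reported, false)); // true
        // Including them, container 3 is never reported, so the rule blocks.
        System.out.println(satisfied(scm, reported, true));  // false
    }
}
```

In this model no number of Datanode reports can ever satisfy the second call, because container 3 has no replica anywhere.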

Reference:
[https://github.com/apache/ozone/blob/befd64e0689b54d0c17b6fb88732e9fcce788c3f/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java#L324]

> Do not wait for all the Pipelines to be reported to exit SafeMode
> -----------------------------------------------------------------
>
>                 Key: HDDS-11844
>                 URL: https://issues.apache.org/jira/browse/HDDS-11844
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Nandakumar
>            Priority: Major
>
> We don't have to wait for all the Pipelines to be reported to exit 
> {{SafeMode}}. Having at least one open {{Pipeline}} to serve writes is enough 
> to get out of {{SafeMode}}.
> We can reuse the {{Pipelines}} reported by {{Datanodes}}, but we don't have 
> to wait for all the {{Pipelines}} to be reported to get SCM out of 
> {{SafeMode}}.


