[
https://issues.apache.org/jira/browse/HDDS-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904396#comment-17904396
]
Nandakumar commented on HDDS-11844:
-----------------------------------
I took a closer look at the existing pipeline safemode rule: we only check that
at least one Datanode has reported for each Pipeline.
This is done for a different reason. It is not there to make sure that writes
succeed after safemode exit, nor to retain/maintain all the old Pipelines.
The actual reason for the Pipeline Safemode rule is to track {{OPEN/CLOSING}}
Containers. The {{OPEN/CLOSING}} {{Containers}} are managed by
{{Pipelines}}; SCM doesn't really manage the replicas for
{{OPEN/CLOSING}} Containers. It relies on the Ratis {{Pipeline}} to manage the
replicas. So to track/count the {{OPEN/CLOSING}} Containers, SCM needs to track
the {{Pipelines}}. This is what the Pipeline Safemode rule does.
The Container Safemode rule checks if we have at least one replica reported for
{{CLOSED/QUASI_CLOSED}} containers.
So, to make sure that we have at least one replica reported for all the
containers {{(OPEN/CLOSING/QUASI_CLOSED/CLOSED)}}, we need both the
Pipeline and the Container Safemode rules.
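A minimal Java sketch of this split, assuming a simplified model: the enums, {{trackedBy}} and {{pipelineReported}} are hypothetical names for illustration, not Ozone's actual classes or the logic in {{ContainerSafeModeRule}}. The exhaustive switch expression shows that each container state falls under exactly one of the two rules.

```java
import java.util.Set;

// Hypothetical, simplified model; not Ozone's actual safemode classes.
class SafeModeCoverageSketch {

  enum ContainerState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  enum Rule { PIPELINE_RULE, CONTAINER_RULE }

  // OPEN/CLOSING replicas are managed by the Ratis pipeline, so they are
  // tracked indirectly via the pipeline rule; CLOSED/QUASI_CLOSED replicas
  // are tracked directly by the container rule. The switch has no default,
  // so the compiler checks that every state is assigned to a rule.
  static Rule trackedBy(ContainerState state) {
    return switch (state) {
      case OPEN, CLOSING -> Rule.PIPELINE_RULE;
      case QUASI_CLOSED, CLOSED -> Rule.CONTAINER_RULE;
    };
  }

  // The pipeline rule only needs one of the pipeline's datanodes to report.
  static boolean pipelineReported(Set<String> pipelineDatanodes,
                                  Set<String> reportedDatanodes) {
    return pipelineDatanodes.stream().anyMatch(reportedDatanodes::contains);
  }
}
```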
+Why can't we add OPEN Containers to the Container Safemode rule?+
There can be containers in {{OPEN/CLOSING}} state in SCM that were never
created by the client on the Datanodes. If we included Containers in
{{OPEN/CLOSING}} state in the Container Safemode rule, SCM might never come out
of Safemode (it would keep waiting for an OPEN container replica that was never
created on any Datanode). This is why we do not consider these {{Containers}}
in the Container Safemode rule.
These containers are handled by tracking pipelines instead.
Reference:
[https://github.com/apache/ozone/blob/befd64e0689b54d0c17b6fb88732e9fcce788c3f/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java#L324]
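To see why OPEN containers would block the exit, here is a hypothetical sketch (not the real {{ContainerSafeModeRule}} logic) that computes the fraction of eligible containers with at least one reported replica. The {{Container}} record, the {{includeOpenAndClosing}} flag and {{reportedFraction}} are assumptions for illustration only.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not Ozone's ContainerSafeModeRule.
class ContainerRuleSketch {

  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  record Container(long id, State state) {}

  // CLOSED/QUASI_CLOSED containers are always counted; OPEN/CLOSING ones
  // only when the flag is set (the variant argued against above).
  static boolean eligible(Container c, boolean includeOpenAndClosing) {
    return c.state() == State.CLOSED
        || c.state() == State.QUASI_CLOSED
        || includeOpenAndClosing;
  }

  // Fraction of eligible containers that have at least one reported replica.
  static double reportedFraction(List<Container> containers,
                                 Set<Long> reportedIds,
                                 boolean includeOpenAndClosing) {
    long eligibleCount = 0;
    long reportedCount = 0;
    for (Container c : containers) {
      if (!eligible(c, includeOpenAndClosing)) {
        continue;
      }
      eligibleCount++;
      if (reportedIds.contains(c.id())) {
        reportedCount++;
      }
    }
    // Nothing to wait for: treat the rule as satisfied.
    return eligibleCount == 0 ? 1.0 : (double) reportedCount / eligibleCount;
  }
}
```

With a CLOSED container (id 1) and a QUASI_CLOSED container (id 2) both reported, plus an OPEN container (id 3) that was never created on any Datanode, {{reportedFraction(..., false)}} is 1.0, while {{reportedFraction(..., true)}} stays at 2/3 no matter how long SCM waits.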
> Do not wait for all the Pipelines to be reported to exit SafeMode
> -----------------------------------------------------------------
>
> Key: HDDS-11844
> URL: https://issues.apache.org/jira/browse/HDDS-11844
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Nandakumar
> Priority: Major
>
> We don't have to wait for all the Pipelines to be reported to exit
> {{SafeMode}}. Having at least one open {{Pipeline}} to serve writes is enough
> to get out of {{SafeMode}}.
> We can reuse the {{Pipelines}} reported by {{Datanodes}}, but we don't have
> to wait for all the {{Pipelines}} to be reported to get SCM out of
> {{SafeMode}}.
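The proposal in the description can be sketched as two exit conditions, under a simplified model where a Pipeline is assumed to be able to serve writes once it is OPEN and all of its Datanodes have reported. {{PipelineExitSketch}} and its methods are hypothetical; as the comment explains, the real Pipeline rule also accounts for OPEN/CLOSING containers, which this sketch does not model.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not Ozone's Pipeline or safemode classes.
class PipelineExitSketch {

  enum PipelineState { ALLOCATED, OPEN, CLOSED }

  record Pipeline(String id, PipelineState state, Set<String> datanodes) {}

  // Stricter condition the issue wants to avoid: every pipeline must have
  // been reported by at least one of its datanodes.
  static boolean allPipelinesReported(List<Pipeline> pipelines,
                                      Set<String> reportedDatanodes) {
    return pipelines.stream().allMatch(
        p -> p.datanodes().stream().anyMatch(reportedDatanodes::contains));
  }

  // Proposed condition: one OPEN pipeline whose datanodes have all reported
  // (assumed here to mean it can serve writes) is enough.
  static boolean oneOpenPipelineReady(List<Pipeline> pipelines,
                                      Set<String> reportedDatanodes) {
    return pipelines.stream().anyMatch(
        p -> p.state() == PipelineState.OPEN
            && reportedDatanodes.containsAll(p.datanodes()));
  }
}
```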
--
This message was sent by Atlassian Jira
(v8.20.10#820010)