[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209668#comment-13209668 ]
Aaron T. Myers commented on HDFS-2914:
--------------------------------------
bq. I was thinking about it a bit, and it might get tricky to check for resources
when starting active services, because at that point the namenode is still in
standby. If it enters safemode, then on any failure during the transition we
must take care to take it back out of safemode. I also suspect that if it
enters safemode, some active services may not start simply because of
safemode, which would mean a loss of service. For the same reason, we cannot
throw an exception if resources are low, either.
Hmmmmm, I don't _think_ this should be a problem. We currently support
transitioning to the active state while the NN is in safemode, so I don't see
why any services would fail to start if we were to enter safemode while
transitioning to the active state.
Regardless, even if it is possible, I think you've convinced me that it's not
actually necessary.
bq. I am leaning towards separating the two scenarios (a failed transition and
low resources, although low resources is not really a failure), i.e. the standby
transitions to active irrespective of its resource status, and the resource
check is done independently once the transition to active has completed
successfully. This is consistent with the fact that low resources is not a
failure: the cluster is still available in read-only mode.
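A minimal sketch of that ordering, assuming a toy model of the NN: the class `NameNode`, its `state`/`safemode` fields, and `check_resources` are hypothetical Python stand-ins, not HDFS code. The transition itself never consults resources, so low disk space cannot make it fail; the resource check runs afterwards and only puts an *active* NN into read-only safemode, matching the issue's point that a standby should not enter safemode for this reason.

```python
class NameNode:
    """Hypothetical toy model of an HA NameNode (not the real HDFS class)."""

    def __init__(self, has_resources):
        self.state = "standby"
        self.safemode = False
        self._has_resources = has_resources  # callable: True if enough space

    def transition_to_active(self):
        # Step 1: the transition ignores resources entirely, so low disk
        # space can never cause the failover itself to fail.
        self.state = "active"
        # Step 2: an independent resource check once the transition is done.
        self.check_resources()

    def check_resources(self):
        # Only an active NN drops into safemode; a standby is left alone.
        if self.state == "active" and not self._has_resources():
            # Low resources is not a failure: stay up, but read-only.
            self.safemode = True
```

With this split there is no safemode state to roll back if the transition fails partway, which was the concern raised above.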
OK, that seems fine. Perhaps we could also have FSNS#startActiveServices
interrupt the NameNodeResourceMonitor thread? That would guarantee that a
resource check would happen promptly after transitioning to active.
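A rough sketch of the "interrupt the monitor" idea, assuming a monitor that sleeps between periodic checks. In Java this would be `Thread.interrupt()` on the NameNodeResourceMonitor thread; Python threads cannot be interrupted, so this sketch uses a `threading.Event` to wake the sleeping loop instead. `ResourceMonitor`, `poke`, and `start_active_services` are hypothetical names chosen for illustration.

```python
import threading


class ResourceMonitor:
    """Hypothetical stand-in for NameNodeResourceMonitor: re-checks resources
    every interval_s seconds, or immediately when poked."""

    def __init__(self, has_resources, interval_s=5.0):
        self._has_resources = has_resources  # callable: True if space is OK
        self._interval_s = interval_s
        self._wake = threading.Event()  # set() plays the role of interrupt()
        self._shutdown = False
        self._lock = threading.Lock()
        self.checks = 0
        self.low_resources = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while True:
            # Sleep for one interval, but return early if poke() is called.
            self._wake.wait(self._interval_s)
            self._wake.clear()
            if self._shutdown:
                return
            low = not self._has_resources()
            with self._lock:
                self.checks += 1
                self.low_resources = low

    def poke(self):
        """Wake the monitor so it checks resources right away."""
        self._wake.set()

    def stop(self):
        self._shutdown = True
        self._wake.set()
        self._thread.join()


def start_active_services(monitor):
    """Hypothetical analogue of FSNS#startActiveServices."""
    # ... starting the active-only services is elided here ...
    # Request a prompt check instead of waiting up to a full interval.
    monitor.poke()
```

Even with a long check interval, the first check happens right after the transition rather than up to one interval later.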
Offline, Todd pointed out to me that we could also check for available
resources in the monitorHealth RPC call, which the failover controller can
call *before* initiating a failover, to make sure the NN we want to fail over
to has available resources. That should probably be done in a separate JIRA,
though.
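A minimal sketch of the pre-failover health check Todd suggested, assuming the health call reports failure by raising an error. `NameNodeStub`, `monitor_health`, `HealthCheckFailedError`, and `failover` are hypothetical Python stand-ins for the monitorHealth RPC and the failover controller, not the actual Hadoop interfaces.

```python
class HealthCheckFailedError(Exception):
    """Hypothetical error: the target NN is not fit to become active."""


class NameNodeStub:
    """Hypothetical failover target exposing a monitorHealth-style call."""

    def __init__(self, name, has_resources):
        self.name = name
        self.state = "standby"
        self._has_resources = has_resources  # callable: True if enough space

    def monitor_health(self):
        # Proposed extra check: report unhealthy when resources are low.
        if not self._has_resources():
            raise HealthCheckFailedError(f"{self.name}: insufficient resources")

    def transition_to_active(self):
        self.state = "active"


def failover(target):
    """Hypothetical controller step: health-check the target first."""
    try:
        target.monitor_health()
    except HealthCheckFailedError:
        return False  # refuse to fail over to a NN that lacks resources
    target.transition_to_active()
    return True
```

The check happens on the controller side before any state changes, so a refusal leaves the target untouched in standby.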
> HA: Standby should not enter safemode when resources are low
> ------------------------------------------------------------
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: HDFS-2914-HDFS-1623, HDFS-2914-HDFS-1623,
> HDFS-2914-HDFS-1623.patch, hdfs-2914
>
>
> When the shared edits dir is bounced, the standby NN is put into safemode by
> the NameNodeResourceMonitor. However, there is no path for it to exit
> safemode when the shared edits dir reappears.
--
This message is automatically generated by JIRA.