[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209668#comment-13209668
 ] 

Aaron T. Myers commented on HDFS-2914:
--------------------------------------

bq. I was thinking about it a bit, it might get tricky to check for resources 
when starting active services, because at this point the namenode is still in 
standby. If it enters safe mode, then if there is any failure in transition we 
should take care to transition it back to non-safe mode. I am also suspicious 
that if it transitions to safemode, some active services may not start just 
because of the safemode, and that would mean loss of service. We cannot throw 
an exception either, if resources are low, for the same reason.

Hmmmmm, I don't _think_ this should be a problem. We currently support 
transitioning to the active state while the NN is in safemode, so I don't see 
why any services would fail to start if we were to enter safemode while 
transitioning to the active state.

Regardless, even if it is possible, I think you've convinced me that it's not 
actually necessary.

bq. I am leaning towards separating the two failure (low resources is not a 
failure though) scenarios, i.e. standby transitions to active irrespective of 
what its resource status is, and the check for resources is done independently 
once transition to active is successfully completed. This is consistent with 
the fact that low resources is not a failure, the cluster is still available in 
read only mode.

OK, that seems fine. Perhaps we could also have FSNS#startActiveServices 
interrupt the NameNodeResourceMonitor thread? That would guarantee that a 
resource check would happen promptly after transitioning to active.
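
To make the interrupt idea concrete, here is a minimal sketch. It assumes the monitor is a loop that runs a check and then sleeps for a fixed interval. The class, fields, and methods below are hypothetical stand-ins, not the actual NameNodeResourceMonitor code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a periodic resource monitor (not the real HDFS class).
class ResourceMonitorSketch implements Runnable {
    private final long intervalMs;
    private final AtomicInteger checksDone = new AtomicInteger();
    private volatile boolean running = true;

    ResourceMonitorSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    int checksDone() {
        return checksDone.get();
    }

    // Ask the loop to exit and wake it if it is sleeping.
    void stop(Thread monitorThread) {
        running = false;
        monitorThread.interrupt();
    }

    @Override
    public void run() {
        while (running) {
            // Placeholder for the real resource check.
            checksDone.incrementAndGet();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Woken early: fall through and check again right away
                // (or exit, if stop() cleared the running flag).
            }
        }
    }

    // Polls until at least `target` checks have run or the timeout expires.
    boolean awaitChecks(int target, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (checksDone.get() < target) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            Thread.sleep(5);
        }
        return true;
    }
}
```

With this shape, the code that starts active services would only need to call interrupt() on the monitor thread; the next check then runs without waiting out the remainder of the sleep interval.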

Offline, Todd pointed out to me that we could also check for available 
resources in the monitorHealth RPC call, which the failover controller can 
call *before* initiating a failover, to make sure the NN we want to fail over 
to has available resources. That should probably be done in a separate JIRA, 
though.
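
Roughly, the ordering Todd described could look like the sketch below. All of the types here are hypothetical and exist only for illustration; in particular, the exception type and the stub NN are assumptions, not the real failover controller or HA protocol classes.

```java
// Hypothetical sketch: health check first, transition only if it passes.
class FailoverSketch {
    // Assumed exception type for a failed health check.
    static class LowResourcesException extends Exception {
        LowResourcesException(String msg) {
            super(msg);
        }
    }

    // Assumed minimal view of a failover target.
    interface Target {
        void monitorHealth() throws LowResourcesException;
        void transitionToActive();
    }

    // Stub NN whose health depends only on a resource flag.
    static class NameNodeStub implements Target {
        private final boolean resourcesAvailable;
        private boolean active = false;

        NameNodeStub(boolean resourcesAvailable) {
            this.resourcesAvailable = resourcesAvailable;
        }

        boolean isActive() {
            return active;
        }

        @Override
        public void monitorHealth() throws LowResourcesException {
            if (!resourcesAvailable) {
                throw new LowResourcesException("resources low on target NN");
            }
        }

        @Override
        public void transitionToActive() {
            active = true;
        }
    }

    // Returns true if the failover went ahead, false if the target was unhealthy.
    static boolean failoverTo(Target target) {
        try {
            target.monitorHealth();
        } catch (LowResourcesException e) {
            return false; // do not initiate the failover
        }
        target.transitionToActive();
        return true;
    }
}
```

The point of this ordering is that a target with low resources is never asked to become active, so the transition path itself does not need a resource check.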
                
> HA: Standby should not enter safemode when resources are low
> ------------------------------------------------------------
>
>                 Key: HDFS-2914
>                 URL: https://issues.apache.org/jira/browse/HDFS-2914
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: HDFS-2914-HDFS-1623, HDFS-2914-HDFS-1623, 
> HDFS-2914-HDFS-1623.patch, hdfs-2914
>
>
> When the shared edits dir is bounced, the standby NN is put into safemode by 
> the NameNodeResourceMonitor. However, there is no path for it to exit 
> safemode when the shared edits dir reappears.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
