[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205723#comment-13205723 ]
Hari Mankude commented on HDFS-2914:
------------------------------------
bq. The patch file still needs to have the ".patch" extension.
done
bq. Rather than sleep for 10 seconds, let's increase the frequency at which the
NNResourceChecker thread runs to every 0 or 1 seconds, and then sleep for 2
seconds.
I would rather leave this as is, since I could easily reproduce the problem
with a 10s sleep.
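The reviewer's alternative (a faster checker interval plus a short wait) is
often written as a poll-until-true loop rather than a fixed sleep. A minimal
sketch, assuming a hypothetical WaitUtil.waitFor helper (not a Hadoop API) and
a caller-supplied condition such as a safe-mode check:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not a Hadoop API: polls a condition instead of
// sleeping for a fixed interval, so a test finishes as soon as the
// condition holds and reports failure if it never does.
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls {@code condition} every {@code intervalMs} milliseconds until it
   * returns true or {@code timeoutMs} milliseconds have elapsed.
   *
   * @return true if the condition held before the timeout, false otherwise
   */
  static boolean waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Compare via subtraction so nanoTime overflow is handled correctly.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

With the checker running every second, a test could wait up to a couple of
seconds for the expected state instead of always sleeping 10 seconds. The
trade-off is that the checker interval must be short enough for the condition
to change within the timeout.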
bq. Our coding conventions require the use of curly braces ("{}") even for
single-line if statements.
done
bq. What do you think the behavior should be for an NN which is active,
experiences low resources, then becomes standby? I think the current behavior
seems fine (i.e. require the admin to make the now-standby NN leave SM) but I'm
wondering if you've considered this case. You might want to write a test case
which asserts the desired behavior.
I am not sure that I completely understand your concern. When the active NN has
low resources, it goes into safe mode. If the shared edits dir goes away, the
active NN dies. If you are talking about doing a switchover (active to standby)
while the active NN is in safe mode, I thought I saw a test in testHAsafemode
for this condition. If not, I can add a test in a separate JIRA.
bq. Note that Jitendra's suggestion also said "When it transitions to
active, that's when a check for available resources to write logs should be
performed." I agree with this (much as the NN currently checks for available
resources on startup) but your patch doesn't implement this.
This is already handled: checkAvailableResources() is called during
startupCommonServices(). Also, the resource checker thread is always running
and will catch the issue within 5s.
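The behavior under discussion (a periodic resource check that only drives an
active NN into safe mode, plus an immediate check on becoming active as
Jitendra suggested) can be sketched as below. All class and method names are
hypothetical; this is not the HDFS code, and resourcesAvailable stands in for
whatever disk-space check the real checker performs.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the HDFS implementation: a resource monitor that
// runs a check on a fixed interval but only puts the NameNode into safe mode
// while it is in the active state.
final class ResourceMonitorSketch implements Runnable {
  enum HAState { ACTIVE, STANDBY }

  private final BooleanSupplier resourcesAvailable; // stand-in disk-space check
  private final long intervalMs;                     // e.g. 5000 for a 5s period
  private volatile HAState state;
  private volatile boolean inSafeMode = false;
  private volatile boolean running = true;

  ResourceMonitorSketch(BooleanSupplier resourcesAvailable, long intervalMs,
      HAState initialState) {
    this.resourcesAvailable = resourcesAvailable;
    this.intervalMs = intervalMs;
    this.state = initialState;
  }

  boolean isInSafeMode() {
    return inSafeMode;
  }

  void stop() {
    running = false;
  }

  // Safe mode is deliberately not cleared here: leaving it is left to the
  // admin, matching the "now-standby NN stays in SM" behavior discussed above.
  void transitionToStandby() {
    state = HAState.STANDBY;
  }

  // Check immediately on becoming active rather than waiting for the next
  // periodic run (the reviewer's suggestion).
  void transitionToActive() {
    state = HAState.ACTIVE;
    checkOnce();
  }

  /** One iteration of the check; a standby never enters safe mode here. */
  void checkOnce() {
    if (state == HAState.ACTIVE && !resourcesAvailable.getAsBoolean()) {
      inSafeMode = true;
    }
  }

  @Override
  public void run() {
    while (running) {
      checkOnce();
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return;
      }
    }
  }
}
```

Without the check in transitionToActive(), the periodic loop alone would
still catch low resources, but only within one interval (about 5s here).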
> HA: Standby should not enter safemode when resources are low
> ------------------------------------------------------------
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: HDFS-2914-HDFS-1623, HDFS-2914-HDFS-1623,
> HDFS-2914-HDFS-1623.patch, hdfs-2914
>
>
> When shared edits dir is bounced, standby NN is put into safemode by the
> NameNodeResourceMonitor(). However, there is no path for it to exit out of
> safe mode when shared edits dir reappears.