[
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967881#comment-14967881
]
Haohui Mai edited comment on HDFS-9129 at 10/21/15 8:59 PM:
------------------------------------------------------------
Thanks for the work, [~liuml07]
{code}
+ if (status == BMSafeModeStatus.OFF) {
+ return;
+ }
+ if (status == BMSafeModeStatus.OFF) {
+ return;
+ }
{code}
There are multiple cases like the above. They should be asserts.
{code}
+ synchronized boolean isSafeModeTrackingBlocks() {
+ return namesystem.isHaEnabled() && shouldIncrementallyTrackBlocks;
+ }
{code}
The code can be simplified. {{namesystem.isHaEnabled()}} does not change in the
lifecycle of the process.
{code}
+ /**
+ * The transition trigger of the safe mode state machine.
+ *
+ * The state machine elaborates itself in code. Specially the start status is
+ * always INITIALIZED and the end status is OFF. There is no transition to
+ * INITIALIZED and no transition from OFF.
+ */
{code}
It's better to document the conditions of state transition.
{code}
+ needExtension = extension > 0 &&
+ (blockThreshold > 0 || datanodeThreshold > 0);
{code}
This can be moved under the THRESHOLD statement and become a local variable.
{code}
+ if (areThresholdsMet()) {
+ if (needExtension) {
+ // THRESHOLD -> EXTENSION
+ status = BMSafeModeStatus.EXTENSION;
+ reached = monotonicNow();
+ NameNode.stateChangeLog.info("STATE* Safe mode extension entered. \n"
+ + getSafeModeTip());
+ smmthread = new Daemon(new SafeModeMonitor());
+ smmthread.start();
+ } else {
+ // THRESHOLD -> OFF
+ leaveSafeMode();
+ }
+ }
+ initializeReplQueuesIfNecessary();
+ break;
+ case EXTENSION:
+ if (areThresholdsMet() && monotonicNow() - reached >= extension) {
+ // EXTENSION -> OFF
+ leaveSafeMode();
+ }
+ break;
{code}
{{initializeReplQueuesIfNecessary()}} should be called only once.
{code}
+ blockManager.leaveSafeMode();
+ safeModeStatus = SafeModeStatus.OFF;
+ startSecretManagerIfNecessary();
{code}
{{safeModeStatus = SafeModeStatus.OFF}} should be moved to
{{BlockManagerSafeMode}}.
{code}
+ private class SafeModeMonitor implements Runnable {
+ /** interval in msec for checking safe mode: {@value}. */
+ private static final long RECHECK_INTERVAL = 1000;
+
+ @Override
+ public void run() {
+ while (namesystem.isRunning()) {
+ try {
+ namesystem.writeLock();
+ if (status == BMSafeModeStatus.OFF) {
+ // We're not in start up safe mode any more while monitoring.
+ // Chances are we left safe mode manually before thresholds and
+ // extension are reached.
+ break;
+ }
+ checkSafeMode();
+ if (!isInSafeMode()) {
+ smmthread = null;
+ break;
+ }
+ } finally {
+ namesystem.writeUnlock();
+ }
+
+ try {
+ Thread.sleep(RECHECK_INTERVAL);
+ } catch (InterruptedException ignored) {
+ }
+ }
+
+ if (!namesystem.isRunning()) {
+ LOG.info("NameNode is being shutdown, exit SafeModeMonitor thread");
+ }
+ }
+ }
{code}
A cleaner approach is to put the reached timestamp into the constructor of
{{SafeModeMonitor()}}.
It might be good to have additional unit tests for BlockManagerSafeMode.
was (Author: wheat9):
Thanks for the work, [~liuml07]
{code}
+ if (status == BMSafeModeStatus.OFF) {
+ return;
+ }
+ if (status == BMSafeModeStatus.OFF) {
+ return;
+ }
{code}
There are multiple cases like the above. They should be asserts.
{code}
+ synchronized boolean isSafeModeTrackingBlocks() {
+ return namesystem.isHaEnabled() && shouldIncrementallyTrackBlocks;
+ }
{code}
The code can be simplified. {{namesystem.isHaEnabled()}} does not change in the
lifecycle of the process.
{code}
+ /**
+ * The transition trigger of the safe mode state machine.
+ *
+ * The state machine elaborates itself in code. Specially the start status is
+ * always INITIALIZED and the end status is OFF. There is no transition to
+ * INITIALIZED and no transition from OFF.
+ */
{code}
It's better to document the conditions of state transition.
{code}
+ needExtension = extension > 0 &&
+ (blockThreshold > 0 || datanodeThreshold > 0);
{code}
This can be moved under the THRESHOLD statement and become a local variable.
{code}
+ if (areThresholdsMet()) {
+ if (needExtension) {
+ // THRESHOLD -> EXTENSION
+ status = BMSafeModeStatus.EXTENSION;
+ reached = monotonicNow();
+ NameNode.stateChangeLog.info("STATE* Safe mode extension entered. \n"
+ + getSafeModeTip());
+ smmthread = new Daemon(new SafeModeMonitor());
+ smmthread.start();
+ } else {
+ // THRESHOLD -> OFF
+ leaveSafeMode();
+ }
+ }
+ initializeReplQueuesIfNecessary();
+ break;
+ case EXTENSION:
+ if (areThresholdsMet() && monotonicNow() - reached >= extension) {
+ // EXTENSION -> OFF
+ leaveSafeMode();
+ }
+ break;
{code}
{{initializeReplQueuesIfNecessary()}} should be called only once.
{code}
+ blockManager.leaveSafeMode();
+ safeModeStatus = SafeModeStatus.OFF;
+ startSecretManagerIfNecessary();
{code}
{{safeModeStatus = SafeModeStatus.OFF}} should be moved to
{{BlockManagerSafeMode}}.
{code}
+ private class SafeModeMonitor implements Runnable {
+ /** interval in msec for checking safe mode: {@value}. */
+ private static final long RECHECK_INTERVAL = 1000;
+
+ @Override
+ public void run() {
+ while (namesystem.isRunning()) {
+ try {
+ namesystem.writeLock();
+ if (status == BMSafeModeStatus.OFF) {
+ // We're not in start up safe mode any more while monitoring.
+ // Chances are we left safe mode manually before thresholds and
+ // extension are reached.
+ break;
+ }
+ checkSafeMode();
+ if (!isInSafeMode()) {
+ smmthread = null;
+ break;
+ }
+ } finally {
+ namesystem.writeUnlock();
+ }
+
+ try {
+ Thread.sleep(RECHECK_INTERVAL);
+ } catch (InterruptedException ignored) {
+ }
+ }
+
+ if (!namesystem.isRunning()) {
+ LOG.info("NameNode is being shutdown, exit SafeModeMonitor thread");
+ }
+ }
+ }
{code}
A cleaner approach is to put the reached timestamp into the constructor of
{{SafeModeMonitor()}}, and there is no need to hold the namesystem lock anymore.
> Move the safemode block count into BlockManager
> -----------------------------------------------
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch,
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch,
> HDFS-9129.005.patch, HDFS-9129.006.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the
> NN can get out of the safemode. These fields can moved to the
> {{BlockManager}} class.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)