[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:
--------------------------------
    Attachment: HDFS-9129.002.patch

Thank you [~jingzhao]. The v2 patch addresses the comments.

Thank you [~daryn] for your input. I briefly describe the current design 
below. The patch is not yet complete, and further refactoring may be necessary.

Basically, the patch splits the name node safe mode into two levels. The 
first level is in {{FSNamesystem}} and the second is in 
{{BlockManagerSafeMode}}. The main code change has two parts:
# The first-level safe mode code is kept in {{FSNamesystem}}
# The second-level safe mode is moved to the {{blockmanagement}} package

At the beginning, the name node is in *STARTUP* safe mode, where the block manager 
is tracking blocks and data nodes. The name node will leave *STARTUP* mode for:
* *OFF*: if either of the following two conditions is met
*# The second-level safe mode is *OFF*. This is the case where the block manager 
leaves safe mode automatically once the thresholds and extension are met
*# the administrator leaves safe mode manually
* *MANUALLY*: the administrator enters safe mode manually
* *RESOURCE_LOW*: low resources are detected by monitoring

The first-level safe mode is a simple state machine. Other transitions, such as 
*MANUALLY* to *OFF*, are straightforward.
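
As a rough illustration of the first-level state machine described above, the transitions could be sketched as follows. This is not code from the patch: the class name, the {{Event}} enum, and the {{next}} method are all hypothetical, and the sketch assumes that manual enter/leave and low-resource events are accepted from any state, which the description above does not spell out.

```java
// Hypothetical sketch of the first-level (FSNamesystem) safe mode states.
// Only the state names come from the design description; the rest is assumed.
class FirstLevelSafeModeSketch {
    enum State { STARTUP, MANUALLY, RESOURCE_LOW, OFF }

    // Hypothetical event names used only for this illustration.
    enum Event { BLOCK_MANAGER_OFF, ADMIN_LEAVE, ADMIN_ENTER, LOW_RESOURCES }

    /** Returns the state that follows {@code current} when {@code event} occurs. */
    static State next(State current, Event event) {
        switch (event) {
            case BLOCK_MANAGER_OFF:
                // The second level only matters while the first level is STARTUP.
                return current == State.STARTUP ? State.OFF : current;
            case ADMIN_LEAVE:
                // Assumption: a manual leave always ends in OFF.
                return State.OFF;
            case ADMIN_ENTER:
                return State.MANUALLY;
            case LOW_RESOURCES:
                return State.RESOURCE_LOW;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }
}
```

For example, {{next(State.STARTUP, Event.BLOCK_MANAGER_OFF)}} yields *OFF*, while the same event in *MANUALLY* leaves the state unchanged.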

As implied above, the second level is meaningful and valid if and only if 
the first-level safe mode is *STARTUP*. At the beginning, the block manager is 
in *INITIALIZED* mode, and it will leave this mode if:
* thresholds are met (to *OFF* mode, as no extension is needed)
* thresholds are not met (to *THRESHOLD* mode)

The *THRESHOLD* mode waits on the block and data node thresholds. Once the 
thresholds are met, the block manager will leave this mode and change to:
* *OFF* if extension is not needed (e.g. {{extension}} config value is 0)
* *EXTENSION* if extension is needed

The *EXTENSION* mode waits on the extension period. The block manager will 
leave this mode for *OFF* once both of the following conditions are met:
* extension period has elapsed (checked by a monitor thread)
* thresholds are met
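
The second-level transitions described above can be sketched in the same way. Again, this is only an illustration under stated assumptions, not the patch itself: the class name, the {{next}} method, and its parameters ({{thresholdsMet}}, {{extensionMs}}, {{elapsedInExtensionMs}}) are hypothetical stand-ins for whatever the block manager actually tracks.

```java
// Hypothetical sketch of the second-level (block manager) safe mode states.
// Only the four state names come from the design description.
class BlockManagerSafeModeSketch {
    enum Status { INITIALIZED, THRESHOLD, EXTENSION, OFF }

    /**
     * Returns the next status.
     *
     * @param thresholdsMet        whether block and data node thresholds are met
     * @param extensionMs          configured extension period (0 means none)
     * @param elapsedInExtensionMs time spent so far in EXTENSION
     */
    static Status next(Status current, boolean thresholdsMet,
                       long extensionMs, long elapsedInExtensionMs) {
        switch (current) {
            case INITIALIZED:
                // Thresholds already met: go straight to OFF, no extension.
                return thresholdsMet ? Status.OFF : Status.THRESHOLD;
            case THRESHOLD:
                if (!thresholdsMet) {
                    return Status.THRESHOLD; // keep waiting on thresholds
                }
                return extensionMs > 0 ? Status.EXTENSION : Status.OFF;
            case EXTENSION:
                // Both conditions are required: period elapsed and thresholds met.
                boolean periodElapsed = elapsedInExtensionMs >= extensionMs;
                return (periodElapsed && thresholdsMet) ? Status.OFF : Status.EXTENSION;
            case OFF:
                return Status.OFF;
            default:
                throw new IllegalArgumentException("unknown status: " + current);
        }
    }
}
```

In this sketch a monitor thread would call {{next}} periodically while in *EXTENSION*; that polling loop is omitted here.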

The main design motivation is that {{FSNamesystem}} and {{BlockManager}} 
each maintain their own state.

> Move the safemode block count into BlockManager
> -----------------------------------------------
>
>                 Key: HDFS-9129
>                 URL: https://issues.apache.org/jira/browse/HDFS-9129
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>            Assignee: Mingliang Liu
>         Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
