[ 
https://issues.apache.org/jira/browse/HDFS-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630365#comment-15630365
 ] 

Andrew Wang commented on HDFS-11090:
------------------------------------

Thanks for the comments everyone!

To provide a little more context, this is something we ran into for an 
ephemeral cluster use case. We're starting a new cluster for the first time. The 
% blocks threshold is the default, and the min datanodes threshold is 1. Our 
management scripts wait for the NN to leave safemode before setting up 
directories and populating HDFS with files like Oozie's sharelib. This is why 
the min datanodes threshold is set to 1: that way, the cluster is ready to 
receive writes as soon as it leaves safemode.
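
A minimal sketch of what such a wait-then-populate script might look like (an 
illustration, not the actual management scripts). It assumes the `hdfs` CLI is 
on the PATH; `hdfs dfsadmin -safemode wait` and `hdfs dfs -mkdir -p` / `-put` 
are standard HDFS shell commands, while the function names and the paths in 
the usage note are hypothetical.

```python
import subprocess


def bootstrap_commands(local_sharelib, hdfs_target_dir):
    """Return the HDFS CLI commands for first-time setup, in execution order."""
    return [
        # Blocks until the NameNode reports that safemode is off.
        ["hdfs", "dfsadmin", "-safemode", "wait"],
        # Writes need safemode off and at least one live DataNode,
        # which is why the min datanodes threshold is set to 1.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_target_dir],
        ["hdfs", "dfs", "-put", local_sharelib, hdfs_target_dir],
    ]


def bootstrap(local_sharelib, hdfs_target_dir, run=subprocess.check_call):
    """Run each command in order; check_call raises if a command fails."""
    for cmd in bootstrap_commands(local_sharelib, hdfs_target_dir):
        run(cmd)
```

For example, `bootstrap("/tmp/sharelib", "/user/oozie/share")` would wait for 
safemode to end and then upload the (hypothetical) local directory. The `run` 
parameter exists only so the command sequence can be inspected without a 
cluster.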

Even though there are no blocks in the cluster, since the min datanode 
threshold is set, the Namenode enters safemode extension. This adds an 
additional 30s to startup. We've already trivially achieved 100% of all 
replicas being reported, so in this case I'd like to leave safemode as soon as 
the min datanodes threshold is met.
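
The behavior being asked for can be sketched roughly as follows. This is an 
illustration of the idea only, not the NameNode's actual code or the logic in 
the attached patch; the function and parameter names are hypothetical, and the 
threshold arithmetic is a simplifying assumption.

```python
def can_leave_safemode(blocks_total, blocks_reported, live_datanodes,
                       min_datanodes, threshold_pct, extension_elapsed):
    """Decide whether startup safemode may end now (simplified model)."""
    # A threshold above 1 is the operator's way of saying "never leave
    # safemode automatically"; that behavior is preserved.
    if threshold_pct > 1:
        return False

    # Assumption: the required block count is total * threshold, truncated.
    blocks_needed = int(blocks_total * threshold_pct)
    thresholds_met = (blocks_reported >= blocks_needed
                      and live_datanodes >= min_datanodes)
    if not thresholds_met:
        return False

    # Proposed shortcut: if every block has reported in (trivially true for
    # an empty cluster), waiting out the extension gains nothing.
    if blocks_reported >= blocks_total:
        return True

    # Otherwise keep the existing behavior and wait for the extension.
    return extension_elapsed
```

With an empty cluster and one live datanode, 
`can_leave_safemode(0, 0, 1, 1, 0.999, False)` returns True, so the 30s 
extension is skipped. With 999 of 1000 blocks reported, the same call still 
waits for the extension to elapse.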

Setting the safemode extension to 0 for the first run would work, but that 
pushes additional configuration burden onto the user. I was hoping to avoid 
that with this JIRA.

> Leave safemode immediately if all blocks have reported in
> ---------------------------------------------------------
>
>                 Key: HDFS-11090
>                 URL: https://issues.apache.org/jira/browse/HDFS-11090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.7.3
>            Reporter: Andrew Wang
>            Assignee: Yiqun Lin
>         Attachments: HDFS-11090.001.patch
>
>
> Startup safemode is controlled by two thresholds: % blocks reported in, and 
> min # datanodes. Once both thresholds are met, safemode is extended by an 
> interval (default 30s) before the NN leaves it.
> Safemode extension is helpful when the cluster has data, and the default % 
> blocks threshold (0.999) is used. It gives DNs a little extra time to report 
> in and thus avoid unnecessary replication work.
> However, we can leave startup safemode early if 100% of blocks have reported 
> in.
> Note that operators sometimes change the % blocks threshold to > 1 to never 
> automatically leave safemode. We should maintain this behavior.


