[ https://issues.apache.org/jira/browse/HDFS-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629948#comment-15629948 ]
Jing Zhao commented on HDFS-11090:
----------------------------------
If the 100% block threshold has been met, all blocks have reached the minimum
replication requirement (usually 1 replica). It is therefore still possible
that the NN has not yet received some full block reports (FBRs), and a safemode
extension can still avoid unnecessary replication work. That said, the number
of pending FBRs in this scenario should be small, considering that we use
random replication. We also already have extra logic to initialize the
replication queues earlier ({{initializeReplQueuesIfNecessary}}).
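
To illustrate why 100% safe blocks does not imply that every FBR has arrived,
here is a minimal sketch. It is a toy model rather than NameNode code: the
class, the method {{safeFraction}}, and the block and datanode names are all
hypothetical, and it assumes Java 9+ for the collection factories.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; none of these names exist in HDFS.
class SafeBlockFractionSketch {

    /**
     * Fraction of blocks that have at least minReplication replicas on
     * datanodes whose block report has already been received.
     */
    static double safeFraction(Map<String, List<String>> blockToDatanodes,
                               Set<String> reportedDatanodes,
                               int minReplication) {
        if (blockToDatanodes.isEmpty()) {
            return 1.0; // nothing to wait for
        }
        int safe = 0;
        for (List<String> holders : blockToDatanodes.values()) {
            long reported = holders.stream()
                                   .filter(reportedDatanodes::contains)
                                   .count();
            if (reported >= minReplication) {
                safe++;
            }
        }
        return (double) safe / blockToDatanodes.size();
    }

    public static void main(String[] args) {
        // Three blocks, each with three replicas spread over four datanodes.
        Map<String, List<String>> blocks = Map.of(
            "blk_1", List.of("dn1", "dn2", "dn3"),
            "blk_2", List.of("dn1", "dn3", "dn4"),
            "blk_3", List.of("dn2", "dn3", "dn4"));

        // Only dn1 and dn2 have reported; dn3 and dn4 are still pending.
        Set<String> reported = Set.of("dn1", "dn2");

        // With a minimum replication of 1, every block already counts as safe.
        System.out.println(safeFraction(blocks, reported, 1));
    }
}
```

In this model half of the datanodes have not reported, yet every block already
has one reported replica, so leaving safemode at this point would queue
replication work for replicas that are about to be reported.
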
My main concern about the approach in the current patch is whether it is
useful in practice. In a large cluster it is not rare to have a few missing
blocks, or at least to wait a long time before 100% of blocks are safe, so
people usually set the safemode threshold below 1. In a small cluster we can
simply set the safemode extension to 0 in the configuration. So do we want to
add an extra check to the safemode code, which is already very complicated?
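
The trade-off can be sketched with a small model of the exit decision. This is
only an illustration under stated assumptions, not the patch or the actual
NameNode logic: the class and its methods are hypothetical, and the three
fields merely mirror the roles of {{dfs.namenode.safemode.threshold-pct}},
{{dfs.namenode.safemode.min.datanodes}} and {{dfs.namenode.safemode.extension}}.

```java
// Hypothetical model of the startup safemode exit decision; not HDFS code.
class SafeModeExitSketch {
    final double thresholdPct; // role of dfs.namenode.safemode.threshold-pct
    final int minDatanodes;    // role of dfs.namenode.safemode.min.datanodes
    final long extensionMs;    // role of dfs.namenode.safemode.extension

    SafeModeExitSketch(double thresholdPct, int minDatanodes, long extensionMs) {
        this.thresholdPct = thresholdPct;
        this.minDatanodes = minDatanodes;
        this.extensionMs = extensionMs;
    }

    /** True once both the block threshold and the datanode threshold hold. */
    boolean thresholdsMet(long safeBlocks, long totalBlocks, int liveDatanodes) {
        if (thresholdPct > 1.0) {
            return false; // operator chose to never leave safemode automatically
        }
        return safeBlocks >= thresholdPct * totalBlocks
            && liveDatanodes >= minDatanodes;
    }

    /**
     * Decide whether safemode may be left.
     * leaveEarlyWhenAllSafe models the extra check proposed in the issue.
     */
    boolean canLeave(long safeBlocks, long totalBlocks, int liveDatanodes,
                     long msSinceThresholdsMet, boolean leaveEarlyWhenAllSafe) {
        if (!thresholdsMet(safeBlocks, totalBlocks, liveDatanodes)) {
            return false;
        }
        if (leaveEarlyWhenAllSafe && safeBlocks >= totalBlocks) {
            return true; // skip the extension when every block is safe
        }
        return msSinceThresholdsMet >= extensionMs;
    }

    public static void main(String[] args) {
        // Large cluster: threshold below 1, one block missing, 30s extension.
        SafeModeExitSketch large = new SafeModeExitSketch(0.99, 0, 30_000);
        System.out.println(large.canLeave(99, 100, 10, 5_000, true));

        // Small cluster: extension set to 0, no extra check needed.
        SafeModeExitSketch small = new SafeModeExitSketch(0.99, 0, 0);
        System.out.println(small.canLeave(99, 100, 3, 0, false));
    }
}
```

In this model the early-exit check never fires for the large cluster while a
block is missing, and the small cluster gets the same effect from a zero
extension without any new code path.
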
> Leave safemode immediately if all blocks have reported in
> ---------------------------------------------------------
>
> Key: HDFS-11090
> URL: https://issues.apache.org/jira/browse/HDFS-11090
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.7.3
> Reporter: Andrew Wang
> Assignee: Yiqun Lin
> Attachments: HDFS-11090.001.patch
>
>
> Startup safemode is triggered by two thresholds: % blocks reported in, and
> min # datanodes. It's extended by an interval (default 30s) until these two
> thresholds are met.
> Safemode extension is helpful when the cluster has data, and the default %
> blocks threshold (0.99) is used. It gives DNs a little extra time to report
> in and thus avoid unnecessary replication work.
> However, we can leave startup safemode early if 100% of blocks have reported
> in.
> Note that operators sometimes change the % blocks threshold to > 1 to never
> automatically leave safemode. We should maintain this behavior.