[
https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797248#comment-13797248
]
Kihwal Lee commented on HDFS-5346:
----------------------------------
bq. We realized we can set dfs.namenode.replqueue.threshold-pct to 1.0 or even
1.5 to make sure that only when the NN enters the Safemode extension period are
the replication queues initialized.
Thanks for the analysis, Ravi. As you said, setting this config to something >
1.0 will prevent the replication queues from being initialized in the middle of
block report processing. Since the main loop of SafeModeMonitor in
trunk/branch-2 and leaveSafeMode(), called by SafeModeMonitor in branch-0.23,
both acquire the FSN lock, nothing can get in between replication queue
initialization and leaving safe mode and cause delays.
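To illustrate why a value above 1.0 works, here is a minimal sketch of the
threshold comparison. It is a simplified model, not the actual FSNamesystem
code: the class name, method names and block counts are hypothetical, and
0.999 is used only as an illustrative threshold. It assumes the queue
threshold is derived as (total blocks * threshold-pct) and compared against
the number of safe blocks reported so far.

```java
// Hypothetical, simplified model of the replication queue threshold check.
// Names and numbers are illustrative; this is not the NameNode implementation.
public class ReplQueueThresholdSketch {

    /** Safe-block count required before replication queues may be initialized. */
    static long replQueueBlockThreshold(long blockTotal, double thresholdPct) {
        return (long) (blockTotal * thresholdPct);
    }

    /** True once enough safe blocks have been reported to reach the threshold. */
    static boolean canInitializeReplQueues(long blockSafe, long blockTotal,
                                           double thresholdPct) {
        return blockSafe >= replQueueBlockThreshold(blockTotal, thresholdPct);
    }

    public static void main(String[] args) {
        long blockTotal = 1_000_000L;

        // Threshold below 1.0: the check can pass while reports are still arriving.
        System.out.println(canInitializeReplQueues(999_500L, blockTotal, 0.999));

        // Threshold of 1.0: the check passes only when every block is safe.
        System.out.println(canInitializeReplQueues(999_999L, blockTotal, 1.0));

        // Threshold of 1.5: the check cannot pass, since blockSafe <= blockTotal.
        System.out.println(canInitializeReplQueues(blockTotal, blockTotal, 1.5));
    }
}
```

Under this model, a threshold above 1.0 means the check made during block
report processing never succeeds, so initialization is left to the
SafeModeMonitor path described above.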
+1. The patch looks good. I will change the title of this JIRA to reflect the
actual change.
> Replication queues should not be initialized in the middle of IBR processing.
> -----------------------------------------------------------------------------
>
> Key: HDFS-5346
> URL: https://issues.apache.org/jira/browse/HDFS-5346
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, performance
> Affects Versions: 0.23.9, 2.3.0
> Reporter: Kihwal Lee
> Assignee: Ravi Prakash
> Fix For: 2.3.0, 0.23.10
>
> Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch,
> HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch
>
>
> When initial block reports are being processed, checkMode() is called from
> incrementSafeBlockCount(). This causes the replication queues to be
> initialized in the middle of processing a block report in the IBR processing
> mode. If there are many block reports waiting to be processed,
> SafeModeMonitor won't be able to make the NameNode leave safe mode promptly.
> Block report processing speed also appears to degrade considerably during
> this time.