[ 
https://issues.apache.org/jira/browse/HDFS-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797248#comment-13797248
 ] 

Kihwal Lee commented on HDFS-5346:
----------------------------------

bq. We realized we can set dfs.namenode.replqueue.threshold-pct to 1.0 or even 
1.5 to make sure that only when the NN enters the Safemode extension period are 
the replication queues initialized.

Thanks for the analysis, Ravi. As you said, setting this config to something > 
1.0 will prevent the replication queues from being initialized in the middle of 
block report processing.  Both the main loop of SafeModeMonitor in 
trunk/branch-2 and leaveSafeMode(), which SafeModeMonitor calls in branch-0.23, 
acquire the FSN lock, so nothing can get in between replication queue 
initialization and leaving safe mode and cause delays. 
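
To illustrate why a value above 1.0 has this effect, here is a minimal toy 
model. It is not the NameNode source. It assumes the check made during block 
report processing compares the safe block count against (total blocks * 
dfs.namenode.replqueue.threshold-pct); the class and method names are 
hypothetical.

```java
public class ReplQueueThresholdSketch {

    // Hypothetical stand-in for the check made while block reports are
    // processed: has the safe block count reached (total * threshold)?
    static boolean reachesThreshold(long safeBlocks, long totalBlocks,
                                    double thresholdPct) {
        long needed = (long) (totalBlocks * thresholdPct);
        return safeBlocks >= needed;
    }

    // Returns the first safe block count at which the threshold is reached,
    // or -1 if it is never reached while blocks are still being reported.
    static long firstTriggerPoint(long totalBlocks, double thresholdPct) {
        for (long safe = 1; safe <= totalBlocks; safe++) {
            if (reachesThreshold(safe, totalBlocks, thresholdPct)) {
                return safe;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long total = 1000;
        for (double pct : new double[] {0.5, 1.0, 1.5}) {
            System.out.println("threshold-pct=" + pct
                + " -> first trigger at safe block "
                + firstTriggerPoint(total, pct));
        }
    }
}
```

Under this assumption, a threshold of 0.5 trips the check halfway through the 
reports, 1.0 trips it only on the last safe block, and 1.5 never trips it 
during report processing, so initialization is left to the point where safe 
mode is exited.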

+1 The patch looks good.  I will change the title of this jira to reflect the 
actual change.

> Replication queues should not be initialized in the middle of IBR processing.
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-5346
>                 URL: https://issues.apache.org/jira/browse/HDFS-5346
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 0.23.9, 2.3.0
>            Reporter: Kihwal Lee
>            Assignee: Ravi Prakash
>             Fix For: 2.3.0, 0.23.10
>
>         Attachments: HDFS-5346.branch-23.patch, HDFS-5346.branch-23.patch, 
> HDFS-5346.patch, HDFS-5346.patch, HDFS-5346.patch
>
>
> When initial block reports are being processed, checkMode() is called from 
> incrementSafeBlockCount(). This causes the replication queues to be 
> initialized in the middle of processing a block report in the IBR processing 
> mode. If there are many block reports waiting to be processed, 
> SafeModeMonitor won't be able to make the name node leave safe mode soon. It 
> appears that the block report processing speed degrades considerably during 
> this time. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
