[
https://issues.apache.org/jira/browse/HDFS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008572#comment-13008572
]
Matt Foley commented on HDFS-1767:
----------------------------------
The contrib tests (TestHFlush and TestFileConcurrentReader) and contrib tests
(hdfsproxy.TestAuthorizationFilter) cited seem unrelated to this patch.
Furthermore, the contrib tests pass in my environment. I think these are false
positives. Thanks.
> Delay second Block Reports until after cluster finishes startup, to improve
> startup times
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-1767
> URL: https://issues.apache.org/jira/browse/HDFS-1767
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 0.22.0
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: DelaySecondBR_v1.patch, table.csv, table_tab.csv
>
>
> Consider a large cluster that takes 40 minutes to start up. The datanodes
> compete to register and send their Initial Block Reports (IBRs) as fast as
> they can after startup (subject to a small sub-two-minute random delay, which
> isn't relevant to this discussion).
> As each datanode succeeds in sending its IBR, it schedules the starting time
> for its regular cycle of reports, every hour (or other configured value of
> dfs.blockreport.intervalMsec). In order to spread the reports evenly across
> the block report interval, each datanode picks a random fraction of that
> interval, for the starting point of its regular report cycle. For example,
> if a particular datanode ends up randomly selecting 18 minutes after the
> hour, then that datanode will send a Block Report at 18 minutes after the
> hour every hour as long as it remains up. Other datanodes will start their
> cycles at other randomly selected times. This code is in
> DataNode.blockReport() and DataNode.scheduleBlockReport().
> The "second Block Report" (2BR), is the start of these hourly reports. The
> problem is that some of these 2BRs get scheduled sooner rather than later,
> and actually occur within the startup period. For example, if the cluster
> takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that
> succeed in sending their IBRs during the first 10 minutes, between 1/2 and
> 2/3 of them will send their 2BR before the 40-minute startup time has
> completed!
> 2BRs sent within the startup time actually compete with the remaining IBRs,
> and thereby slow down the overall startup process. This can be seen in the
> following data, which shows the startup process for a 3700-node cluster that
> took about 17 minutes to finish startup:
> {noformat}
> time starts sum regs sum IBR sum 2nd_BR sum total_BRs/min
> 0 1299799498 3042 3042 1969 1969 151 151 0 151
> 1 1299799558 665 3707 1470 3439 248 399 0 248
> 2 1299799618 3707 224 3663 270 669 0 270
> 3 1299799678 3707 14 3677 261 930 3 3 264
> 4 1299799738 3707 23 3700 288 1218 1 4 289
> 5 1299799798 3707 7 3707 258 1476 3 7 261
> 6 1299799858 3707 3707 317 1793 4 11 321
> 7 1299799918 3707 3707 292 2085 6 17 298
> 8 1299799978 3707 3707 292 2377 8 25 300
> 9 1299800038 3707 3707 272 2649 25 272
> 10 1299800098 3707 3707 280 2929 15 40 295
> 11 1299800158 3707 3707 223 3152 14 54 237
> 12 1299800218 3707 3707 143 3295 54 143
> 13 1299800278 3707 3707 141 3436 20 74 161
> 14 1299800338 3707 3707 195 3631 78 152 273
> 15 1299800398 3707 3707 51 3682 209 361 260
> 16 1299800458 3707 3707 25 3707 369 730 394
> 17 1299800518 3707 3707 3707 166 896 166
> 18 1299800578 3707 3707 3707 72 968 72
> 19 1299800638 3707 3707 3707 67 1035 67
> 20 1299800698 3707 3707 3707 75 1110 75
> 21 1299800758 3707 3707 3707 71 1181 71
> 22 1299800818 3707 3707 3707 67 1248 67
> 23 1299800878 3707 3707 3707 62 1310 62
> 24 1299800938 3707 3707 3707 56 1366 56
> 25 1299800998 3707 3707 3707 60 1426 60
> {noformat}
> This data was harvested from the startup logs of all the datanodes, and
> correlated into one-minute buckets. Each row of the table represents the
> progress during one elapsed minute of clock time. It seems that every
> cluster startup is different, but this one showed the effect fairly well.
> The "starts" column shows that all the nodes started up within the first 2
> minutes, and the "regs" column shows that all succeeded in registering by
> minute 6. The IBR column shows a sustained rate of Initial Block Report
> processing of 250-300/minute for the first 10 minutes.
> The question is why, during minutes 11 through 16, the rate of IBR processing
> slowed down. Why didn't the startup just finish? In the "2nd_BR" column, we
> see the rate of 2BRs ramping up as more datanodes complete their IBRs. As
> the rate increases, they become more effective at competing with the IBRs,
> and slow down the IBR processing even more. After the IBRs finally finish in
> minute 16, the rate of 2BRs settles down to a steady ~60-70/minute.
> In order to decrease competition for locks and other resources, to speed up
> IBR processing during startup, we propose to delay 2BRs until later into the
> cycle.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira