[
https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628977#comment-16628977
]
Hadoop QA commented on HDFS-13768:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m
0s{color} | {color:green} The patch appears to include 1 new or modified test
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
11m 33s{color} | {color:green} branch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m
0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
10m 45s{color} | {color:green} patch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}126m 23s{color}
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
33s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}193m 33s{color} |
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.server.namenode.sps.TestBlockStorageMovementAttemptedItems |
| | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12941381/HDFS-13768.05.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall
mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux eead017d7a2c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e5287a4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| whitespace |
https://builds.apache.org/job/PreCommit-HDFS-Build/25140/artifact/out/whitespace-tabs.txt
|
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/25140/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/25140/testReport/ |
| Max. process+thread count | 3088 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
hadoop-hdfs-project/hadoop-hdfs |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/25140/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Adding replicas to volume map makes DataNode start slowly
> -----------------------------------------------------------
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch,
> HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.05.patch,
> HDFS-13768.patch, screenshot-1.png
>
>
> We find DN starting so slowly when rolling upgrade our cluster. When we
> restart DNs, the DNs start so slowly and not register to NN immediately. And
> this cause a lots of following error:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360
> dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool,
> BP-1508644862-xx.xx.xx.xx-1493781183457.
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the logic of DN startup, it will do the initial block pool
> operation before the registration. And during initializing block pool
> operation, we found the adding replicas to volume map is the most expensive
> operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently DN uses independent thread to scan and add replica for each volume,
> but we still need to wait the slowest thread to finish its work. So the main
> problem here is that we could make the thread to run faster.
> The jstack we get when DN blocking in the adding replica:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da
> runnable [0x00007f4043a38000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:1122)
> at java.io.File.listFiles(File.java:1207)
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement maybe we can use ForkJoinPool to do this recursive task,
> rather than a sync way. This will be a great improvement because it can
> greatly speed up recovery process.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]