[
https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624974#comment-16624974
]
Hadoop QA commented on HDFS-13768:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m
0s{color} | {color:green} The patch appears to include 1 new or modified test
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
13m 7s{color} | {color:green} branch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
12m 6s{color} | {color:green} patch has no errors when building and testing
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m
3s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 7s{color}
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
30s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 20s{color} |
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Possible doublecheck on
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addReplicaThreadPool
in new
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String,
FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:new
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String,
FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:[lines
186-188] |
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
| | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12940946/HDFS-13768.04.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall
mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 95ea0e322dae 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4758b4b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| whitespace |
https://builds.apache.org/job/PreCommit-HDFS-Build/25132/artifact/out/whitespace-eol.txt
|
| findbugs |
https://builds.apache.org/job/PreCommit-HDFS-Build/25132/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
|
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/25132/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/25132/testReport/ |
| Max. process+thread count | 2979 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
hadoop-hdfs-project/hadoop-hdfs |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/25132/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Adding replicas to volume map makes DataNode start slowly
> -----------------------------------------------------------
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch,
> HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.patch, screenshot-1.png
>
>
> We find DN starting so slowly when rolling upgrade our cluster. When we
> restart DNs, the DNs start so slowly and not register to NN immediately. And
> this cause a lots of following error:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360
> dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool,
> BP-1508644862-xx.xx.xx.xx-1493781183457.
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the logic of DN startup, it will do the initial block pool
> operation before the registration. And during initializing block pool
> operation, we found the adding replicas to volume map is the most expensive
> operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently DN uses independent thread to scan and add replica for each volume,
> but we still need to wait the slowest thread to finish its work. So the main
> problem here is that we could make the thread to run faster.
> The jstack we get when DN blocking in the adding replica:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da
> runnable [0x00007f4043a38000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:1122)
> at java.io.File.listFiles(File.java:1207)
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement maybe we can use ForkJoinPool to do this recursive task,
> rather than a sync way. This will be a great improvement because it can
> greatly speed up recovery process.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]