[
https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Surendra Singh Lilhore reassigned HDFS-13768:
---------------------------------------------
Assignee: Surendra Singh Lilhore (was: Ranith Sardar)
> Adding replicas to volume map makes DataNode start slowly
> -----------------------------------------------------------
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-13768.patch
>
>
> We find DN starting so slowly when rolling upgrade our cluster. When we
> restart DNs, the DNs start so slowly and not register to NN immediately. And
> this cause a lots of following error:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360
> dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool,
> BP-1508644862-xx.xx.xx.xx-1493781183457.
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the logic of DN startup, it will do the initial block pool
> operation before the registration. And during initializing block pool
> operation, we found the adding replicas to volume map is the most expensive
> operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently DN uses independent thread to scan and add replica for each volume,
> but we still need to wait the slowest thread to finish its work. So the main
> problem here is that we could make the thread to run faster.
> The jstack we get when DN blocking in the adding replica:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da
> runnable [0x00007f4043a38000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:1122)
> at java.io.File.listFiles(File.java:1207)
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement maybe we can use ForkJoinPool to do this recursive task,
> rather than a sync way. This will be a great improvement because it can
> greatly speed up recovery process.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]