[
https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625737#comment-16625737
]
Yiqun Lin commented on HDFS-13768:
----------------------------------
Thanks [~surendrasingh] for updating the patch! Some minor comments from me:
*ReplicaMap#addAndGet*
For the new method {{addAndGet}}, I would prefer to return a boolean value
indicating whether the new replicaInfo was added. For example, if an old
replicaInfo already exists, we return false. Then we don't need to check the
reference of the returned replicaInfo in
{{BlockPoolSlice#addReplicaToReplicasMap}}, which looks a little confusing.
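To illustrate, a minimal sketch of what I mean (simplified stand-in types only, not the real {{ReplicaMap}}/{{ReplicaInfo}} classes; a String stands in for the replica):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: ReplicaMapSketch is a hypothetical, simplified stand-in
// for ReplicaMap, with a String standing in for ReplicaInfo.
class ReplicaMapSketch {
  private final ConcurrentMap<Long, String> map = new ConcurrentHashMap<>();

  /**
   * Adds replicaInfo only if no replica is recorded for blockId yet.
   * @return true if the new replicaInfo was added, false if an old one
   *         already existed (and was kept).
   */
  boolean addAndGet(long blockId, String replicaInfo) {
    return map.putIfAbsent(blockId, replicaInfo) == null;
  }
}
{code}
The caller can then branch on the boolean directly instead of comparing replica references.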
*FsDatasetUtil#getGenerationStampFromFile*
* The following loop seems like it can be deleted, if I am understanding
correctly. Since the listed files are ordered, the meta file can only be the
next file, if it really exists.
{code:java}
for (int j = 0; j < listdir.length; j++) {
  String path = listdir[j].getName();
  if (!path.startsWith(blockName)) {
    continue;
  }
  if (blockFile.getCanonicalPath().equals(listdir[j].getCanonicalPath())) {
    continue;
  }
  return Block.getGenerationStamp(listdir[j].getName());
}
{code}
* Please complete the javadoc for this method.
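For illustration, a hypothetical sketch of the simplified lookup with javadoc, assuming the directory listing is sorted so that the meta file (if any) immediately follows the block file. The names and the grandfather-stamp constant here are placeholders, not the real {{FsDatasetUtil}} code:
{code:java}
class GenStampSketch {

  static final long GRANDFATHER_GS = 0; // placeholder for "no meta file found"

  /**
   * Finds the generation stamp for blockName from a sorted directory listing.
   * Because the listing is ordered, a meta file named
   * "<blockName>_<genstamp>.meta" can only be the entry right after the
   * block file itself, so no full scan is needed.
   *
   * @param sortedNames directory entries in sorted order
   * @param blockName   block file name, e.g. "blk_1073741825"
   * @return the parsed generation stamp, or GRANDFATHER_GS if no meta exists
   */
  static long getGenerationStampFromFile(String[] sortedNames,
      String blockName) {
    for (int i = 0; i < sortedNames.length; i++) {
      if (sortedNames[i].equals(blockName)
          && i + 1 < sortedNames.length
          && sortedNames[i + 1].startsWith(blockName + "_")
          && sortedNames[i + 1].endsWith(".meta")) {
        String meta = sortedNames[i + 1];
        // name form: <blockName>_<genstamp>.meta
        return Long.parseLong(meta.substring(
            blockName.length() + 1, meta.length() - ".meta".length()));
      }
    }
    return GRANDFATHER_GS;
  }
}
{code}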
*Other minor comments*
* FsVolumeList.java: There needs to be a white space after 'pool'.
* hdfs-default.xml: The description of this config should be updated, since we
actually use the max of (number of volumes * number of bp services, number of
processors), not always the configured value.
* TestFsVolumeList.java
1. We can use {{Configuration#setInt}} to set {{DFS_REPLICATION_KEY}}.
2. I prefer to explicitly set the thread pool size (e.g. 5 or 10) to ensure
that adding replicas really runs concurrently in the ForkJoinPool.
* Please fix the findbugs error and the white space warning reported at
{{DFSConfigKeys.java:368}}.
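For the test comments above, a fragment of what I mean (not runnable on its own; it assumes the Hadoop test classpath, and the thread-pool config key name here is only illustrative, not necessarily the one the patch adds):
{code:java}
Configuration conf = new HdfsConfiguration();
// Use setInt instead of set(..., String.valueOf(...)) for an int-valued key.
conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, 1);
// Explicitly pin the pool size so adding replicas really runs concurrently
// in the ForkJoinPool, independent of the test machine's processor count.
conf.setInt("dfs.datanode.volumes.replica-add.threadpool.size", 5);
{code}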
> Adding replicas to volume map makes DataNode start slowly
> -----------------------------------------------------------
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch,
> HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.patch, screenshot-1.png
>
>
> We found the DN starting very slowly when rolling upgrading our cluster. When
> we restart DNs, they start slowly and do not register to the NN immediately.
> This causes a lot of the following errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360
> dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool,
> BP-1508644862-xx.xx.xx.xx-1493781183457.
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the logic of DN startup, it does the initial block pool
> operations before the registration. During block pool initialization, we
> found that adding replicas to the volume map is the most expensive operation.
> Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106]
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to
> add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on
> volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread to scan and add replicas for each
> volume, but we still need to wait for the slowest thread to finish its work.
> So the main problem here is how we could make these threads run faster.
> The jstack we got when the DN was blocked in adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da
> runnable [0x00007f4043a38000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:1122)
> at java.io.File.listFiles(File.java:1207)
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement may be to use a ForkJoinPool to do this recursive task,
> rather than doing it synchronously. This would be a great improvement
> because it can greatly speed up the recovery process.
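The ForkJoinPool idea above could look roughly like this. This is a minimal, hypothetical stand-in that just counts files; the real change would live in the recursive scan of {{BlockPoolSlice#addToReplicasMap}}, adding each replica to the map instead of counting:
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch only: simplified stand-in for the recursive directory scan.
class ParallelDirScan extends RecursiveTask<Long> {
  private final File dir;

  ParallelDirScan(File dir) {
    this.dir = dir;
  }

  @Override
  protected Long compute() {
    long files = 0;
    List<ParallelDirScan> subTasks = new ArrayList<>();
    File[] entries = dir.listFiles();
    if (entries == null) {
      return 0L;
    }
    for (File e : entries) {
      if (e.isDirectory()) {
        ParallelDirScan task = new ParallelDirScan(e);
        task.fork();        // scan the subdirectory in parallel
        subTasks.add(task);
      } else {
        files++;            // in the real code: add the replica to the map
      }
    }
    for (ParallelDirScan task : subTasks) {
      files += task.join();
    }
    return files;
  }

  public static void main(String[] args) {
    File root = new File(args.length > 0 ? args[0] : ".");
    System.out.println(new ForkJoinPool().invoke(new ParallelDirScan(root)));
  }
}
{code}
Unlike one thread per volume, the work-stealing pool keeps all workers busy even when one volume's subtree is much larger than the others.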
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)