[
https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392672#comment-14392672
]
Xinwei Qin commented on HDFS-7060:
-----------------------------------
[~cmccabe] , [~wheat9] Thanks a lot for your comments.
We create 400 threads to write 20 million files to a Hadoop cluster with 3 DNs.
Every DN has already about 90 million blocks.
Without HDFS-7060 patch, heartbeat always delays about several hundreds of
seconds or even longer to make DN dead.
With HDFS-7060 patch, heartbeat only sometimes delays about 50s and the DN
never dead.
As can be seen from above, making the DN heartbeat lockless can reduce the
delay time of heartbeat significantly, especially in large-scale concurrent
read and write scenario. So, if the lock is unnecessary, removing it will be a
good idea.
> Avoid taking locks when sending heartbeats from the DataNode
> ------------------------------------------------------------
>
> Key: HDFS-7060
> URL: https://issues.apache.org/jira/browse/HDFS-7060
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Xinwei Qin
> Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch,
> HDFS-7060.001.patch
>
>
> We're seeing the heartbeat is blocked by the monitor of {{FsDatasetImpl}}
> when the DN is under heavy load of writes:
> {noformat}
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115)
> - waiting to lock <0x0000000780304fb8> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91)
> - locked <0x0000000780612fd8> (a java.lang.Object)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827)
> at java.lang.Thread.run(Thread.java:744)
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743)
> - waiting to lock <0x0000000780304fb8> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:744)
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1006)
> at
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753)
> - locked <0x0000000780304fb8> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)