[
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194793#comment-15194793
]
Jingcheng Du commented on HDFS-9668:
------------------------------------
Thanks for the comments, [~cmccabe].
bq. A 10 gigabyte HDFS file that uses 5 MB HDFS blocks seems like an extremely
unusual case. That would result in just that single file having 2,097,152
blocks. I guess perhaps this is intended to simulate a case where we have many
small files leading to small blocks?
Right, I want to simulate a case where there are many small files leading to
small blocks. I can re-design the test cases.
bq. The only thing that needs to be protected by the lock is the call to
FsDatasetImpl#getFile, since it reads from the volumeMap.
FsDatasetUtil#findMetaFile doesn't need protection since it just lists the
block files in the directory, and parseGenerationStamp just applies a regular
expression to the metadata file name.
Will this break the consistency between the block file and its meta file? Is it
possible that a block file cannot find its meta file if findMetaFile is not
protected by the lock?
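Just to make sure I read the suggestion correctly, the shape would be roughly
the following. This is only a self-contained toy sketch of the pattern, not the
real FsDatasetImpl code; the class name NarrowLockSketch, the plain datasetLock
object and the simplified volumeMap are invented here for illustration.
{noformat}
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of the suggested pattern: only the volumeMap read is guarded by
// the dataset-wide lock; finding the meta file (a directory listing plus a
// check on the file name) runs outside the lock.
public class NarrowLockSketch {
  private final Object datasetLock = new Object();
  private final Map<Long, File> volumeMap = new HashMap<>(); // blockId -> block file

  public File findMetaFile(long blockId) throws IOException {
    final File blockFile;
    synchronized (datasetLock) {
      // Only this shared-map read needs the lock.
      blockFile = volumeMap.get(blockId);
    }
    if (blockFile == null) {
      throw new IOException("Block " + blockId + " is not valid.");
    }
    // Directory listing and name matching happen without holding the lock.
    File[] candidates = blockFile.getParentFile().listFiles(
        (dir, name) -> name.startsWith(blockFile.getName()) && name.endsWith(".meta"));
    if (candidates == null || candidates.length == 0) {
      throw new IOException("Meta file not found for block " + blockId);
    }
    return candidates[0];
  }
}
{noformat}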
bq. There are a lot of other cases like this. I think reducing the unnecessary
locking would be better than making the locking more complex. After all, even
with lock striping, we may find that several "hot" blocks share the same lock
stripe, and therefore that we gain no more concurrency. I wonder what numbers
you get if you just change these functions to drop the lock except when they
really need it to access the volumeMap?
I can remove the unnecessary locking from the code and test what the numbers
look like then.
But it is still hard to remove the locks in createRbw, etc., where the
long-time blocking occurs. I think this is what we have to tackle in the future.
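For reference, the block-level striping I have in mind looks roughly like the
toy sketch below. It is self-contained and not the patch code; StripedLockSketch
and its method names are made up for this example.
{noformat}
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of block-level lock striping: operations on different
// blocks usually take different locks, but "hot" blocks that map to the same
// stripe still contend with each other.
public class StripedLockSketch {
  private final ReentrantLock[] stripes;

  public StripedLockSketch(int numStripes) {
    stripes = new ReentrantLock[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  private ReentrantLock lockFor(long blockId) {
    // Map the block id onto one of the stripes.
    return stripes[(int) Math.floorMod(blockId, (long) stripes.length)];
  }

  public void createRbw(long blockId, Runnable slowDiskWork) {
    ReentrantLock lock = lockFor(blockId);
    lock.lock();
    try {
      // A slow volume now only blocks writers whose blocks share this stripe,
      // instead of every createRbw/finalizeBlock/addBlock in the DataNode.
      slowDiskWork.run();
    } finally {
      lock.unlock();
    }
  }
}
{noformat}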
> Optimize the locking in FsDatasetImpl
> -------------------------------------
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, execution_time.png
>
>
> During an HBase test on tiered HDFS storage (the WAL is stored on
> SSD/RAMDISK, and all other files are stored on HDD), we observed many
> long-time BLOCKED threads on FsDatasetImpl in the DataNode. The following is
> part of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48521 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread
> t@93336
> java.lang.Thread.State: BLOCKED
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
> - waiting to lock <18324c9> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48520 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
>
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48520 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread
> t@93335
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1012)
> at
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
> - locked <18324c9> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during the
> test. The results are shown below.
> !execution_time.png!
> Under heavy load, the finalizeBlock, addBlock and createRbw operations on HDD
> take a really long time.
> This means that a single slow finalizeBlock, addBlock or createRbw operation
> on a slow storage can block all the other operations of the same kind in the
> same DataNode, especially for HBase when many WAL/flusher/compactor threads
> are configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl implementation,
> and users can choose the implementation by configuring
> "dfs.datanode.fsdataset.factory" in the DataNode.
> We can implement the lock at either the storage level or the block level.
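> As a rough illustration of the storage-level option, the idea is one lock per
> volume rather than one lock for the whole dataset. This is a self-contained
> toy sketch only; PerVolumeLockSketch and its names are invented for the
> example and are not taken from any patch.
> {noformat}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.locks.ReentrantLock;
>
> // Toy illustration of the storage-level option: one lock per volume, so a
> // slow HDD volume does not block replica operations on SSD/RAMDISK volumes.
> public class PerVolumeLockSketch {
>   private final Map<String, ReentrantLock> volumeLocks = new ConcurrentHashMap<>();
>
>   public void runOnVolume(String volumePath, Runnable op) {
>     ReentrantLock lock =
>         volumeLocks.computeIfAbsent(volumePath, p -> new ReentrantLock());
>     lock.lock();
>     try {
>       op.run(); // e.g. the createRbw/finalizeBlock work for a replica on this volume
>     } finally {
>       lock.unlock();
>     }
>   }
> }
> {noformat}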
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)