[
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435803#comment-15435803
]
Lei (Eddy) Xu commented on HDFS-9668:
-------------------------------------
Hi [[email protected]], thanks a lot for the patch. It looks nice
overall.
* {{AutoCloseableLock acquireDatasetLock(boolean readLock);}}. Would it be
clearer to split it into two methods, {{acquireReadLock()}} and
{{acquireWriteLock()}}? From the caller's perspective, that makes the code
self-explanatory (see the sketch below).
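A minimal sketch of what the split API and a call site could look like,
assuming the {{AutoCloseableLock}} type already used by the patch (the method
names here are only suggestions, not from the patch):
{code}
// Proposed split of acquireDatasetLock(boolean readLock) into two methods
// (names are suggestions only):
AutoCloseableLock acquireReadLock();   // shared lock, e.g. for volumeMap lookups
AutoCloseableLock acquireWriteLock();  // exclusive lock, e.g. for volumeMap updates

// A call site then reads as self-explanatory:
try (AutoCloseableLock lock = dataset.acquireReadLock()) {
  // read-only access to dataset state
}
{code}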
* In {{FsDatasetImpl#getStoredBlock()}}, could you explain what
{{blockOpLock}} protects? IMO, {{datasetReadLock}} does not need to protect
{{findMetadataFile()}} and {{parseGenerationStamp()}}. What if we do the
following:
{code}
File blockFile = null;
try (AutoCloseableLock lock = datasetReadLock.acquire()) {
  synchronized (getBlockOpLock(blkid)) {
    blockFile = getFile(bpid, blkid, false);
  }
}
if (blockFile == null) {
  return null;
}
final File metafile = ....
{code}
Similarly, in {{getTmpInputStreams}}, {{datasetReadLock}} and
{{blockOpLock}} should only protect {{getReplicaInfo()}}, not the subsequent
{{openAndSeek()}} calls; see the sketch below.
Btw, {{FsVolumeReference}} is {{AutoCloseable}}, so it can be used in
{{try-with-resources}} as well.
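Roughly, the body could look like the following sketch (it reuses the
existing {{FsDatasetImpl}} helpers such as {{getReplicaInfo()}},
{{obtainReference()}} and {{ReplicaInputStreams}} as I remember them; the
exact signatures in the patch may differ):
{code}
ReplicaInfo info;
try (AutoCloseableLock lock = datasetReadLock.acquire()) {
  synchronized (getBlockOpLock(b.getBlockId())) {
    info = getReplicaInfo(b);   // only the replica lookup is under the locks
  }
}
// The openAndSeek() calls no longer run under the dataset lock.
FsVolumeReference ref = info.getVolume().obtainReference();
InputStream blockIn = null;
try {
  blockIn = openAndSeek(info.getBlockFile(), blkOffset);
  InputStream metaIn = openAndSeek(info.getMetaFile(), metaOffset);
  return new ReplicaInputStreams(blockIn, metaIn, ref);
} catch (IOException e) {
  IOUtils.cleanup(null, blockIn, ref);   // release resources on failure
  throw e;
}
{code}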
* In the private {{FsDatasetImpl#append()}}, you need the write lock to run
{code}
1311 volumeMap.add(bpid, newReplicaInfo);
{code}
Also, you might want to add a comment to {{append()}} noting that the caller
must hold {{blockOpLock}}; see the sketch below.
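A minimal sketch of that spot, assuming the {{datasetWriteLock}} field name
used in this discussion:
{code}
// Inside the private append(); the caller is assumed to already hold the
// blockOpLock for this block. Take the dataset write lock only around the
// shared volumeMap update.
try (AutoCloseableLock lock = datasetWriteLock.acquire()) {
  volumeMap.add(bpid, newReplicaInfo);
}
{code}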
* Similarly, we do not need read locks in {{recoverAppend()}} and
{{recoverClose()}} after calling {{recoverCheck()}}.
In summary, in your write-heavy workloads the write requests need to acquire
{{datasetWriteLock}} to update {{volumeMap}}. Since this patch uses fair
read/write locks, the duration of each {{readLock}} hold should be as short as
possible so that write locks can be acquired more frequently. On the other
hand, since changes to a {{block / blockFile}} can be protected by
{{blockOpLock}}, it seems to me that there is no need to hold the dataset
(read/write) locks when manipulating blocks (e.g., bumping the genstamp); see
the sketch below. What do you think, [[email protected]]?
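To illustrate the pattern I have in mind (a rough sketch only;
{{bumpReplicaGS}} stands in for any per-block mutation, and the lock/field
names follow this discussion rather than the patch):
{code}
ReplicaInfo replica;
// Keep the dataset read lock hold short: just the volumeMap lookup.
try (AutoCloseableLock lock = datasetReadLock.acquire()) {
  replica = volumeMap.get(bpid, blkid);
}
// Per-block manipulation (e.g. bumping the genstamp) only needs blockOpLock,
// so threads waiting for datasetWriteLock are not blocked in the meantime.
synchronized (getBlockOpLock(blkid)) {
  bumpReplicaGS(replica, newGS);
}
{code}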
> Optimize the locking in FsDatasetImpl
> -------------------------------------
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch,
> HDFS-9668-4.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in
> SSD/RAMDISK, and all other files are stored in HDD), we observe many
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48521 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread
> t@93336
> java.lang.Thread.State: BLOCKED
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
> - waiting to lock <18324c9> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48520 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
>
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at
> /192.168.50.16:48520 [Receiving block
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread
> t@93335
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1012)
> at
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
> - locked <18324c9> (a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during
> the test. The following is the result.
> !execution_time.png!
> Under heavy load, the finalizeBlock, addBlock and createRbw operations on
> HDD take a really long time.
> This means that one slow finalizeBlock, addBlock or createRbw operation on a
> slow storage can block all other operations of the same kind in the same
> DataNode, especially in HBase when many WAL/flusher/compactor threads are
> configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl
> implementation, and users can choose the implementation by configuring
> "dfs.datanode.fsdataset.factory" in the DataNode.
> We can implement the lock at either the storage level or the block level.