[
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436247#comment-15436247
]
Jingcheng Du commented on HDFS-9668:
------------------------------------
Thanks a lot for the comments! [~eddyxu]
bq. {{AutoCloseableLock acquireDatasetLock(boolean readLock);}} Would it be
clearer to split it into two methods, acquireReadLock() and acquireWriteLock()?
From the caller's perspective, it makes the code self-explanatory.
I defined the APIs following HDFS-10682. Yes, I can split it into two
methods. Do you think we should retain the no-argument acquireDatasetLock()
method as well?
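A minimal sketch of what the split could look like, keeping the HDFS-10682
style of returning an {{AutoCloseableLock}} so callers can use
try-with-resources (the interface and method names here are illustrative,
not the final API):
{code:java}
import org.apache.hadoop.util.AutoCloseableLock;

// Trimmed-down sketch, not the real FsDatasetSpi interface.
public interface DatasetLocking {
  // Acquires the dataset read lock; closing the returned object releases it.
  AutoCloseableLock acquireDatasetReadLock();

  // Acquires the dataset write lock; closing the returned object releases it.
  AutoCloseableLock acquireDatasetWriteLock();
}
{code}
A caller would then read naturally, e.g.
{{try (AutoCloseableLock l = dataset.acquireDatasetReadLock()) { ... }}},
making the intended lock mode obvious at the call site.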
bq. In FsDatasetImpl#getStoredBlock(). Could you explain what blockOpLock
protects? IMO, datasetReadLock does not need to protect findMetadataFile() and
parseGenerationStamp(). What if we do the following:
In this patch I replaced the old locks with the new locks as directly as
possible, to avoid introducing extra changes that could raise concerns in
review :), and I plan to do further refinements step by step based on the
comments. You are right that we can move the file-parsing operations out of
the lock scope; I will do that in the next patch.
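A rough sketch of the suggested shape, assuming the branch-2 helpers
{{getFile()}}, {{FsDatasetUtil.findMetaFile()}} and
{{FsDatasetUtil.parseGenerationStamp()}} (the lock method name is from the
sketch above; this is a method inside FsDatasetImpl, not a standalone class):
{code:java}
// Only the volumeMap/file lookup needs the dataset read lock; the
// metadata-file parsing is plain file I/O and can run outside it.
public Block getStoredBlock(String bpid, long blkid) throws IOException {
  File blockfile;
  try (AutoCloseableLock l = acquireDatasetReadLock()) {
    blockfile = getFile(bpid, blkid, false);   // reads shared dataset state
  }
  if (blockfile == null) {
    return null;
  }
  // No shared dataset state is touched below, so no lock is held.
  final File metafile = FsDatasetUtil.findMetaFile(blockfile);
  final long gs = FsDatasetUtil.parseGenerationStamp(blockfile, metafile);
  return new Block(blkid, blockfile.length(), gs);
}
{code}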
bq. Similarly, in getTmpInputStreams, the datasetReadLock and blockOpLock
should only protect getReplicaInfo(), instead of the several openAndSeek()
calls. Btw, FsVolumeReference is AutoCloseable and can be used in a
try-with-resources block as well.
Right, I will do it in the next patch.
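Something along these lines, presumably (a sketch assuming the branch-2
signatures of {{getReplicaInfo()}}, {{openAndSeek()}} and
{{ReplicaInputStreams}}; again a method inside FsDatasetImpl):
{code:java}
// The dataset read lock only guards the replica lookup; obtaining the
// volume reference and opening/seeking the streams happen outside it.
public ReplicaInputStreams getTmpInputStreams(ExtendedBlock b,
    long blkOffset, long metaOffset) throws IOException {
  ReplicaInfo info;
  try (AutoCloseableLock l = acquireDatasetReadLock()) {
    info = getReplicaInfo(b);                  // reads volumeMap
  }
  FsVolumeReference ref = info.getVolume().obtainReference();
  InputStream blockIn = null;
  InputStream metaIn = null;
  try {
    blockIn = openAndSeek(info.getBlockFile(), blkOffset);
    metaIn = openAndSeek(info.getMetaFile(), metaOffset);
    // ReplicaInputStreams takes ownership of both streams and the ref.
    return new ReplicaInputStreams(blockIn, metaIn, ref);
  } catch (IOException e) {
    IOUtils.cleanup(null, blockIn, metaIn, ref);  // release on failure
    throw e;
  }
}
{code}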
bq. In private FsDatasetImpl#append(), you need the write lock to run
{{volumeMap.add(bpid, newReplicaInfo);}}
{{volumeMap.add()}} has its own synchronization mutex inside the ReplicaMap
methods, so I think it is okay to use a read lock here?
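For context, a trimmed illustration (not the real ReplicaMap class) of why
that internal mutex makes the dataset read lock sufficient around
{{volumeMap.add()}}:
{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hdfs.server.datanode.ReplicaInfo;

// Simplified stand-in for ReplicaMap: every mutation is serialized on an
// internal mutex, so concurrent callers holding only the dataset read
// lock cannot corrupt the map.
class ReplicaMapSketch {
  private final Object mutex = new Object();
  private final Map<String, Map<Long, ReplicaInfo>> map = new HashMap<>();

  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
    synchronized (mutex) {
      Map<Long, ReplicaInfo> set =
          map.computeIfAbsent(bpid, k -> new HashMap<>());
      return set.put(replicaInfo.getBlockId(), replicaInfo);
    }
  }
}
{code}
The dataset write lock would add nothing here; it is only needed when the set
of volumes itself changes.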
bq. In summary, in your write-heavy workloads, the write requests need to
acquire datasetWriteLock to update volumeMap. ... since the changes on a block
/ blockFile can be protected by blockOpLock, it seems to me that there is no
need to hold the dataset (read/write) locks when manipulating the blocks
(e.g., bumping the genstamp)
The read/write lock is used to synchronize volume operations with block
operations, to avoid race conditions when block operations and
adding/removing-volume operations happen concurrently. So I have to retain the
read lock even when only manipulating blocks. But yes, some read-only
operations can be moved out of the lock scope.
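A self-contained sketch of that lock hierarchy (class and member names are
illustrative, not the patch itself): block operations share the read lock so
they can run concurrently with each other, while a volume add/remove takes the
write lock and excludes all of them.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLockSketch {
  private final ReentrantReadWriteLock datasetLock =
      new ReentrantReadWriteLock(true);   // fair, so writers are not starved
  private final Object blockOpLock = new Object();

  // A block-level operation: runs concurrently with other block operations,
  // but never while a volume is being added or removed.
  public void blockOperation() {
    datasetLock.readLock().lock();
    try {
      synchronized (blockOpLock) {
        // mutate a single replica here (e.g., bump its genstamp)
      }
    } finally {
      datasetLock.readLock().unlock();
    }
  }

  // A volume-level operation: waits for all in-flight block operations,
  // so none of them can observe a half-removed volume.
  public void removeVolume() {
    datasetLock.writeLock().lock();
    try {
      // update the volume list and volumeMap here
    } finally {
      datasetLock.writeLock().unlock();
    }
  }
}
{code}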
> Optimize the locking in FsDatasetImpl
> -------------------------------------
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch,
> HDFS-9668-4.patch, execution_time.png
>
>
> During an HBase test on a tiered HDFS storage (the WAL is stored on
> SSD/RAMDISK, and all other files are stored on HDD), we observed many
> long-lived BLOCKED threads on FsDatasetImpl in the DataNode. The following is
> part of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48521 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread t@93336
>    java.lang.Thread.State: BLOCKED
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
>         - waiting to lock <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:745)
>
>    Locked ownable synchronizers:
>         - None
>
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread t@93335
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:1012)
>         at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>         - locked <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:745)
>
>    Locked ownable synchronizers:
>         - None
> {noformat}
> We measured the execution time of some FsDatasetImpl operations during the
> test. The following are the results.
> !execution_time.png!
> Under heavy load, the finalizeBlock, addBlock and createRbw operations on HDD
> take a very long time.
> This means that one slow finalizeBlock, addBlock or createRbw operation on a
> slow storage can block all other operations of the same kind on the same
> DataNode, especially in HBase when many WAL writers, flushers and compactors
> are configured.
> We need a finer-grained locking mechanism in a new FsDatasetImpl
> implementation, and users can choose the implementation by configuring
> "dfs.datanode.fsdataset.factory" on the DataNode, as sketched below.
> The lock can be implemented at either the storage level or the block level.
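> A minimal sketch of how such an implementation could be selected (the
> factory class name below is hypothetical; only the configuration key is
> real):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> Configuration conf = new Configuration();
> // Point the DataNode at a custom FsDatasetSpi.Factory implementation.
> conf.set("dfs.datanode.fsdataset.factory",
>     "org.example.FineGrainedFsDatasetFactory");
> {code}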