[ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707772#comment-15707772 ]
Jingcheng Du commented on HDFS-9668: ------------------------------------ Thanks a lot for comments [~arpitagarwal]! bq. FsVolumeImpl.java:340: Not sure about this one. The documentation of BlockPoolSlice#decDfsUsed states It must be synchronized from caller. You are right, I thought about this when I did this. It is a AtomicLong, I thought it was okay to remove the synchronization from caller. If the synchronization is needed, I can add a mutex for these methods. bq. ReplicaMap.java:39: Can we postpone changing ReplicaMap to another Jira and keep this Jira for the FsDatasetImpl change only? I am afraid it is not possible. ReplicaMap needs a lock, which lock should be used if this class is not changed? A write lock would be dead-lock-prone if a read lock is held outside, or a non-related ReentrantLock as now? bq. FsDatasetImpl.java:1687: The read lock looks fine here but we don't seem to be respecting the comment on ReplicaMap#replicas which states. I think read lock is a kind of synchronization. The ReplicaMap#replicas directly returns a collection, the ConcurrentModificationException might occur when this collection is modified during the iteration. We use a write lock for the modification of ReplicaMap and add read lock for its read, I think it is safe here. bq. FsDatasetImpl.java:954: This should get the write lock. The modification in volumes is protected by the write lock in caller, and the ArrayList of volumes in this method is created per calling. And the comments in VolumeChoosingPolicy#chooseVolume is {code}The implementations of this interface must be thread-safe{code}. I think it is okay to use a read lock here? > Optimize the locking in FsDatasetImpl > ------------------------------------- > > Key: HDFS-9668 > URL: https://issues.apache.org/jira/browse/HDFS-9668 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Reporter: Jingcheng Du > Assignee: Jingcheng Du > Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, > HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, > HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, > HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, > HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, > HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, > HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, > HDFS-9668-25.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, > HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, > execution_time.png > > > During the HBase test on a tiered storage of HDFS (WAL is stored in > SSD/RAMDISK, and all other files are stored in HDD), we observe many > long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part > of the jstack result: > {noformat} > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48521 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread > t@93336 > java.lang.Thread.State: BLOCKED > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111) > - waiting to lock <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335 > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) > Locked ownable synchronizers: > - None > > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread > t@93335 > java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:1012) > at > org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140) > - locked <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) > Locked ownable synchronizers: > - None > {noformat} > We measured the execution of some operations in FsDatasetImpl during the > test. Here following is the result. > !execution_time.png! > The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy > load take a really long time. > It means one slow operation of finalizeBlock, addBlock and createRbw in a > slow storage can block all the other same operations in the same DataNode, > especially in HBase when many wal/flusher/compactor are configured. > We need a finer grained lock mechanism in a new FsDatasetImpl implementation > and users can choose the implementation by configuring > "dfs.datanode.fsdataset.factory" in DataNode. > We can implement the lock by either storage level or block-level. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org