[ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192871#comment-15192871 ]

Jingcheng Du commented on HDFS-9668:
------------------------------------

In patch V2, I chose an array-based lock pool instead of a ConcurrentHashMap. 
Alternatively, I can abstract the lock pool and make it pluggable and 
configurable by users.
Please take a look, thanks a lot.
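
For readers who have not opened the patch, here is a minimal sketch of what an array-based lock pool can look like; the class and method names are illustrative only and are not taken from the patch. The idea is a fixed-size array of ReentrantLocks indexed by block id, so operations on different blocks rarely contend and memory stays bounded, unlike a ConcurrentHashMap keyed by block id whose entries have to be inserted and cleaned up per block.

{code:java}
// Illustrative sketch only -- not the code in HDFS-9668-2.patch.
// A fixed array of locks; a block id is mapped onto one of them, so
// two writers only contend if their blocks hash to the same slot.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ArrayLockPool {
  private final Lock[] locks;

  public ArrayLockPool(int poolSize) {
    locks = new Lock[poolSize];
    for (int i = 0; i < poolSize; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  /** Map a block id onto one of the pooled locks. */
  public Lock getLock(long blockId) {
    int index = (int) Math.abs(blockId % locks.length);
    return locks[index];
  }
}
{code}

A pool like this can also sit behind a small interface, which is the pluggable, user-configurable variant mentioned above.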

Here is some supplementary DFS I/O performance data, run on JDK 8.
10-10G means the number of files is 10 and each file is 10 GB.
With HDDs installed in each DN and a data block size of 5 MB, the throughput 
(MB/s) is as follows:
| number of files - file size | without patch | with patch |
| --------------------------- | ------------: | ---------: |
| 10-10G                      |         79.49 |      88.25 |
| 50-2G                       |         16.35 |      23.6  |
| 100-2G                      |          5.77 |       9.23 |
| 200-1G                      |          2.87 |       4.45 |

With HDDs installed in each DN and a data block size of 128 MB, the throughput 
(MB/s) is as follows:
| number of files - file size | without patch | with patch |
| --------------------------- | ------------: | ---------: |
| 10-10G                      |        122.78 |     133.69 |
| 50-2G                       |         30.96 |      33.42 |
| 100-2G                      |         12.41 |      12.88 |
| 200-1G                      |          7.49 |       7.68 |

With SSDs installed in each DN and a data block size of 5 MB, the throughput 
(MB/s) is as follows:
| number of files - file size | without patch | with patch |
| --------------------------- | ------------: | ---------: |
| 10-10G                      |         98.06 |     105.2  |
| 50-2G                       |         26.92 |      28.39 |
| 100-2G                      |         10.8  |      14.06 |
| 200-1G                      |          4.87 |       6.39 |

With SSDs installed in each DN and a data block size of 128 MB, the throughput 
(MB/s) is as follows:
| number of files - file size | without patch | with patch |
| --------------------------- | ------------: | ---------: |
| 10-10G                      |        111.85 |     119.46 |
| 50-2G                       |         35.57 |      36.42 |
| 100-2G                      |         16.93 |      17.38 |
| 200-1G                      |          8.62 |       8.63 |

We can see that the patch shows good performance when the data block size is 
small, i.e. when data blocks are created and finalized more frequently: at a 
5 MB block size a 2 GB file is split into roughly 410 blocks versus only 16 at 
128 MB, so createRbw/finalizeBlock run about 25 times as often. This makes the 
patch helpful in cases where small files are common or files need to be 
created frequently (much like HBase when WAL writing, memstore flushing and 
compaction occur). It does not show much improvement when the data block size 
is larger.

> Many long-time BLOCKED threads on FsDatasetImpl in a tiered storage test
> ------------------------------------------------------------------------
>
>                 Key: HDFS-9668
>                 URL: https://issues.apache.org/jira/browse/HDFS-9668
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>    java.lang.Thread.State: BLOCKED
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111)
>       - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
>       
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>    java.lang.Thread.State: RUNNABLE
>       at java.io.UnixFileSystem.createFileExclusively(Native Method)
>       at java.io.File.createNewFile(File.java:1012)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>       - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during 
> the test. The result is shown below.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD take a really 
> long time under heavy load.
> This means that one slow finalizeBlock, addBlock or createRbw operation on a 
> slow storage can block all other operations of the same kind in the same 
> DataNode, especially in HBase when many WAL/flusher/compactor threads are 
> configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl 
> implementation, and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in the DataNode.
> We can implement the lock at either the storage level or the block level.
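
To make the block-level option above concrete, here is a hedged sketch of how a block-scoped lock, taken from a pool such as the one sketched earlier, could replace the dataset-wide synchronized section visible in the jstack; BlockLockingDataset and createTemporaryFile are hypothetical names, not code from FsDatasetImpl or the patch. A storage-level variant would index the pool by volume instead of by block id.

{code:java}
import java.util.concurrent.locks.Lock;

// Hypothetical illustration of block-level locking. Instead of
// synchronizing on the whole dataset object, each operation only
// holds the lock that covers the block it touches.
public class BlockLockingDataset {
  private final ArrayLockPool lockPool = new ArrayLockPool(256);

  public void createRbw(long blockId) {
    Lock lock = lockPool.getLock(blockId);
    lock.lock();
    try {
      // The slow disk work (creating the rbw file on the chosen volume)
      // now only blocks writers whose blocks hash to the same slot,
      // not every DataXceiver in the DataNode.
      createTemporaryFile(blockId);
    } finally {
      lock.unlock();
    }
  }

  private void createTemporaryFile(long blockId) {
    // Placeholder for the actual file creation; intentionally empty.
  }
}
{code}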


