[ https://issues.apache.org/jira/browse/HDFS-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044712#comment-13044712 ]

Todd Lipcon commented on HDFS-1148:
-----------------------------------

As I was updating HDFS-941 to trunk tonight, I took the opportunity to look 
into the blocking behavior again. While running TestParallelRead (with 
N_ITERATIONS bumped up 10x), I ran the following against the test JVM (PID 3378):
{code}
$ while true ; do jstack 3378 | grep -A2 BLOCK >> /tmp/blocked ; done
{code}
and then when it was done:
{code}
$ grep 'at ' /tmp/blocked  | sort | uniq -c | sort -nk1
      1         at java.lang.Object.wait(Native Method)
      6         at org.apache.hadoop.hdfs.DFSInputStream.getBlockRange(DFSInputStream.java:313)
     27         at org.apache.hadoop.hdfs.TestParallelRead$ReadWorker.read(TestParallelRead.java:142)
    137         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1286)
    183         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:100)
    220         at org.apache.hadoop.hdfs.DFSInputStream.getFileLength(DFSInputStream.java:206)
    251         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1267)
{code}

Three of the top four contention points (FSDataset.getBlockFile, 
BlockSender.<init>, and FSDataset.getBlockInputStream) are on the FSDataset 
monitor lock. The client-side contention in DFSInputStream.getFileLength is 
surprising, but unrelated to this particular JIRA.
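For anyone who wants the same histogram without forking jstack in a loop, here is a minimal sketch of an in-process equivalent, assuming Java 8+ and the standard java.lang.management API. It is not part of TestParallelRead; BlockedFrameSampler, sampleOnce, sample and demo are hypothetical names. It counts only the top frame of each BLOCKED thread, which is roughly what `grep -A2 BLOCK | grep 'at '` pulls out of the jstack output above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-process sampler (not an HDFS class): counts the top stack
 * frame of every BLOCKED thread, like `jstack | grep -A2 BLOCK` + `uniq -c`.
 */
public class BlockedFrameSampler {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /** Takes one snapshot and adds 1 per BLOCKED thread, keyed by its top frame. */
    public static void sampleOnce(Map<String, Integer> counts) {
        // (false, false): skip lock details; stack traces are still included.
        for (ThreadInfo info : MX.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0) {
                counts.merge(stack[0].toString(), 1, Integer::sum);
            }
        }
    }

    /** Samples `iterations` times, sleeping `intervalMillis` between snapshots. */
    public static Map<String, Integer> sample(int iterations, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < iterations; i++) {
            sampleOnce(counts);
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    /** Demo: park one thread on a monitor we hold, then sample it 5 times. */
    public static Map<String, Integer> demo() throws InterruptedException {
        Object monitor = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // Empty body: only the wait to enter the monitor matters here.
            }
        }, "waiter");
        Map<String, Integer> counts;
        synchronized (monitor) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            counts = sample(5, 10);
        }
        waiter.join();
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print ascending by count, matching `sort -nk1`.
        demo().entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .forEach(e -> System.out.printf("%7d %s%n", e.getValue(), e.getKey()));
    }
}
```

Running main should print one entry, for the waiter thread's lambda frame, with a count of 5. The trade-off versus jstack: no process fork per snapshot, but it only sees threads in its own JVM, so it would have to run inside the test (or DataNode) process.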

> Convert FSDataset to ReadWriteLock
> ----------------------------------
>
>                 Key: HDFS-1148
>                 URL: https://issues.apache.org/jira/browse/HDFS-1148
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1148-old.txt, patch-HDFS-1148-rel0.20.2.txt
>
>
> In benchmarking HDFS-941 I noticed that for the random read workload, the 
> FSDataset lock is highly contended. After converting it to a 
> ReentrantReadWriteLock, I saw a ~25% improvement on both latency and 
> ops/second.
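To make the proposed change concrete, here is a minimal sketch of the pattern, assuming a plain HashMap from block ID to file path guarded by a single lock. RwBlockMap, getBlockFile, addBlock and readersCanOverlap are hypothetical names for illustration only, not the FSDataset API and not the code in the attached patches. Lookups take the shared read lock and mutations take the exclusive write lock, so concurrent readers no longer serialize on one monitor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical stand-in for a DataNode block map (NOT the real FSDataset):
 * the single monitor lock is replaced by a ReentrantReadWriteLock.
 */
public class RwBlockMap {
    private final Map<Long, String> blockFiles = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Read path: any number of threads may hold the read lock at once. */
    public String getBlockFile(long blockId) {
        lock.readLock().lock();
        try {
            return blockFiles.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Write path: the write lock excludes all readers and other writers. */
    public void addBlock(long blockId, String path) {
        lock.writeLock().lock();
        try {
            blockFiles.put(blockId, path);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Has `threads` threads hold the read lock simultaneously and reports
     * whether all of them got in. With a synchronized monitor only one could.
     */
    public boolean readersCanOverlap(int threads) throws InterruptedException {
        CountDownLatch allInside = new CountDownLatch(threads);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                lock.readLock().lock();
                try {
                    allInside.countDown();
                    release.await(); // keep holding the read lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.readLock().unlock();
                }
            });
            workers[i].start();
        }
        boolean overlapped = allInside.await(5, TimeUnit.SECONDS);
        release.countDown();
        for (Thread w : workers) {
            w.join();
        }
        return overlapped;
    }

    public static void main(String[] args) throws InterruptedException {
        RwBlockMap map = new RwBlockMap();
        map.addBlock(1L, "/data/current/blk_1"); // hypothetical path
        System.out.println(map.getBlockFile(1L));     // prints /data/current/blk_1
        System.out.println(map.readersCanOverlap(4)); // prints true
    }
}
```

The catch with this conversion is discipline rather than API: every read path must unlock in a finally block, every method that mutates shared state must take the write lock, and ReentrantReadWriteLock does not let a thread upgrade a held read lock to a write lock, so read-then-modify paths need restructuring.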

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
