[
https://issues.apache.org/jira/browse/HDFS-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044712#comment-13044712
]
Todd Lipcon commented on HDFS-1148:
-----------------------------------
As I was updating HDFS-941 to trunk tonight, I took the opportunity to look
into the blocking behavior again. While running TestParallelRead (with
N_ITERATIONS bumped up 10x) I ran:
{code}
$ while true ; do jstack 3378 | grep -A2 BLOCK >> /tmp/blocked ; done
{code}
and then when it was done:
{code}
$ grep 'at ' /tmp/blocked | sort | uniq -c | sort -nk1
      1 at java.lang.Object.wait(Native Method)
      6 at org.apache.hadoop.hdfs.DFSInputStream.getBlockRange(DFSInputStream.java:313)
     27 at org.apache.hadoop.hdfs.TestParallelRead$ReadWorker.read(TestParallelRead.java:142)
    137 at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1286)
    183 at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:100)
    220 at org.apache.hadoop.hdfs.DFSInputStream.getFileLength(DFSInputStream.java:206)
    251 at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1267)
{code}
Three of the top four contention points are on the FSDataset monitor lock. The
client-side DFSInputStream.getFileLength one is surprising, but not related to
this particular JIRA.
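For illustration, the kind of conversion the issue proposes can be sketched as follows. This is a minimal hypothetical example, not the actual FSDataset code: the class and method names (BlockMap, getBlockPath, addBlock) are invented stand-ins for a structure whose reads dominate its writes, guarded by a ReentrantReadWriteLock instead of a monitor lock:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of replacing a coarse synchronized monitor with a
// ReentrantReadWriteLock so that concurrent readers no longer serialize.
class BlockMap {
    private final Map<Long, String> blockToPath = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Read path: many threads may hold the read lock at once, whereas a
    // synchronized method would admit only one thread at a time.
    String getBlockPath(long blockId) {
        lock.readLock().lock();
        try {
            return blockToPath.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write path: the write lock is exclusive, preserving the mutual
    // exclusion the old synchronized semantics gave to mutations.
    void addBlock(long blockId, String path) {
        lock.writeLock().lock();
        try {
            blockToPath.put(blockId, path);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Under a random-read workload like TestParallelRead, lookups such as getBlockFile/getBlockInputStream are read-only, so letting them proceed in parallel is where the win comes from.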
> Convert FSDataset to ReadWriteLock
> ----------------------------------
>
> Key: HDFS-1148
> URL: https://issues.apache.org/jira/browse/HDFS-1148
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-1148-old.txt, patch-HDFS-1148-rel0.20.2.txt
>
>
> In benchmarking HDFS-941 I noticed that for the random read workload, the
> FSDataset lock is highly contended. After converting it to a
> ReentrantReadWriteLock, I saw a ~25% improvement on both latency and
> ops/second.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira