[ https://issues.apache.org/jira/browse/HDFS-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848753#comment-13848753 ]
Colin Patrick McCabe commented on HDFS-5664:
--------------------------------------------
bq. Thanks Colin. Then should we remove all synchronization if it is single-threaded only?
That's a reasonable point. But I suspect that, in practice, removing
"synchronized" from all those methods would break a lot of code that currently
works. Locking overhead also tends to be low on modern CPUs when the lock is
uncontended, so I don't think we'd save that much. It would be interesting to
benchmark, though.
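To get a first feel for the uncontended case before reaching for a proper harness, a naive single-threaded timing loop like the one below could be used. It is only a sketch: `LockOverheadSketch` and its methods are hypothetical, the JIT may coarsen the lock or simplify either loop, and a harness such as JMH would be needed for numbers worth trusting.

```java
import java.util.concurrent.TimeUnit;

// Naive, single-threaded timing sketch (hypothetical names). It is NOT a
// rigorous benchmark: the JIT may coarsen locks or simplify either loop.
class LockOverheadSketch {
    // Kept in static fields so the objects escape; this stops escape analysis
    // from proving the lock is thread-local and removing it outright.
    static final Counter PLAIN = new Counter();
    static final Counter LOCKED = new Counter();

    static final class Counter {
        private long value;

        void incrementPlain() { value++; }

        synchronized void incrementLocked() { value++; }

        synchronized long get() { return value; }

        synchronized void reset() { value = 0; }
    }

    // n unsynchronized increments; returns the final count.
    static long countPlain(int n) {
        PLAIN.reset();
        for (int i = 0; i < n; i++) {
            PLAIN.incrementPlain();
        }
        return PLAIN.get();
    }

    // n synchronized increments on a lock that no other thread ever touches.
    static long countLocked(int n) {
        LOCKED.reset();
        for (int i = 0; i < n; i++) {
            LOCKED.incrementLocked();
        }
        return LOCKED.get();
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        // Warm-up so both paths are compiled before they are timed.
        countPlain(n);
        countLocked(n);

        long t0 = System.nanoTime();
        long a = countPlain(n);
        long t1 = System.nanoTime();
        long b = countLocked(n);
        long t2 = System.nanoTime();

        System.out.printf("plain:  %d increments in %d ms%n", a, TimeUnit.NANOSECONDS.toMillis(t1 - t0));
        System.out.printf("locked: %d increments in %d ms%n", b, TimeUnit.NANOSECONDS.toMillis(t2 - t1));
    }
}
```

The timings it prints vary by machine and JVM; only the increment counts are deterministic.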
I kind of wish that we were able to do multiple preads in parallel, but I
suspect that the amount of refactoring needed to get to that state would be
massive: right now there is an assumption that everything in the stream is done
under a big lock.
bq. Could we save on NN trips if we added a 'clone' of DFSIS, where we'd create a new one by passing in an existing one? The new DFSIS would use the block info the original had already obtained, which would be enough to get it off the ground without a trip to the NN.
I haven't thought about it too much, but that seems like an interesting idea,
and probably a good direction to go in. We could definitely copy the block
location information from one stream into a new stream. You would not be able
to reuse the TCP socket for a remote read, though, but it would still save you a
trip to the NameNode.
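A minimal sketch of the 'clone' idea, assuming the block locations are immutable once fetched so that two streams can safely share them. `StreamSketch`, `FakeNameNode`, and `CachedBlockInfo` are hypothetical stand-ins, not HDFS classes; the point is only that the copy constructor skips the NameNode lookup while still getting its own connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for a DFS input stream and the NameNode; not HDFS classes.
class StreamSketch {
    private final String path;
    // Block locations; treated as immutable, so two streams may share one list.
    private final List<CachedBlockInfo> blocks;
    // Per-stream connection placeholder (e.g. a TCP socket); never shared.
    private final Object connection = new Object();

    // Normal open: costs one round trip to the NameNode.
    StreamSketch(FakeNameNode nameNode, String path) {
        this.path = path;
        this.blocks = Collections.unmodifiableList(nameNode.getBlockLocations(path));
    }

    // "Clone": reuse the locations the original already fetched; no NameNode trip.
    StreamSketch(StreamSketch original) {
        this.path = original.path;
        this.blocks = original.blocks;
    }

    int blockCount() { return blocks.size(); }

    boolean sharesConnectionWith(StreamSketch other) { return connection == other.connection; }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        StreamSketch original = new StreamSketch(nn, "/hbase/example/file");
        StreamSketch copy = new StreamSketch(original);
        System.out.println("NameNode lookups: " + nn.lookups());
        System.out.println("blocks in copy: " + copy.blockCount());
        System.out.println("shares connection: " + copy.sharesConnectionWith(original));
    }
}

final class CachedBlockInfo {
    final long offset;
    final long length;
    final List<String> hosts;

    CachedBlockInfo(long offset, long length, List<String> hosts) {
        this.offset = offset;
        this.length = length;
        this.hosts = hosts;
    }
}

// Fake NameNode that counts lookups so the saving is visible.
class FakeNameNode {
    private int lookups;

    synchronized List<CachedBlockInfo> getBlockLocations(String path) {
        lookups++;
        List<CachedBlockInfo> blocks = new ArrayList<>();
        blocks.add(new CachedBlockInfo(0, 128, Arrays.asList("dn1", "dn2")));
        blocks.add(new CachedBlockInfo(128, 64, Arrays.asList("dn2", "dn3")));
        return blocks;
    }

    synchronized int lookups() { return lookups; }
}
```

Running it prints `NameNode lookups: 1`, `blocks in copy: 2`, and `shares connection: false`: only the original stream went to the (fake) NameNode, and the copy would still open its own connection.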
> try to relieve the BlockReaderLocal read() synchronized hotspot
> ---------------------------------------------------------------
>
> Key: HDFS-5664
> URL: https://issues.apache.org/jira/browse/HDFS-5664
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Liang Xie
> Assignee: Liang Xie
>
> Currently, BlockReaderLocal's read() has a synchronized modifier:
> {code}
> public synchronized int read(byte[] buf, int off, int len) throws IOException
> {
> {code}
> In an HBase cluster with heavy physical reads, we observed some hotspots in the
> DFSClient read path; the detailed trace can be found at:
> https://issues.apache.org/jira/browse/HDFS-1605?focusedCommentId=13843241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13843241
> I haven't looked into the details yet, but here are some rough ideas first:
> 1) replace synchronized with a try-lock-with-timeout pattern, so the read can fail fast;
> 2) fall back to non-SSR (non-short-circuit) read mode if acquiring the local reader lock fails.
> There are at least two scenarios where removing this hotspot would help:
> 1) heavy local physical reads, e.g. when the HBase block cache miss ratio is high;
> 2) slow or bad disks.
> It could help achieve a lower 99th-percentile HBase read latency.
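The first idea in the description above (a try-lock with timeout, falling back to the non-SSR path) might look roughly like the following. It is a sketch under stated assumptions: `SimpleReader` and `TryLockReaderSketch` are hypothetical, the `local` reader stands in for short-circuit reads and `fallback` for the normal read path, and the 50 ms timeout is arbitrary. The demo parks one thread inside a deliberately blocked local read, to mimic a slow disk, so that a second caller times out and is served by the fallback.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch (hypothetical names): replace `synchronized` with tryLock(timeout) and
// fall back to another read path when the local reader is busy for too long.
class TryLockReaderSketch implements SimpleReader {
    private final ReentrantLock lock = new ReentrantLock();
    private final SimpleReader local;    // stands in for the short-circuit (SSR) reader
    private final SimpleReader fallback; // stands in for the non-SSR read path
    private final long timeoutMs;

    TryLockReaderSketch(SimpleReader local, SimpleReader fallback, long timeoutMs) {
        this.local = local;
        this.fallback = fallback;
        this.timeoutMs = timeoutMs;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted waiting for local reader lock");
        }
        if (!acquired) {
            return fallback.read(buf, off, len); // fail fast instead of queueing
        }
        try {
            return local.read(buf, off, len);
        } finally {
            lock.unlock();
        }
    }

    // Demo: thread A is stuck in a local read (mimicking a slow disk) while
    // holding the lock; the caller's read times out and uses the fallback.
    static char demo() {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        SimpleReader slowLocal = (buf, off, len) -> {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            Arrays.fill(buf, off, off + len, (byte) 'L');
            return len;
        };
        SimpleReader remote = (buf, off, len) -> {
            Arrays.fill(buf, off, off + len, (byte) 'R');
            return len;
        };
        TryLockReaderSketch reader = new TryLockReaderSketch(slowLocal, remote, 50);
        Thread a = new Thread(() -> {
            try {
                reader.read(new byte[4], 0, 4);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        a.start();
        try {
            entered.await();          // A now holds the lock
            byte[] buf = new byte[4];
            reader.read(buf, 0, 4);   // waits ~50 ms, then falls back
            return (char) buf[0];
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();      // let A finish its slow read
        }
    }

    public static void main(String[] args) {
        System.out.println("second read served by: " + demo()); // expected: R
    }
}

// Hypothetical minimal reader interface; not the HDFS BlockReader API.
interface SimpleReader {
    int read(byte[] buf, int off, int len) throws IOException;
}
```

In a real stream both paths would also need to agree on the read position, which this sketch does not model.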
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)