[
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072
]
Lars Hofhansl commented on HDFS-6735:
-------------------------------------
Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later. Let me
have another look at tryReadZeroCopy. I had mapped out all members and which
methods use them, and concluded that the synchronized wasn't needed, but it is
quite possible I made a mistake.
Another locking option is not to synchronize on <this> at all, but to have two
locks ("streamLock" and "pLock", or whatever the good names are). That way the
intent might be more explicit.
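
To make the two-lock idea concrete, here is a minimal sketch. It is not the
real DFSInputStream: the class TwoLockStream, its in-memory byte array, and the
preadCount field are hypothetical stand-ins, and only the lock names
"streamLock" and "pLock" come from the suggestion above. It only shows that a
positional read never has to wait on the lock that guards the sequential
position.

```java
// Hypothetical sketch, not HDFS code: one lock per mode of operation.
final class TwoLockStream {
    private final byte[] data;                 // stands in for the file contents

    private final Object streamLock = new Object();
    private int pos = 0;                       // guarded by streamLock

    private final Object pLock = new Object();
    private long preadCount = 0;               // guarded by pLock (assumed pread-side state)

    TwoLockStream(byte[] data) {
        this.data = data.clone();
    }

    // Stateful read: advances pos, so it holds streamLock for the whole call.
    int read(byte[] buf, int off, int len) {
        synchronized (streamLock) {
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, buf, off, n);
            pos += n;
            return n;
        }
    }

    // Positional read: never touches pos, so it never takes streamLock.
    int pread(long position, byte[] buf, int off, int len) {
        if (position < 0) {
            throw new IllegalArgumentException("negative position");
        }
        if (position >= data.length) {
            return -1;
        }
        int start = (int) position;            // safe: position < data.length
        int n = Math.min(len, data.length - start);
        System.arraycopy(data, start, buf, off, n);
        synchronized (pLock) {                 // held only for the short bookkeeping update
            preadCount++;
        }
        return n;
    }

    long preadCount() {
        synchronized (pLock) {
            return preadCount;
        }
    }
}
```

With this layout a long-running read() blocks other read() calls but not
pread() calls, which is the behavior the patch is after.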
Yet another option would be to disentangle the two APIs by subclassing or
delegation (since the real issue is that we have state for two different
modes of operation in the same class); that would be a bigger change, though.
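
A rough sketch of the delegation variant, under the assumption that positional
reads need no per-stream position state. All names here (PositionalReader,
SequentialReader, the in-memory byte array) are hypothetical and chosen only
for illustration; this is not how DFSInputStream is structured.

```java
// Hypothetical sketch, not HDFS code: each mode of operation owns its state.
final class PositionalReader {
    private final byte[] data;                 // stands in for the file contents

    PositionalReader(byte[] data) {
        this.data = data.clone();
    }

    // Keeps no position state, so no lock is needed in this sketch.
    int pread(long position, byte[] buf, int off, int len) {
        if (position < 0) {
            throw new IllegalArgumentException("negative position");
        }
        if (position >= data.length) {
            return -1;
        }
        int start = (int) position;            // safe: position < data.length
        int n = Math.min(len, data.length - start);
        System.arraycopy(data, start, buf, off, n);
        return n;
    }
}

final class SequentialReader {
    private final PositionalReader source;     // delegate for the actual byte copy
    private long pos = 0;                      // guarded by this reader's own monitor

    SequentialReader(PositionalReader source) {
        this.source = source;
    }

    // Only sequential readers contend with each other on this monitor.
    synchronized int read(byte[] buf, int off, int len) {
        int n = source.pread(pos, buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }
}
```

Callers that only need positional reads would hold a PositionalReader and never
see the sequential state at all.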
Meanwhile in HBase land:
Tested this with HBase and observed with a sampler that all delays internal to
DFSInputStream are gone, which is nice.
I committed a change to HBase to allow us to (1) have compactions use their own
input streams so they do not interfere with user scans on the same files and
(2) optionally force p-reads for all user scans. See HBASE-12411.
Especially with #2 I see nice speedups for many concurrent scanners, essentially
up to what my disks can sustain, but a 50% slowdown for a single scanner per
file, which is expected as we're no longer benefiting from prefetching.
> A minor optimization to avoid pread() be blocked by read() inside the same
> DFSInputStream
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In the current DFSInputStream impl, there are a couple of coarse-grained locks
> in the read/pread path, and they have become an HBase read latency pain point.
> In HDFS-6698, I made a minor patch against the first encountered lock, around
> getFileLength; indeed, after reading the code and testing, there are still
> other locks we could improve.
> In this jira, I'll make a patch against the other locks, and a simple test
> case to show the issue and the improved result.
> This is important for HBase, since in the current HFile read path we
> issue all read()/pread() requests on the same DFSInputStream for one HFile.
> (A multi-stream solution is another story; I had planned to do it, but it will
> probably take more time than I expected.)