[
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226791#comment-14226791
]
Colin Patrick McCabe commented on HDFS-6735:
--------------------------------------------
* {{readWithStrategy}} and {{blockSeekTo}} should be marked {{synchronized}}.
Yes, they are called from a {{synchronized}} function, but let's make it clear.
It's kind of confusing to see us fooling around with {{pos}} and other stuff
without seeing a {{synchronized}} on the function.
* We should document in a comment that we cannot try to take the DFSInputStream
lock when holding the infoLock. We need to be careful to avoid deadlock, and
maintaining this lock ordering is the easiest way.
* I noticed that in {{blockSeekTo}}, we are holding the {{infoLock}} when
calling {{BlockReaderFactory#build}}. It would be nice to avoid this. That
function does a lot of stuff... if we're creating a {{RemoteBlockReader2}}, it
potentially blocks while a TCP connection to the DataNode is opened. It seems
like all you need the {{infoLock}} for here is to get the {{cachingStrategy}}
and determine if {{shortCircuitForbidden}}, and you could pull this out into a
synchronized block prior to the {{Builder#build}} call, similar to how
{{actualGetFromOneDataNode}} does it.
Incidentally, the findbugs warning is probably because findbugs doesn't realize
that {{CachingStrategy}} is an immutable class, and so it's safe to access it
without locking. (The only thing you need locking for is actually reading the
current reference to the object, not for accessing the object itself.)
+1 once those are addressed
> A minor optimization to avoid pread() be blocked by read() inside the same
> DFSInputStream
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt,
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in
> read/pread path, and it has became a HBase read latency pain point so far. In
> HDFS-6698, i made a minor patch against the first encourtered lock, around
> getFileLength, in deed, after reading code and testing, it shows still other
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we
> issue all read()/pread() requests in the same DFSInputStream for one HFile.
> (Multi streams solution is another story i had a plan to do, but probably
> will take more time than i expected)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)