[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042380#comment-14042380
 ] 

Colin Patrick McCabe commented on HDFS-6596:
--------------------------------------------

What you are proposing is basically making every {{read}} into a {{readFully}}. 
 I don't think we want to increase the number of differences between how 
DFSInputStream works and how "normal" Java input streams work.  The "normal" 
Java behavior also has a good reason behind it... clients who can deal with 
partial reads will get a faster response time if the stream just returns what 
it can rather than waiting for everything.  In the case of HDFS, waiting for 
everything might mean connecting to a remote DataNode, which could add quite a 
lot of latency.
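
To make the partial-read contract concrete: callers that do need every byte already handle it with a standard read loop over any {{InputStream}}, while callers that can use partial data return as soon as the first {{read}} comes back.  A minimal sketch (names are mine, not Hadoop APIs):

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoop {
    // Standard pattern for a caller that needs exactly len bytes from a
    // stream whose read() may return fewer (e.g. at an HDFS block boundary):
    // keep calling read() until len bytes arrive or EOF is hit.
    static int readFullyLoop(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) break;        // EOF before len bytes were available
            total += n;
        }
        return total;                // number of bytes actually read
    }
}
{code}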

bq. From the above code, we can conclude that read() will return at most 
(blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the 
caller must call read() a second time to complete the request, and must wait a 
second time to acquire the DFSInputStream lock (read() is synchronized on the 
DFSInputStream). For latency-sensitive applications such as HBase, this 
becomes a latency pain point under heavy contention. So here we propose that 
read() should loop internally to do a best-effort read.

The simplest solution here is just to have code like this in HBase:

{code}
synchronized (stream) {
    stream.readFully(buf, 0, len);  // DataInputStream#readFully loops until len bytes are read
}
doStuff(buf);
{code}

Since monitors are re-entrant in Java, no other thread can take the stream lock 
while we are in our synchronized block.
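
The re-entrancy point can be sketched as follows (a hypothetical {{Stream}} class standing in for DFSInputStream; an assumption for illustration, not Hadoop code):

{code}
// A thread that already holds an object's monitor can still enter that
// object's synchronized methods, so the two reads below can never be
// interleaved by another thread.
class Stream {
    private int pos = 0;
    synchronized int read() { return pos++; }  // like the synchronized DFSInputStream#read
}

public class Reentrancy {
    public static void main(String[] args) {
        Stream s = new Stream();
        synchronized (s) {       // outer lock, as HBase would take it
            int a = s.read();    // re-entrant: no self-deadlock
            int b = s.read();    // guaranteed consecutive with a
        }
    }
}
{code}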

Another solution would be to modify {{DFSInputStream#readFully}} so that it 
holds the lock the whole time.  This is basically the same as the previous 
solution, but done in Hadoop rather than HBase.
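
A sketch of what that modification could look like (assumed shape only, not an actual patch): make {{readFully}} itself synchronized, so the whole fill loop runs under one acquisition of the stream lock instead of one per block.

{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

abstract class LockedStream extends InputStream {
    // Holding the monitor for the entire loop means a concurrent reader
    // cannot slip in between the per-block read() calls.
    public synchronized void readFully(byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = read(buf, off + total, len - total); // may stop at a block boundary
            if (n < 0) {
                throw new EOFException("EOF after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }
}
{code}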

> Improve InputStream when read spans two blocks
> ----------------------------------------------
>
>                 Key: HDFS-6596
>                 URL: https://issues.apache.org/jira/browse/HDFS-6596
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) 
> is implemented as following:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that read() will return at most 
> (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the 
> caller must call read() a second time to complete the request, and must wait 
> a second time to acquire the DFSInputStream lock (read() is synchronized on 
> the DFSInputStream). For latency-sensitive applications such as HBase, this 
> becomes a latency pain point under heavy contention. So here we propose that 
> read() should loop internally to do a best-effort read.
> In the current implementation of pread (read(position, buffer, offset, 
> length)), it already loops internally to do a best-effort read, so we can 
> refactor to support this in the normal read path as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)