[ https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042986#comment-14042986 ]
Zesheng Wu commented on HDFS-6596:
----------------------------------

Thanks Colin.

bq. What you are proposing is basically making every {{read}} into a {{readFully}}. I don't think we want to increase the number of differences between how DFSInputStream works and how "normal" Java input streams work. The "normal" java behavior also has a good reason behind it... clients who can deal with partial reads will get a faster response time if the stream just returns what it can rather than waiting for everything. In the case of HDFS, waiting for everything might mean connecting to a remote DataNode. This could be quite a lot of latency.

I agree with you that we shouldn't make every {{read}} into a {{readFully}}, and the current implementation of {{read}} has the advantage you described.

As for the solution, I think doing it in Hadoop would be better, because all users would benefit. The current {{readFully}} for DFSInputStream is implemented as a pread and is inherited from FSInputStream, so I will add a new {{readFully(buffer, offset, length)}} to address this.

Any thoughts?

> Improve InputStream when read spans two blocks
> ----------------------------------------------
>
>                 Key: HDFS-6596
>                 URL: https://issues.apache.org/jira/browse/HDFS-6596
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In the current implementation of DFSInputStream, read(buffer, offset, length)
> is implemented as follows:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that a read will return at most
> (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the
> caller must call read() a second time to complete the request, and must wait
> a second time to acquire the DFSInputStream lock (read() is synchronized for
> DFSInputStream). For latency-sensitive applications such as HBase, this
> becomes a latency pain point under heavy lock contention. So we propose
> looping internally in read() to do a best-effort read.
> In the current implementation of pread (read(position, buffer, offset,
> length)), the code already loops internally to do a best-effort read, so we
> can refactor to support the same behavior in the normal read.

--
This message was sent by Atlassian JIRA (v6.2#6252)
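
The best-effort loop proposed in the issue description can be illustrated with a minimal sketch. This is not the DFSInputStream code: {{BlockBoundedStream}} and {{readBestEffort}} are hypothetical names, the stream is an in-memory stand-in whose read() stops at a simulated block boundary, and the loop runs on the caller side, whereas the proposal would place it inside the synchronized read() so the lock is acquired only once.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; names and structure are assumptions, not Hadoop code. */
class BestEffortRead {

    /** In-memory stream whose read() never crosses a simulated block boundary. */
    static class BlockBoundedStream extends InputStream {
        private final byte[] data;
        private final int blockSize;
        private int pos = 0;

        BlockBoundedStream(byte[] data, int blockSize) {
            this.data = data;
            this.blockSize = blockSize;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] buf, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1;
            }
            // Offset of the last byte in the block that contains pos.
            int blockEnd = (pos / blockSize + 1) * blockSize - 1;
            // Same clamping idea as the snippet in the issue: at most to block end.
            int realLen = Math.min(len, blockEnd - pos + 1);
            realLen = Math.min(realLen, data.length - pos);
            System.arraycopy(data, pos, buf, off, realLen);
            pos += realLen;
            return realLen;
        }
    }

    /**
     * Keeps reading until len bytes are read or the stream ends.
     * Returns the number of bytes read, or -1 if the stream was already at EOF.
     */
    static int readBestEffort(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        InputStream in = new BlockBoundedStream(data, 4);
        byte[] buf = new byte[4];
        in.read(buf, 0, 3);                    // advance to one byte before the boundary
        int n = readBestEffort(in, buf, 0, 4); // request spans two simulated blocks
        System.out.println("bytes read: " + n);
    }
}
```

With a block size of 4 and the position at offset 3, a single read() of 4 bytes returns only the last byte of the first block, while the loop continues into the next block and fills the request. A caller that prefers low latency over a full buffer can still call read() directly, which is the trade-off raised in the quoted comment.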