[
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797796#action_12797796
]
Todd Lipcon commented on HDFS-877:
----------------------------------
This may turn out to be reasonably tricky to solve. The issue is that
lastPacketInBlock=true arrives in a separate, empty packet that is sent only
after all of the data has been read. Consider the following scenario:
# Block is exactly N bytes
# Client determines (or knows) the file length and thus reads exactly up to
byte N, but not past. This is the case for MapReduce jobs when an input split
doesn't cross block boundaries (e.g. any input file smaller than one block)
# In this case, the server will still send the empty "lastPacketInBlock"
packet, but the client will never read it (since it doesn't read ahead in any
way)
Point 2 above is currently being enforced by DFSInputStream, since it calls
getFileLength() before passing a read() call down into the BlockReader.
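To make the scenario concrete, here is a minimal toy model of it. This is only a sketch: ToyBlockReader, Packet and readWholeFile are hypothetical stand-ins, not the real BlockReader or DFSInputStream code, and the checksumOkSent flag merely models the point at which checksumOk() would be called.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the scenario above; none of these classes are the real HDFS ones. */
class LastPacketSketch {

    /** Hypothetical packet: a payload plus the lastPacketInBlock flag. */
    static final class Packet {
        final byte[] data;
        final boolean lastPacketInBlock;

        Packet(byte[] data, boolean lastPacketInBlock) {
            this.data = data;
            this.lastPacketInBlock = lastPacketInBlock;
        }
    }

    /** Stand-in for BlockReader: pulls a packet only when its buffer is empty. */
    static final class ToyBlockReader {
        private final Deque<Packet> wire = new ArrayDeque<>();
        private byte[] buf = new byte[0];
        private int bufPos = 0;
        boolean gotEOS = false;
        boolean checksumOkSent = false;

        ToyBlockReader(int blockLen, int packetSize) {
            for (int off = 0; off < blockLen; off += packetSize) {
                wire.add(new Packet(new byte[Math.min(packetSize, blockLen - off)], false));
            }
            // The server always finishes with an empty packet carrying the flag.
            wire.add(new Packet(new byte[0], true));
        }

        /** Returns the next byte, or -1 once the empty last packet has been seen. */
        int read() {
            while (bufPos == buf.length) {
                if (gotEOS || wire.isEmpty()) {
                    return -1;
                }
                Packet p = wire.poll();
                buf = p.data;
                bufPos = 0;
                if (p.lastPacketInBlock) {
                    gotEOS = true;
                    checksumOkSent = true; // models the checksumOk() call
                    return -1;
                }
            }
            return buf[bufPos++] & 0xff;
        }
    }

    /** Stand-in for the stream layer: never issues a read at or past fileLength. */
    static ToyBlockReader readWholeFile(int fileLength, int packetSize, boolean probePastEnd) {
        ToyBlockReader reader = new ToyBlockReader(fileLength, packetSize);
        long pos = 0;
        while (pos < fileLength) { // models the pos < getFileLength() check
            reader.read();
            pos++;
        }
        if (probePastEnd) {
            reader.read(); // the extra call that the length check normally prevents
        }
        return reader;
    }

    public static void main(String[] args) {
        System.out.println("exact-length read, gotEOS = " + readWholeFile(10, 4, false).gotEOS);
        System.out.println("one extra read,    gotEOS = " + readWholeFile(10, 4, true).gotEOS);
    }
}
```

In this model a client that stops at exactly N bytes leaves the empty packet unread, so gotEOS stays false; only one further read() past the end flips it.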
A couple of things to investigate:
# Is the check currently done by DFSInputStream important for limiting the
length visible to a reader for an in-progress block? Or can that limit be
satisfied by passing only the visible length to the OP_READ_BLOCK call? If the
length limitation can be ignored in the DFSInputStream layer, I think that
would solve the issue fairly trivially.
# Alternatively, can we invert BlockReader.readChunk so that it reads ahead a
packet? That is to say, if after a read, the internal buffer is emptied, can we
read the *next* packet at this point? I don't really like this solution...
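The read-ahead idea in the second item could look roughly like the sketch below. It is a hypothetical illustration only: ReadAheadReader, Packet and fetchNext are made-up names, and the real readChunk deals with checksums and sockets that are left out here. The one point it shows is that the next packet is fetched as soon as the buffer drains, rather than on the following read.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy read-ahead reader; a hypothetical sketch, not the real BlockReader. */
class ReadAheadSketch {

    /** Hypothetical packet: a payload plus the lastPacketInBlock flag. */
    static final class Packet {
        final byte[] data;
        final boolean lastPacketInBlock;

        Packet(byte[] data, boolean lastPacketInBlock) {
            this.data = data;
            this.lastPacketInBlock = lastPacketInBlock;
        }
    }

    static final class ReadAheadReader {
        private final Deque<Packet> wire = new ArrayDeque<>();
        private byte[] buf = new byte[0];
        private int bufPos = 0;
        boolean gotEOS = false;

        ReadAheadReader(int blockLen, int packetSize) {
            for (int off = 0; off < blockLen; off += packetSize) {
                wire.add(new Packet(new byte[Math.min(packetSize, blockLen - off)], false));
            }
            wire.add(new Packet(new byte[0], true)); // trailing empty packet
            fetchNext(); // prime the buffer with the first packet
        }

        /** Pulls the next packet off the wire and records the end-of-block flag. */
        private void fetchNext() {
            Packet p = wire.poll();
            if (p == null) {
                return;
            }
            buf = p.data;
            bufPos = 0;
            if (p.lastPacketInBlock) {
                gotEOS = true;
            }
        }

        /** Returns the next byte, or -1 once the block is exhausted. */
        int read() {
            if (bufPos == buf.length) {
                return -1; // buffer is empty only after the last packet
            }
            int b = buf[bufPos++] & 0xff;
            if (bufPos == buf.length) {
                fetchNext(); // buffer just drained: read the next packet now
            }
            return b;
        }
    }

    public static void main(String[] args) {
        ReadAheadReader reader = new ReadAheadReader(10, 4);
        for (int i = 0; i < 10; i++) {
            reader.read();
        }
        System.out.println("after exactly 10 bytes, gotEOS = " + reader.gotEOS);
    }
}
```

With this shape a client that reads exactly N bytes still observes the last packet. One cost visible even in the toy version is that the reader may wait on a packet the caller has not asked for.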
> Client-driven checksum verification not functioning
> ---------------------------------------------------
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.20.1, 0.21.0, 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing
> out). The issue is that DFSInputStream relies on readChunk being called one
> last time at the end of the file in order to receive the
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read
> checks pos < getFileLength() before issuing the read. Thus gotEOS never
> shifts to true and checksumOk() is never called.
--
This message is automatically generated by JIRA.