[ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777805#action_12777805 ]
Eli Collins commented on HADOOP-3205:
-------------------------------------
Hey Todd,
Nice change!
It would be good to add a comment where the checksum array gets created
indicating that the size of the array determines the size of the underlying IO.
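For instance, something along these lines (a minimal sketch only: the class
shape, field name, and sizing constants are illustrative, not from the patch):
{code:java}
class FSInputCheckerSketch {
  private static final int CHUNKS_PER_READ = 32; // illustrative
  private static final int CHECKSUM_SIZE = 4;    // a CRC32 is 4 bytes

  // The length of this array determines the size of the underlying IO:
  // checksum data for this many chunks is read in a single call, so
  // resizing the array changes how much is fetched at a time.
  private final byte[] checksums = new byte[CHUNKS_PER_READ * CHECKSUM_SIZE];
}
{code}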
In readChunk, a checksum file with an invalid size currently results in a
ChecksumException but with this patch becomes an EOFException; it would be good
to revert to the ChecksumException behavior and add a unit test for this.
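A minimal sketch of such a test against the local checksummed filesystem
(the class name, file name, and write/read sizes are made up, and the exact
read pattern needed to trip the truncated sum may differ):
{code:java}
import java.io.RandomAccessFile;
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class TestInvalidSizeChecksumFile extends TestCase {
  public void testInvalidSizeChecksumFile() throws Exception {
    LocalFileSystem fs = FileSystem.getLocal(new Configuration());
    Path file = new Path(System.getProperty("test.build.data", "/tmp"),
                         "invalid-crc-size");
    FSDataOutputStream out = fs.create(file);
    for (int i = 0; i < 1024; i++) {
      out.writeInt(i);
    }
    out.close();

    // Chop a byte off the .crc side file so its size is invalid.
    RandomAccessFile crc = new RandomAccessFile(
        fs.pathToFile(fs.getChecksumFile(file)), "rw");
    crc.setLength(crc.length() - 1);
    crc.close();

    FSDataInputStream in = fs.open(file);
    try {
      in.read(new byte[4096]);
      fail("read of a file with an invalid-size checksum file succeeded");
    } catch (ChecksumException e) {
      // expected: invalid-size checksum data should surface as a
      // ChecksumException, not an EOFException
    } finally {
      in.close();
    }
  }
}
{code}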
For sanity, do the existing tests cover both small and odd-size single-block
files?
In readChecksum I'd make expected and calculated ints (I know they were longs
before your change) since the code deals with 32-bit sums, and just truncate
sum.getValue() rather than cast and truncate the checksumInts.get() return
value. We could assert sum.getValue() == (0xffffffffL & sum.getValue()) if
we're feeling paranoid.
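Concretely, something like this (sum and checksumInts are the names used in
the patch; the method shape and errPos are illustrative, and this doubles as
the verifySum extraction mentioned in the nits below):
{code:java}
import java.nio.IntBuffer;
import java.util.zip.Checksum;
import org.apache.hadoop.fs.ChecksumException;

class VerifySumSketch {
  private void verifySum(Checksum sum, IntBuffer checksumInts, long errPos)
      throws ChecksumException {
    int calculated = (int) sum.getValue(); // truncate the 32-bit sum
    int expected = checksumInts.get();     // already a 32-bit value
    // paranoia: a CRC32's getValue() should only ever carry 32 bits
    assert sum.getValue() == (0xffffffffL & sum.getValue());
    if (expected != calculated) {
      throw new ChecksumException("Checksum error at pos " + errPos, errPos);
    }
  }
}
{code}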
It would be good to test the LocalFs version (it appears LocalFs is untested),
but since the LocalFs and LocalFileSystem diffs are the same, let's leave that
to a separate jira.
Nits:
- Ideally the new IllegalArgumentException in FSInputChecker.set would be an
assert, since the given checkSumSize is not configurable or passed in at
runtime; however, since we don't enable asserts by default yet, it's perhaps
OK to leave as is (see the sketch after this list).
- The new code in readChecksum would be more readable if it were pulled out
into a separate function (as verifySum was) - see the sketch above.
- This was the case before your change, but the back-to-back if statements on
lines 252-253 could be combined trivially.
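For the first nit, a hypothetical sketch (the signature and fields are
illustrative, not the patch's code; the point is only assert versus throw):
{code:java}
import java.util.zip.Checksum;

class SetSketch {
  private Checksum sum;
  private int checksumSize;

  protected void set(Checksum sum, int checksumSize) {
    // an assert rather than new IllegalArgumentException(): the size is
    // never user- or config-supplied, so a bad value is a programming
    // error (but asserts are off by default, per the caveat above)
    assert checksumSize > 0 : "checksumSize must be positive";
    this.sum = sum;
    this.checksumSize = checksumSize;
  }
}
{code}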
Thanks,
Eli
> Read multiple chunks directly from FSInputChecker subclass into user buffers
> ----------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Todd Lipcon
> Attachments: hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt,
> hadoop-3205.txt
>
>
> Implementations of FSInputChecker and FSOutputSummer, like DFS, do not have
> access to the full user buffer. At any time DFS can access only up to 512
> bytes, even though the user usually reads with a much larger buffer (often
> controlled by io.file.buffer.size). This forces an implementation to double
> buffer data if it wants to read or write larger chunks of data from the
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two
> separate jiras.
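To illustrate the constraint the description refers to, purely a sketch (the
simplified signature and all helper and field names are made up, not Hadoop
APIs): each call hands the subclass at most one chunk, so larger underlying
reads must be staged in a private buffer and copied a second time.
{code:java}
import java.io.IOException;
import java.util.Arrays;

class DoubleBufferingChecker {
  private static final int CHUNK_SIZE = 512;  // bytesPerChecksum
  private byte[] stage = new byte[64 * 1024]; // ~io.file.buffer.size
  private int stagePos, stageLen;

  // Called once per chunk by the checker (signature simplified).
  protected int readChunk(byte[] buf, int offset, int len)
      throws IOException {
    if (stagePos == stageLen) {
      stageLen = fillFromStorage(stage); // one big underlying read
      stagePos = 0;
      if (stageLen <= 0) return -1;
    }
    // Second copy: stage -> the caller's chunk-sized slice of buf.
    int n = Math.min(Math.min(len, CHUNK_SIZE), stageLen - stagePos);
    System.arraycopy(stage, stagePos, buf, offset, n);
    stagePos += n;
    return n;
  }

  private int fillFromStorage(byte[] b) throws IOException {
    Arrays.fill(b, (byte) 0); // stand-in for a real storage read
    return b.length;
  }
}
{code}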