[ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777805#action_12777805 ]
Eli Collins commented on HADOOP-3205:
-------------------------------------
Hey Todd,
Nice change!
It would be good to add a comment where the checksum array gets created
indicating that the size of the array determines the size of the underlying IO.
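For instance, something along these lines (a minimal sketch only: the class
shape, field name, and sizing constants are illustrative, not from the patch):
{code:java}
class FSInputCheckerSketch {
  private static final int CHUNKS_PER_READ = 32; // illustrative
  private static final int CHECKSUM_SIZE = 4;    // a CRC32 is 4 bytes

  // The length of this array determines the size of the underlying IO:
  // checksum data for this many chunks is read in a single call, so
  // resizing the array changes how much is fetched at a time.
  private final byte[] checksums = new byte[CHUNKS_PER_READ * CHECKSUM_SIZE];
}
{code}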
In readChunk, a checksum file with an invalid size currently results in a
ChecksumException but with this patch becomes an EOFException; it would be good
to revert to the ChecksumException behavior and add a unit test for this.
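A minimal sketch of such a test against the local checksummed filesystem
(the class name, file name, and write/read sizes are made up, and the exact
read pattern needed to trip the truncated sum may differ):
{code:java}
import java.io.RandomAccessFile;
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class TestInvalidSizeChecksumFile extends TestCase {
  public void testInvalidSizeChecksumFile() throws Exception {
    LocalFileSystem fs = FileSystem.getLocal(new Configuration());
    Path file = new Path(System.getProperty("test.build.data", "/tmp"),
                         "invalid-crc-size");
    FSDataOutputStream out = fs.create(file);
    for (int i = 0; i < 1024; i++) {
      out.writeInt(i);
    }
    out.close();

    // Chop a byte off the .crc side file so its size is invalid.
    RandomAccessFile crc = new RandomAccessFile(
        fs.pathToFile(fs.getChecksumFile(file)), "rw");
    crc.setLength(crc.length() - 1);
    crc.close();

    FSDataInputStream in = fs.open(file);
    try {
      in.read(new byte[4096]);
      fail("read of a file with an invalid-size checksum file succeeded");
    } catch (ChecksumException e) {
      // expected: invalid-size checksum data should surface as a
      // ChecksumException, not an EOFException
    } finally {
      in.close();
    }
  }
}
{code}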
For sanity, do the existing tests cover both small and odd-size single-block
files?
In readChecksum I'd make expected and calculated ints (I know they were longs
before your change) since the code deals with 32-bit sums, and just truncate
sum.getValue() rather than cast and truncate the checksumInts.get() return
value. We could assert sum.getValue() == (0xffffffffL & sum.getValue()) if
we're feeling paranoid.
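Concretely, something like this (sum and checksumInts are the names used in
the patch; the method shape and errPos are illustrative, and this doubles as
the verifySum extraction mentioned in the nits below):
{code:java}
import java.nio.IntBuffer;
import java.util.zip.Checksum;
import org.apache.hadoop.fs.ChecksumException;

class VerifySumSketch {
  private void verifySum(Checksum sum, IntBuffer checksumInts, long errPos)
      throws ChecksumException {
    int calculated = (int) sum.getValue(); // truncate the 32-bit sum
    int expected = checksumInts.get();     // already a 32-bit value
    // paranoia: a CRC32's getValue() should only ever carry 32 bits
    assert sum.getValue() == (0xffffffffL & sum.getValue());
    if (expected != calculated) {
      throw new ChecksumException("Checksum error at pos " + errPos, errPos);
    }
  }
}
{code}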
It would be good to test the LocalFs version (it appears LocalFs is untested),
but since the LocalFs and LocalFileSystem diffs are the same, let's leave that
to a separate jira.
Nits:
- Ideally the new IllegalArgumentException in FSInputChecker.set would be an
assert, since the given checkSumSize is not configurable or passed in at
runtime; however, since we don't enable asserts by default yet, it's perhaps
OK to leave as is (see the sketch after this list).
- The new code in readChecksum would be more readable if it were pulled out
into a separate function (as verifySum was) - see the sketch above.
- This was the case before your change, but the back-to-back if statements on
lines 252-253 could be combined trivially.
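For the first nit, a hypothetical sketch (the signature and fields are
illustrative, not the patch's code; the point is only assert versus throw):
{code:java}
import java.util.zip.Checksum;

class SetSketch {
  private Checksum sum;
  private int checksumSize;

  protected void set(Checksum sum, int checksumSize) {
    // an assert rather than new IllegalArgumentException(): the size is
    // never user- or config-supplied, so a bad value is a programming
    // error (but asserts are off by default, per the caveat above)
    assert checksumSize > 0 : "checksumSize must be positive";
    this.sum = sum;
    this.checksumSize = checksumSize;
  }
}
{code}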
Thanks,
Eli
> Read multiple chunks directly from FSInputChecker subclass into user buffers
> ----------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Todd Lipcon
> Attachments: hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt,
> hadoop-3205.txt
>
>
> Implementations of FSInputChecker and FSOutputSummer, like DFS, do not have
> access to the full user buffer. At any time DFS can access only up to 512
> bytes, even though the user usually reads with a much larger buffer (often
> controlled by io.file.buffer.size). This forces an implementation to double
> buffer data if it wants to read or write larger chunks of data from the
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two
> separate jiras.
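To illustrate the constraint the description refers to, purely a sketch (the
simplified signature and all helper and field names are made up, not Hadoop
APIs): each call hands the subclass at most one chunk, so larger underlying
reads must be staged in a private buffer and copied a second time.
{code:java}
import java.io.IOException;
import java.util.Arrays;

class DoubleBufferingChecker {
  private static final int CHUNK_SIZE = 512;  // bytesPerChecksum
  private byte[] stage = new byte[64 * 1024]; // ~io.file.buffer.size
  private int stagePos, stageLen;

  // Called once per chunk by the checker (signature simplified).
  protected int readChunk(byte[] buf, int offset, int len)
      throws IOException {
    if (stagePos == stageLen) {
      stageLen = fillFromStorage(stage); // one big underlying read
      stagePos = 0;
      if (stageLen <= 0) return -1;
    }
    // Second copy: stage -> the caller's chunk-sized slice of buf.
    int n = Math.min(Math.min(len, CHUNK_SIZE), stageLen - stagePos);
    System.arraycopy(stage, stagePos, buf, offset, n);
    stagePos += n;
    return n;
  }

  private int fillFromStorage(byte[] b) throws IOException {
    Arrays.fill(b, (byte) 0); // stand-in for a real storage read
    return b.length;
  }
}
{code}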