[jira] Updated: (HADOOP-3205) Read multiple chunks directly from FSInputChecker subclass into user buffers

Todd Lipcon (JIRA) Thu, 03 Dec 2009 13:46:47 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Todd Lipcon updated HADOOP-3205:
--------------------------------

    Attachment: hadoop-3205.txt

New version of the patch. This addresses Eli's review comments and adds two 
extra tests: one checking that a truncated checksum file throws 
ChecksumException, and another for odd-sized read buffers in a file with a few 
chunks. I also tidied up some of the comments to make it clearer to 
implementors what's going on.

Just to be doubly sure, I reran all the benchmarks overnight and confirmed that 
reading 32 chunks at once gives all of the performance benefit of a larger 
value while using less memory. I also reran the HDFS-755 tests against this 
build with assertions on, and everything looked good (plenty of assertion 
failures, but none in the new code!)

> Read multiple chunks directly from FSInputChecker subclass into user buffers
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Todd Lipcon
>         Attachments: hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt, 
> hadoop-3205.txt, hadoop-3205.txt
>
>
> Implementations of FSInputChecker and FSOutputSummer, such as DFS, do not 
> have access to the full user buffer. At any time DFS can access only up to 
> 512 bytes, even though the user usually reads with a much larger buffer 
> (often controlled by io.file.buffer.size). This requires an implementation 
> to double-buffer data if it wants to read or write larger chunks of data 
> from the underlying storage.
> We could split the changes for FSInputChecker and FSOutputSummer into two 
> separate jiras.
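
To make the idea concrete, here is a rough sketch of reading several checksum 
chunks straight into the caller's buffer and verifying each one in place, so 
that no intermediate 512-byte buffer is needed. It is not the attached patch 
and does not use the real FSInputChecker API: the class and method names are 
hypothetical, java.util.zip.CRC32 stands in for the checksum, and a plain 
IOException stands in for ChecksumException. The 512-byte chunk size and the 
cap of 32 chunks per read are taken from the discussion above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch only; not the HADOOP-3205 patch or the FSInputChecker API. */
class MultiChunkReadSketch {
    static final int BYTES_PER_CHECKSUM = 512;  // chunk size from the issue description
    static final int MAX_CHUNKS_PER_READ = 32;  // value benchmarked in the comment above

    /** Computes one CRC32 per 512-byte chunk of data (the last chunk may be short). */
    static long[] checksums(byte[] data) {
        int n = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /**
     * Reads up to MAX_CHUNKS_PER_READ whole chunks directly into buf[off..off+len)
     * and verifies each chunk where it landed. Returns the number of bytes read,
     * or -1 at end of stream. firstChunk is the index of the chunk at the current
     * stream position.
     */
    static int readChunks(InputStream in, long[] sums, int firstChunk,
                          byte[] buf, int off, int len) throws IOException {
        if (len < BYTES_PER_CHECKSUM) {
            throw new IllegalArgumentException("buffer must hold at least one chunk");
        }
        // Only ask for whole chunks, so every chunk that arrives can be verified.
        int want = Math.min(len / BYTES_PER_CHECKSUM, MAX_CHUNKS_PER_READ) * BYTES_PER_CHECKSUM;
        int got = 0;
        while (got < want) {
            int n = in.read(buf, off + got, want - got);
            if (n < 0) {
                break;  // end of stream; the final chunk may be short
            }
            got += n;
        }
        if (got == 0) {
            return -1;
        }
        CRC32 crc = new CRC32();
        int chunk = firstChunk;
        for (int pos = 0; pos < got; pos += BYTES_PER_CHECKSUM, chunk++) {
            if (chunk >= sums.length) {
                throw new IOException("checksum data truncated at chunk " + chunk);
            }
            int chunkLen = Math.min(BYTES_PER_CHECKSUM, got - pos);
            crc.reset();
            crc.update(buf, off + pos, chunkLen);
            if (crc.getValue() != sums[chunk]) {
                throw new IOException("checksum mismatch in chunk " + chunk);
            }
        }
        return got;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[40 * BYTES_PER_CHECKSUM + 100];  // 40 full chunks plus a short one
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        long[] sums = checksums(data);
        InputStream in = new ByteArrayInputStream(data);
        byte[] userBuf = new byte[64 * 1024];  // a large user buffer

        int chunk = 0;
        int n;
        while ((n = readChunks(in, sums, chunk, userBuf, 0, userBuf.length)) > 0) {
            System.out.println("read " + n + " bytes starting at chunk " + chunk);
            chunk += (n + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        }
    }
}
```

With the 64 KB buffer in main, the first call returns 32 chunks (16384 bytes) 
and the second returns the remaining 4196 bytes, rather than one call per 512 
bytes. Rounding the request down to whole chunks is a simplification made for 
this sketch; handling reads that start or end mid-chunk is left out.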

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
