[ 
https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619260#action_12619260
 ] 

Raghu Angadi commented on HADOOP-3514:
--------------------------------------

There are some other important differences from "traditional checksumming", 
even if this is used with only one checksum per file:

# User must start reading from the start of the file and read till the end.
# Any call to skip() in between will throw the checksum off.
# When read() returns data, it does not mean its checksum is correct. 
# While reading, the user must know the total file length (so it mostly can't 
be used for other streams).
# close() on the input stream closes the underlying stream, but close() on the 
output stream does not.
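
To make the first four points concrete, here is a minimal sketch. It is not the 
code from the patch: TrailingCrcInputStream, encode() and readsCleanly() are 
hypothetical names, and the layout (payload followed by an 8-byte CRC32 
trailer) is an assumption made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class InlineCrcSketch {

    /** Hypothetical reader: payload of known length, then an 8-byte CRC32 trailer. */
    static final class TrailingCrcInputStream extends FilterInputStream {
        private final CRC32 crc = new CRC32();
        private long remaining; // payload bytes not yet returned to the caller

        TrailingCrcInputStream(InputStream in, long totalLength) {
            super(in);
            // The caller has to supply the total length up front.
            this.remaining = totalLength - Long.BYTES;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            return read(one, 0, 1) < 0 ? -1 : (one[0] & 0xff);
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining == 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n < 0) throw new EOFException("payload truncated");
            crc.update(buf, off, n);
            remaining -= n;
            // The checksum is compared only once the last payload byte is read.
            if (remaining == 0) verifyTrailer();
            return n;
        }

        private void verifyTrailer() throws IOException {
            long stored = new DataInputStream(in).readLong();
            if (stored != crc.getValue()) throw new IOException("checksum mismatch");
        }
        // skip() is inherited from FilterInputStream, so skipped bytes never
        // reach the CRC or the remaining-byte count.
    }

    /** Appends the CRC32 of the payload as a big-endian long. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload).putLong(crc.getValue()).array();
    }

    /** Skips skipFirst bytes, reads to the end, and reports whether that succeeded. */
    static boolean readsCleanly(byte[] file, long skipFirst) {
        try (InputStream in = new TrailingCrcInputStream(
                new ByteArrayInputStream(file), file.length)) {
            in.skip(skipFirst);
            byte[] buf = new byte[64];
            while (in.read(buf) >= 0) { /* drain */ }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = encode("hello world".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readsCleanly(file, 0)); // read from start to end
        System.out.println(readsCleanly(file, 2)); // skip() before reading

        byte[] corrupt = file.clone();
        corrupt[0] ^= 1; // flip one bit in the payload
        try (InputStream in = new TrailingCrcInputStream(
                new ByteArrayInputStream(corrupt), corrupt.length)) {
            // A partial read hands back the corrupt bytes without complaint.
            System.out.println(in.read(new byte[4]));
        }
        System.out.println(readsCleanly(corrupt, 0)); // fails only at the end
    }
}
```

In this sketch a caller that stops early, or calls skip(), never gets a valid 
verification, which is why such a stream is hard to reuse as a general-purpose 
checksum stream.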

These are based on my brief look at ChecksumInputStream. It is still possible 
to use these streams elsewhere, but I doubt they will be used in the context 
of a typical checksum stream. I would still suggest moving these next to IFile 
unless there is another existing context where they could be used.

> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3514-v1.patch, hadoop-3514-v2.patch, 
> hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc 
> into the iFile rather than having a separate file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
