[ 
https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619196#action_12619196
 ] 

Jothi Padmanabhan commented on HADOOP-3514:
-------------------------------------------

It is not actually 'checksum per record', but rather 'checksum per file'. To 
be more precise, it holds a single checksum for all the bytes written between 
the creation and close of the ChecksumOutputStream. This utility could be used 
by anyone who needs one checksum per file, provided that the file is written 
sequentially (and in one shot) and read sequentially. Agree?
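
To illustrate the 'one checksum per file' idea, here is a minimal sketch using 
only java.util.zip. It assumes a CRC32 stored as an 8-byte trailer at the end 
of the file; the names InlineCrcSketch, writeWithTrailingCrc and readAndVerify 
are hypothetical and are not the classes or layout from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical illustration only; not the ChecksumOutputStream from the patch.
class InlineCrcSketch {

    // Writes the payload sequentially, then appends one CRC32 that covers
    // every payload byte. The checksum lives in the same "file".
    static byte[] writeWithTrailingCrc(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CheckedOutputStream checked = new CheckedOutputStream(sink, new CRC32());
        checked.write(payload);
        checked.flush();
        long crc = checked.getChecksum().getValue();

        // Assumed layout: payload bytes followed by an 8-byte checksum trailer.
        DataOutputStream trailer = new DataOutputStream(sink);
        trailer.writeLong(crc);
        trailer.flush();
        return sink.toByteArray();
    }

    // Reads the payload sequentially, then compares the running CRC32 with
    // the stored trailer. Throws IOException on a mismatch.
    static byte[] readAndVerify(byte[] file) throws IOException {
        int dataLen = file.length - 8;
        if (dataLen < 0) {
            throw new IOException("file too short to hold a checksum");
        }
        CheckedInputStream checked =
            new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        DataInputStream in = new DataInputStream(checked);

        byte[] payload = new byte[dataLen];
        in.readFully(payload);
        // Capture the checksum before the trailer bytes pass through the stream.
        long actual = checked.getChecksum().getValue();
        long expected = in.readLong();
        if (actual != expected) {
            throw new IOException("checksum mismatch");
        }
        return payload;
    }
}
```

Because the checksum is only known once the last byte has been written, and 
only checked once the last byte has been read, this scheme fits files that are 
written in one shot and read front to back; it does not support verifying an 
arbitrary record in the middle of the file.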

> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3514-v1.patch, hadoop-3514-v2.patch, 
> hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc 
> into the iFile rather than having a separate file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
