[
https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619260#action_12619260
]
Raghu Angadi commented on HADOOP-3514:
--------------------------------------
There are some more important differences from "traditional checksumming", even
when it is used with one checksum per file:
# The user must start reading at the start of the file and read through to the
end.
# Any call to skip() in between will throw the checksum off.
# When read() returns data, it does not mean the checksum of that data is
correct.
# While reading, the user must know the total file length (so it mostly can't
be used for other streams).
# close() on the input stream closes the underlying stream, but close() on the
output stream does not.
These are based on my brief look at ChecksumInputStream. It is still possible
that these streams could be used elsewhere, but I doubt they would be used in
the context of a typical checksum stream. I would still suggest moving these
next to IFile unless there is another existing context where they could be used.
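
The reading constraints listed above can be illustrated with a minimal sketch.
This is not the code from the attached patch: TrailingCrcDemo,
TrailingCrcInputStream, readsCleanly() and the layout (payload followed by an
8-byte CRC32 value) are all hypothetical names and assumptions chosen for
illustration. It shows a single checksum stored at the end of the data, a
reader that needs the total length up front, data that is handed back before
anything is verified, and a skip() that leaves the running checksum wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the ChecksumInputStream from the patch.
class TrailingCrcDemo {
    static final int CRC_BYTES = 8; // assumed layout: CRC32 value stored as a long

    // Writes the payload followed by one checksum for the whole payload.
    static byte[] writeWithTrailingCrc(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + CRC_BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    static class TrailingCrcInputStream extends InputStream {
        private final InputStream in;
        private final CRC32 crc = new CRC32();
        private long remaining;   // payload bytes not yet consumed
        private boolean verified;

        // The reader must be told the total length to find the checksum.
        TrailingCrcInputStream(InputStream in, long totalLength) {
            this.in = in;
            this.remaining = totalLength - CRC_BYTES;
        }

        @Override
        public int read() throws IOException {
            if (remaining == 0) {
                verifyOnce();     // the check happens only at end of data
                return -1;
            }
            int b = in.read();
            if (b < 0) throw new IOException("stream ended before the checksum");
            crc.update(b);
            remaining--;
            return b;             // returned to the caller, not yet verified
        }

        @Override
        public long skip(long n) throws IOException {
            // Skipped bytes never reach the CRC, so verification fails later.
            long skipped = in.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        private void verifyOnce() throws IOException {
            if (verified) return;
            long stored = new DataInputStream(in).readLong();
            if (stored != crc.getValue()) throw new IOException("checksum mismatch");
            verified = true;
        }
    }

    // Returns true if the stream reads to the end without a checksum error.
    static boolean readsCleanly(byte[] file, long bytesToSkip) {
        try (InputStream in = new TrailingCrcInputStream(
                new ByteArrayInputStream(file), file.length)) {
            in.skip(bytesToSkip);
            while (in.read() != -1) { /* consume the payload */ }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithTrailingCrc("hello shuffle".getBytes(StandardCharsets.UTF_8));
        System.out.println("full read ok:  " + readsCleanly(file, 0));
        System.out.println("after skip ok: " + readsCleanly(file, 3));
    }
}
```

A caller that stops reading before the end never triggers the check at all,
which is the sense in which such a stream differs from a general-purpose
checksum stream.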
> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
> Key: HADOOP-3514
> URL: https://issues.apache.org/jira/browse/HADOOP-3514
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
> Fix For: 0.19.0
>
> Attachments: hadoop-3514-v1.patch, hadoop-3514-v2.patch,
> hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc
> into the iFile rather than having a separate file.