[ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610039#action_12610039 ]
Doug Cutting commented on HADOOP-3514:
--------------------------------------
I think a single checksum per map task output would be fine. We should attempt
to compute it as high in the I/O chain as possible, ideally in the top-level
OutputStream, before any buffering is done. Similarly, on the reduce side, we
should validate the checksum as late as possible, as the map output is merged.
Checksum errors are relatively infrequent, yet it is vital to catch them,
regardless of whether they happen on disk, in memory or on the network.
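
As a rough illustration of the placement described above, the sketch below puts the checksum on the outermost stream on both sides: the writer checksums bytes before they enter any buffer, and the reader checksums them at the point where the consumer (the merge, in this comment) actually pulls them. It uses only java.util.zip classes. The class name MapOutputChecksumSketch, the Written holder, and the in-memory sink are hypothetical stand-ins, not Hadoop code.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical sketch; names and layout are assumptions, not Hadoop's implementation.
class MapOutputChecksumSketch {

    /** The bytes that reached the sink, plus the CRC computed above the buffer. */
    static final class Written {
        final byte[] bytes;
        final long crc;

        Written(byte[] bytes, long crc) {
            this.bytes = bytes;
            this.crc = crc;
        }
    }

    /** Map side: one checksum for the whole output, computed before buffering. */
    static Written write(byte[] records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the disk
        CRC32 crc = new CRC32();
        // The checked stream is the top-level stream, so every byte is
        // checksummed before it enters the buffer underneath.
        try (OutputStream out =
                 new CheckedOutputStream(new BufferedOutputStream(sink), crc)) {
            out.write(records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Written(sink.toByteArray(), crc.getValue());
    }

    /** Reduce side: checksum the bytes as the consumer reads them, then compare. */
    static boolean verify(byte[] stored, long expectedCrc) {
        CRC32 crc = new CRC32();
        try (InputStream in =
                 new CheckedInputStream(new ByteArrayInputStream(stored), crc)) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // A real consumer would merge the records here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue() == expectedCrc;
    }

    public static void main(String[] args) {
        Written w = write("key\tvalue\n".getBytes());
        System.out.println("intact: " + verify(w.bytes, w.crc));
    }
}
```

Because the checksum sits above the buffer on write and at the consumer on read, corruption introduced anywhere in between (buffer, disk, network) changes the value seen by verify.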
> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
> Key: HADOOP-3514
> URL: https://issues.apache.org/jira/browse/HADOOP-3514
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
> Fix For: 0.19.0
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc
> into the iFile rather than having a separate file.
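
To make the idea in the description concrete, here is a minimal sketch of an inline checksum: the CRC is appended to the same byte stream as the data, so a reader needs one sequential read instead of a second open and seek on a separate checksum file. The trailer layout (an 8-byte long after the payload) and the class name InlineCrcSketch are assumptions for illustration only; the issue does not specify the actual iFile format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

// Hypothetical sketch; the trailer layout is an assumption, not the iFile format.
class InlineCrcSketch {

    /** Writes the payload followed by its CRC32 as an 8-byte trailer. */
    static byte[] writeWithTrailer(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for one file
        try (DataOutputStream out = new DataOutputStream(file)) {
            out.write(payload);
            out.writeLong(crc.getValue()); // checksum lives in the same file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    /** Reads payload and trailer in one pass; throws if the CRC does not match. */
    static byte[] readAndVerify(byte[] file) {
        if (file.length < Long.BYTES) {
            throw new IllegalArgumentException("too short to hold a CRC trailer");
        }
        int payloadLen = file.length - Long.BYTES;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            byte[] payload = new byte[payloadLen];
            in.readFully(payload);
            long stored = in.readLong();
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payloadLen);
            if (crc.getValue() != stored) {
                throw new IllegalStateException("checksum mismatch");
            }
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithTrailer("map output".getBytes());
        System.out.println(new String(readAndVerify(file)));
    }
}
```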