[ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612600#action_12612600 ]
Jothi Padmanabhan commented on HADOOP-3514:
-------------------------------------------
Currently, the intermediate files are written to LocalFS, which means one CRC
file is created for every file written. Consider the spill of map outputs: we
would have several spill files, their associated index files, and the CRC files
for each of those. Typical processing involves seeking into the index file,
getting the corresponding offset, then seeking the actual data file to that
offset and reading the data. Each of these seeks also requires a seek into the
associated CRC file. If we stop storing CRCs in separate files, we can
eliminate those extra seeks.
This change is aimed only at the intermediate files.
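To make the seek accounting above concrete, here is a toy model, not Hadoop code. It assumes that fetching one partition from a spill takes one positioned read of the index file and one of the data file, and that with separate checksum files every positioned read also seeks the companion CRC file. The file names (spill0.out, its .index file, and the ".<name>.crc" side files) are hypothetical and only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical names) of the files a reader must seek into to
 * fetch one map-output partition from a spill.
 */
public class SeekModel {

    /** Returns, in order, every file the reader seeks into. */
    static List<String> seeksForPartition(String spill, boolean inlineCrc) {
        List<String> seeks = new ArrayList<>();
        String index = spill + ".index";
        // 1. Seek into the index file to find the partition's offset.
        seeks.add(index);
        if (!inlineCrc) {
            seeks.add("." + index + ".crc"); // separate checksum side file
        }
        // 2. Seek into the data file at that offset and read the segment.
        seeks.add(spill);
        if (!inlineCrc) {
            seeks.add("." + spill + ".crc"); // separate checksum side file
        }
        return seeks;
    }

    public static void main(String[] args) {
        List<String> separate = seeksForPartition("spill0.out", false);
        List<String> inline = seeksForPartition("spill0.out", true);
        System.out.println("separate CRC files: " + separate.size() + " seeks " + separate);
        System.out.println("inline CRCs:        " + inline.size() + " seeks " + inline);
    }
}
```

Under these assumptions the separate-file layout costs four seeks per partition fetch and the inline layout costs two.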
> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
> Key: HADOOP-3514
> URL: https://issues.apache.org/jira/browse/HADOOP-3514
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
> Fix For: 0.19.0
>
>
> The number of seeks can be reduced by half in the iFile if we move the CRC
> into the iFile rather than keeping it in a separate file.
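One way to picture "moving the CRC into the iFile" is a segment whose payload is followed by its own checksum, so a reader that has seeked to the segment's offset can verify it without opening any side file. The sketch below assumes a hypothetical layout of payload bytes plus a 4-byte big-endian CRC32 trailer; it is an illustration of the idea, not a description of how IFile implements it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Hypothetical inline-checksum segment: payload followed by a 4-byte CRC32
 * trailer, so no separate .crc file has to be opened or seeked.
 */
public class InlineCrcSegment {

    static final int CRC_LEN = 4;

    /** Returns the payload followed by its CRC32 as a big-endian int. */
    static byte[] write(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + CRC_LEN)
                .put(payload)
                .putInt((int) crc.getValue()) // CRC32 fits in the low 32 bits
                .array();
    }

    /** Verifies the trailer and returns the payload; throws on mismatch. */
    static byte[] read(byte[] segment) {
        if (segment.length < CRC_LEN) {
            throw new IllegalArgumentException("segment too short for CRC trailer");
        }
        int dataLen = segment.length - CRC_LEN;
        CRC32 crc = new CRC32();
        crc.update(segment, 0, dataLen);
        int stored = ByteBuffer.wrap(segment, dataLen, CRC_LEN).getInt();
        if (stored != (int) crc.getValue()) {
            throw new IllegalStateException("checksum mismatch");
        }
        return Arrays.copyOf(segment, dataLen);
    }

    public static void main(String[] args) {
        byte[] seg = write("key\tvalue".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(seg), StandardCharsets.UTF_8));
        seg[0] ^= 1; // flip one bit to simulate on-disk corruption
        try {
            read(seg);
        } catch (IllegalStateException e) {
            System.out.println("corruption detected: " + e.getMessage());
        }
    }
}
```

In a spill file, a reader would seek once to the segment offset taken from the index and read the segment including its trailer, so the checksum arrives with the data instead of requiring a second seek. The sketch uses an unchecked exception for brevity; a real reader would more likely report corruption as an I/O error.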
--
This message is automatically generated by JIRA.