[ 
https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612600#action_12612600
 ] 

Jothi Padmanabhan commented on HADOOP-3514:
-------------------------------------------

Currently, the intermediate files are written to LocalFS, which means one CRC 
file is created for every file. Consider the spill of map outputs: there are 
several spill files, their associated index files, and a CRC file for each of 
these. Typical processing involves seeking in the index file to get the 
corresponding offset, then seeking in the data file to that offset and 
reading the data. Each of these seeks also requires a seek in the 
associated CRC file. If we stop storing the CRCs in separate files, we can 
eliminate those extra seeks.

This change is aimed only at the intermediate files.
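
To illustrate the inline-CRC idea, here is a minimal sketch in Java. It is 
not the actual IFile format or any Hadoop API: the class and method names 
(InlineCrcSketch, writeSegment, readSegment) and the layout (payload followed 
by a 4-byte CRC32) are assumptions made for illustration only. The point is 
that the checksum sits next to the data, so one seek to the segment offset 
is enough to both read and verify it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical sketch: a segment is stored as [payload][4-byte CRC32 of payload].
class InlineCrcSketch {

    /** Returns the payload followed by its CRC32, ready to append to a data file. */
    public static byte[] writeSegment(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt((int) crc.getValue()); // low 32 bits hold the checksum
        return buf.array();
    }

    /**
     * Reads the segment at [offset, offset + length) from the file bytes and
     * verifies the trailing CRC. No second file is consulted.
     */
    public static byte[] readSegment(byte[] file, int offset, int length) {
        int payloadLen = length - 4;
        CRC32 crc = new CRC32();
        crc.update(file, offset, payloadLen);
        int stored = ByteBuffer.wrap(file, offset + payloadLen, 4).getInt();
        if ((int) crc.getValue() != stored) {
            throw new IllegalStateException("Checksum mismatch at offset " + offset);
        }
        return Arrays.copyOfRange(file, offset, offset + payloadLen);
    }

    public static void main(String[] args) {
        byte[] first = writeSegment("partition-0".getBytes(StandardCharsets.UTF_8));
        byte[] second = writeSegment("partition-1".getBytes(StandardCharsets.UTF_8));

        // Simulate one spill file holding two segments back to back.
        byte[] file = new byte[first.length + second.length];
        System.arraycopy(first, 0, file, 0, first.length);
        System.arraycopy(second, 0, file, first.length, second.length);

        // The index would supply (offset, length) for the wanted segment.
        byte[] data = readSegment(file, first.length, second.length);
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

In this sketch the index entry only needs the segment's offset and length; 
with a separate CRC file, each such lookup would also need a matching seek 
in that file.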

> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc 
> into the iFile rather than having a separate file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
