[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229432#comment-14229432 ]

Gera Shegalov commented on MAPREDUCE-6166:
------------------------------------------

Thanks for commenting, [~eepayne]!
bq. Since OnDiskMapOutput is shuffling the whole IFile to disk, the checksum is 
needed later during the last merge pass when the IFile contents are read again 
and decompressed.

Can you clarify where in the code it's required to keep the original checksum?
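
Just so we are talking about the same validation, what I have in mind on the 
fetch path is roughly the following. This is a simplified, hypothetical sketch 
(made-up class and method names), assuming {{IFileInputStream}} verifies the 
trailing IFile checksum once the payload has been fully read; it is not meant 
to reproduce the attached patch.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.IFileInputStream;

public class FetchValidationSketch {
  /**
   * Copies a fetched map output into a destination buffer while
   * IFileInputStream recomputes a checksum over the payload and compares
   * it to the trailing IFile checksum. A corrupted transfer then surfaces
   * as an IOException (checksum error) at fetch time rather than hours
   * later during the final merge.
   *
   * bytesToRead is the payload size, i.e. compressedLength minus the
   * checksum footer that IFileInputStream consumes itself.
   */
  static void fetchAndValidate(InputStream fetched, byte[] buffer,
      int bytesToRead, long compressedLength, Configuration conf)
      throws IOException {
    IFileInputStream checked =
        new IFileInputStream(fetched, compressedLength, conf);
    IOUtils.readFully(checked, buffer, 0, bytesToRead);
  }
}
{code}

Once that check passes, the bytes in the destination buffer are known to match 
what the map task wrote, which is why I am asking where the original checksum 
would still be needed afterwards.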

What I see is that after your modifications, {{OnDiskMapOutput}} is guaranteed 
to validate the contents of the destination buffer against the remote checksum. 
These contents are then written out using {{LocalFileSystem}}, which will 
create an on-disk checksum again because it is based on {{ChecksumFileSystem}}. 
Are you proposing an optimization where the checksum is not computed twice when 
shuffling straight to disk, by using {{RawLocalFileSystem}}? If so, can we 
defer it to another JIRA?
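
To illustrate the double checksum I am referring to, here is a standalone 
sketch against the plain {{FileSystem}} API (hypothetical paths, not the 
shuffle code itself): a write through {{FileSystem.getLocal(conf)}} goes 
through {{ChecksumFileSystem}} and produces a {{.crc}} side file, whereas a 
write through the underlying raw file system does not.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class LocalChecksumDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    byte[] data = "bytes already validated against the remote checksum"
        .getBytes("UTF-8");

    // LocalFileSystem extends ChecksumFileSystem: this write also produces
    // a hidden /tmp/.checksummed.crc side file, i.e. a second checksum pass.
    LocalFileSystem localFs = FileSystem.getLocal(conf);
    try (FSDataOutputStream out = localFs.create(new Path("/tmp/checksummed"))) {
      out.write(data);
    }

    // The raw local file system writes just the bytes, with no .crc side file.
    FileSystem rawFs = localFs.getRawFileSystem();
    try (FSDataOutputStream out = rawFs.create(new Path("/tmp/raw-only"))) {
      out.write(data);
    }
  }
}
{code}

With the raw variant, data that was already verified against the IFile checksum 
during the fetch would hit local disk without being checksummed a second time, 
which is the optimization I understand {{RawLocalFileSystem}} to enable here.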

> Reducers do not catch bad map output transfers during shuffle if data 
> shuffled directly to disk
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6166
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
> MAPREDUCE-6166.v2.201411251627.txt
>
>
> In very large map/reduce jobs (50000 maps, 2500 reducers), the intermediate 
> map partition output gets corrupted on disk on the map side. If this 
> corrupted map output is too large to shuffle in memory, the reducer streams 
> it to disk without validating the checksum. In jobs this large, it could take 
> hours before the reducer finally tries to read the corrupted file and fails. 
> Since retries of the failed reduce attempt will also take hours, this delay 
> in discovering the failure is multiplied greatly.


