[jira] [Commented] (MAPREDUCE-4933) MR1 final merge asks for length of file it just wrote before flushing it

Sandy Ryza (JIRA) Thu, 10 Jan 2013 14:40:13 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550524#comment-13550524
 ]


Sandy Ryza commented on MAPREDUCE-4933:
---------------------------------------

If I understand how things work, yes.  The length is used to calculate 
onDiskBytes.  If onDiskBytes is 0, (which can happen falsely without the 
patch), the following code won't get called:
{code}
        final int numInMemSegments = memDiskSegments.size();
        diskSegments.addAll(0, memDiskSegments);
        memDiskSegments.clear();
        RawKeyValueIterator diskMerge = Merger.merge(
            job, fs, keyClass, valueClass, codec, diskSegments,
            ioSortFactor, numInMemSegments, tmpDir, comparator,
            reporter, false, spilledRecordsCounter, null);
        diskSegments.clear();
        if (0 == finalSegments.size()) {
          return diskMerge;
        }
        finalSegments.add(new Segment<K,V>(
              new RawKVIteratorReader(diskMerge, onDiskBytes), true));
{code}

which if I understand correctly means that some on-disk data won't be 
incorporated in the final merge.
                
> MR1 final merge asks for length of file it just wrote before flushing it
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4933
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4933
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, task
>    Affects Versions: 1.1.1
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4933-branch-1.patch
>
>
> createKVIterator in ReduceTask contains the following code:
> {code}
>           try {
>             Merger.writeFile(rIter, writer, reporter, job);
>             addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
>           } catch (Exception e) {
>             if (null != outputPath) {
>               fs.delete(outputPath, true);
>             }
>             throw new IOException("Final merge failed", e);
>           } finally {
>             if (null != writer) {
>               writer.close();
>             }
>           }
> {code}
> Merger#writeFile() does not close the file after writing it, so when 
> fs.getFileStatus() is called on it, it may not return the correct length.  
> This causes bad accounting further down the line, which can lead to map 
> output data being lost.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4933) MR1 final merge asks for length of file it just wrote before flushing it

Reply via email to