[jira] [Commented] (HADOOP-9184) Some reducers failing to write final output file to s3.

Jeremy Karn (JIRA) Tue, 22 Jan 2013 12:42:15 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559988#comment-13559988 ]


Jeremy Karn commented on HADOOP-9184:
-------------------------------------

When I run:

ant test-patch -Dpatch.file=../HADOOP-9184-branch-0.20.patch \
    -Dforrest.home=$FORREST_HOME -Dfindbugs.home=$FINDBUGS_HOME \
    -Djava5.home=$JAVA_5_HOME

It fails with:

     [exec] ======================================================================
     [exec]     Pre-building trunk to determine trunk number
     [exec]     of release audit, javac, and Findbugs warnings.
     [exec] ======================================================================
     [exec] ======================================================================
     [exec] 
     [exec] 
     [exec] /bin/ant -Dversion=PATCH-HADOOP-9184-branch-0.20.patch -Djavac.args=-Xlint -Xmaxwarns 1000  -Djava5.home=/home/ubuntu/jdk1.5.0_22 -Dforrest.home=/home/ubuntu/apache-forrest-0.8 -DHadoopPatchProcess= clean tar > /home/ubuntu/tmp/trunkJavacWarnings.txt 2>&1
     [exec] Trunk compilation is broken?
     [exec] Trunk compilation is broken?

But I can successfully run: 

ant clean tar -Djavac.args="-Xlint -Xmaxwarns 1000" \
    -Dforrest.home=$FORREST_HOME -Dfindbugs.home=$FINDBUGS_HOME \
    -Djava5.home=$JAVA_5_HOME -DHadoopPatchProcess= clean tar

Any ideas?
                
> Some reducers failing to write final output file to s3.
> -------------------------------------------------------
>
>                 Key: HADOOP-9184
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9184
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Jeremy Karn
>         Attachments: example.pig, HADOOP-9184-branch-0.20.patch, 
> hadoop-9184.patch, task_log.txt
>
>
> We had a Hadoop job running 100 reducers, with most of the reducers 
> expected to write out an empty file. When the final output went to an S3 
> bucket, we found that we were sometimes missing a final part file.  
> This happened in approximately 1 job in 3 (so, at 100 reducers per job, roughly 
> 1 reducer out of 300 was failing to output its data properly). I've attached 
> the pig script we used to reproduce the bug.
> After an in-depth look and instrumenting the code, we traced the problem to 
> moveTaskOutputs in FileOutputCommitter.  
> The code there looked like:
> {code}
>     if (fs.isFile(taskOutput)) {
>       … do stuff …       
>     } else if(fs.getFileStatus(taskOutput).isDir()) {
>       … do stuff … 
>     }
> {code}
> What we saw was that, for the problem jobs, neither branch was being 
> exercised.  I've attached the task log of our instrumented code.  In this 
> version we added an else statement and printed out the line "THIS SEEMS LIKE 
> WE SHOULD NEVER GET HERE …".
> The root cause of this seems to be an eventual consistency issue with S3.  
> You can see in the log that the first time moveTaskOutputs is called, it finds 
> that taskOutput is a directory.  It goes into the isDir() branch and 
> successfully retrieves the list of files in that directory from S3 (in this 
> case just one file).  This triggers a recursive call to moveTaskOutputs for 
> the file found in the directory.  But on this pass through moveTaskOutputs, 
> the temporary output file can't be found, so both branches of the above 
> if statement are skipped and the temporary file is never moved to the 
> final output location.
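The missing else in the quoted snippet is what lets the move be skipped silently. Below is a minimal, self-contained Java sketch of one way to guard against it; it is not the attached patch. It assumes a hypothetical StatusLookup interface standing in for the fs.isFile(...) / fs.getFileStatus(...).isDir() calls, plus hypothetical Kind, classifyWithRetry, maxAttempts, and sleepMillis names. Because a file that S3 just listed may not yet be visible to a follow-up lookup, the sketch re-checks the path a bounded number of times and throws if it never appears, instead of falling through both branches.

```java
import java.io.IOException;

// Hypothetical sketch (not the HADOOP-9184 patch): classify a task output
// path, retrying while an eventually consistent store cannot see it yet,
// and failing loudly instead of silently skipping the move.
public class MoveTaskOutputsSketch {

    /** Simplified, assumed result of a metadata lookup. */
    enum Kind { FILE, DIRECTORY, MISSING }

    /** Hypothetical stand-in for fs.isFile / fs.getFileStatus(...).isDir(). */
    interface StatusLookup {
        Kind kindOf(String path) throws IOException;
    }

    /**
     * Returns FILE or DIRECTORY once the path is visible; throws after
     * maxAttempts lookups that all report MISSING.
     */
    static Kind classifyWithRetry(StatusLookup fs, String path,
                                  int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Kind kind = fs.kindOf(path);
            if (kind != Kind.MISSING) {
                return kind;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis * attempt); // linear backoff between lookups
            }
        }
        // The explicit "else": never treat a listed-but-invisible file as done.
        throw new IOException("Task output " + path
                + " was listed but never became visible");
    }

    public static void main(String[] args) throws Exception {
        // Simulated eventually consistent store: the first two lookups miss.
        int[] calls = {0};
        StatusLookup flaky = p -> ++calls[0] <= 2 ? Kind.MISSING : Kind.FILE;
        System.out.println(classifyWithRetry(flaky, "part-00042", 5, 10)); // prints FILE
        System.out.println("lookups: " + calls[0]);                        // prints lookups: 3
    }
}
```

Throwing an IOException turns the missing file into a visible task failure (which the framework may then retry) rather than a job that commits with a part file missing. The attempt count and backoff here are arbitrary; how long to wait on S3 would be a judgment call for a real fix.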

