[ http://issues.apache.org/jira/browse/HADOOP-190?page=all ]
[EMAIL PROTECTED] updated HADOOP-190:
-------------------------------------
Attachment: nocleanifdone.patch
Here's a suggested patch: if the task has already been marked 'done', don't remove
its output. (I haven't tested this patch; the condition is awkward to reproduce.)
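Roughly, the idea looks like this. It is a minimal sketch only, not the contents of
nocleanifdone.patch: TaskCleanupSketch, markDone() and cleanup() are hypothetical
names standing in for the TaskTracker's own per-task bookkeeping.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for the TaskTracker's per-task state; class, field and
// method names are illustrative and are NOT taken from nocleanifdone.patch.
public class TaskCleanupSketch {
    private final File outputDir;   // the task's local map output directory
    private volatile boolean done;  // set once the child has reported 'done'

    public TaskCleanupSketch(File outputDir) {
        this.outputDir = outputDir;
    }

    /** Record that the child reported successful completion. */
    public void markDone() {
        done = true;
    }

    /**
     * Invoked when the child is killed or fails.
     * Returns true only if the output was actually removed.
     */
    public boolean cleanup() {
        if (done) {
            // The JobTracker already believes this map succeeded and will send
            // reducers to fetch its output, so leave the files in place.
            return false;
        }
        return deleteRecursively(outputDir);
    }

    private static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null if f is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("task_demo").toFile();
        new File(dir, "demo.out").createNewFile();

        TaskCleanupSketch task = new TaskCleanupSketch(dir);
        task.markDone();
        // Child hung after 'done' and was killed: its output must survive.
        System.out.println("removed=" + task.cleanup() + ", exists=" + dir.exists());

        deleteRecursively(dir); // tidy up the demo directory
    }
}
```

The hung child is still killed when it stops reporting status; only the deletion of
output it has already reported as complete is skipped, so reducers can keep fetching it.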
> Job fails though task succeeded if we fail to exit
> --------------------------------------------------
>
> Key: HADOOP-190
> URL: http://issues.apache.org/jira/browse/HADOOP-190
> Project: Hadoop
> Type: Bug
> Reporter: [EMAIL PROTECTED]
> Attachments: nocleanifdone.patch
>
> This is an odd case. The main cause will be programmer error, but I suppose it
> could happen during normal processing. Either way, it would be grand if Hadoop
> were better able to deal with it.
> My map task completed 'successfully', but because I had started threads inside
> my task that were not marked as daemon threads, and that under certain
> circumstances were left running, my child stuck around after reporting 'done':
> the JVM won't exit while non-daemon threads are still running.
> After ten minutes, the TaskTracker (TT) steps in, kills the child, and cleans up
> the successful output. Because the JobTracker (JT) has been told the task
> completed successfully, reducers keep showing up looking for the now-removed
> output until the job fails.
> Below is an illustration of the problem, taken from the log output:
> ....
> 060501 090401 task_0001_m_000798_0 0.99491096% adding
> http://www.score.umd.edu/a
> um.jpg 24891 image/jpeg
> 060501 090401 task_0001_m_000798_0 1.0% adding
> http://www.score.umd.edu/album.jp
> 24891 image/jpeg
> 060501 090401 Task task_0001_m_000798_0 is done.
> ...
> 060501 091410 task_0001_m_000798_0: Task failed to report status for 608 seconds Killing.
> ....
> 060501 091410 Calling cleanup because was killed or FAILED task_0001_m_000798_0
> 060501 091410 task_0001_m_000798_0 done; removing files.
> Then, subsequently....
> 060501 091422 SEVERE Can't open map output:/1/hadoop/tmp/task_0001_m_000798_0/pa
> -12.out
> java.io.FileNotFoundException: LocalFS
> ...
> and on and on.
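On the task side, the trap described above is standard Java behaviour: the JVM only
exits once every non-daemon thread has finished, and a new thread inherits daemon
status from the thread that creates it, so helpers started from task code (as well
as those made by Executors.defaultThreadFactory()) are non-daemon by default. A
minimal sketch, assuming the helpers are run through an ExecutorService; the pool,
its "map-helper" name and the work it does are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonHelpers {
    /** ThreadFactory whose threads are daemons, so they cannot keep the JVM alive. */
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger count = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
            t.setDaemon(true); // must be set before the thread is started
            return t;
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical helper pool started from inside a map task.
        ExecutorService helpers = Executors.newFixedThreadPool(2, daemonFactory("map-helper"));
        helpers.submit(() -> {
            // Simulates a badly behaved helper: it swallows interrupts and keeps going.
            long end = System.nanoTime() + TimeUnit.MINUTES.toNanos(10);
            while (System.nanoTime() < end) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException ignored) {
                    // ignored on purpose for the demo
                }
            }
        });

        // Orderly shutdown first, then interrupt stragglers.
        helpers.shutdown();
        if (!helpers.awaitTermination(1, TimeUnit.SECONDS)) {
            helpers.shutdownNow();
        }
        // The straggler is still running, but it is a daemon, so the JVM can
        // exit as soon as main() returns instead of hanging.
        System.out.println("main returning; remaining helpers are daemons");
    }
}
```

Marking helpers as daemons only removes the exit hang. The task should still shut its
pool down before reporting 'done', because daemon threads are abandoned abruptly when
the JVM exits.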
--
This message is automatically generated by JIRA.