[
https://issues.apache.org/jira/browse/MAPREDUCE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Scott Oaks updated MAPREDUCE-7337:
----------------------------------
Summary: Task fails while deleting spill files on slow disk (was: Task
files while deleting spill files on slow disk)
> Task fails while deleting spill files on slow disk
> --------------------------------------------------
>
> Key: MAPREDUCE-7337
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7337
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: performance
> Reporter: Scott Oaks
> Priority: Minor
>
> We sometimes have tasks fail when deleting spill files in this loop (line
> 2005 of MapTask.java):
> {code:java}
> for(int i = 0; i < numSpills; i++) {
>   rfs.delete(filename[i],true);
> }{code}
> During this loop, the task reports no progress back to the ApplicationMaster,
> so if the loop takes too long, the ApplicationMaster assumes the task has
> timed out and tells the NodeManager to kill the YarnChild.
> Typically this is linked to storage issues, and we've seen it most often due
> to an underlying bug in the filesystem (where there is contention in the
> filesystem delete path when deleting several files). But while there are
> usually underlying issues, it still wouldn't hurt to mark progress in the
> task during this loop periodically.
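
A minimal sketch of the suggestion above, not the actual MapTask code or a proposed patch. It assumes a hypothetical ProgressReporter callback standing in for the task's reporter (Hadoop's Progressable exposes a similar progress() method) and uses java.nio.file in place of the rfs filesystem object so it runs on its own. Progress is marked after every delete here; reporting every N files would work as well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class SpillCleanupSketch {

    /** Hypothetical stand-in for the task's progress callback. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Deletes each spill file and marks progress after every delete, so a slow
     * delete path does not leave the task silent for the whole loop.
     * Returns the number of files that were actually removed.
     */
    static int deleteSpills(List<Path> spills, ProgressReporter reporter) throws IOException {
        int deleted = 0;
        for (Path spill : spills) {
            // Stand-in for rfs.delete(filename[i], true) in the original loop.
            if (Files.deleteIfExists(spill)) {
                deleted++;
            }
            // Tell the framework the task is still alive before the next delete.
            reporter.progress();
        }
        return deleted;
    }
}
```

With this shape, the interval between progress reports is bounded by the slowest single delete rather than by the sum of all deletes.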