[ https://issues.apache.org/jira/browse/MAPREDUCE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Oaks updated MAPREDUCE-7337:
----------------------------------
    Summary: Task fails while deleting spill files on slow disk  (was: Task files while deleting spill files on slow disk)

> Task fails while deleting spill files on slow disk
> --------------------------------------------------
>
>                 Key: MAPREDUCE-7337
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7337
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance
>            Reporter: Scott Oaks
>            Priority: Minor
>
> We sometimes have tasks fail while deleting spill files in this loop (line 2005 of MapTask.java):
> {code:java}
> // Delete each intermediate spill file; no progress is reported while this runs.
> for (int i = 0; i < numSpills; i++) {
>   rfs.delete(filename[i], true);
> }
> {code}
> During this loop the task reports no progress back to the master server (the MapReduce ApplicationMaster). If the loop takes too long, the master assumes the child has timed out and tells the node agent (NodeManager) to kill the YarnChild process.
> Typically this is linked to storage problems. We have seen it most often because of an underlying filesystem bug that causes contention in the delete path when several files are deleted. Even though there is usually an underlying issue, it would still be worthwhile for the task to mark progress periodically during this loop.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
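The suggestion in this issue is to report progress from inside the spill-deletion loop. The sketch below shows that idea in a minimal, self-contained form. It is not the patch committed for MAPREDUCE-7337, and the following names are hypothetical local stand-ins:

* `Progressable` stands in for Hadoop's `org.apache.hadoop.util.Progressable`.
* `Deleter` stands in for the `rfs.delete(path, true)` call.
* `SpillCleanup` and `deleteSpills` are illustrative names only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SpillCleanup {

    /** Hypothetical stand-in for Hadoop's org.apache.hadoop.util.Progressable. */
    @FunctionalInterface
    public interface Progressable {
        void progress();
    }

    /** Hypothetical stand-in for a delete call such as rfs.delete(path, true). */
    @FunctionalInterface
    public interface Deleter {
        boolean delete(Path path) throws IOException;
    }

    /**
     * Deletes each spill file and reports progress after every deletion, so a
     * long run of slow deletes keeps refreshing the task's progress signal.
     * Returns the number of files the deleter reported as deleted.
     */
    public static int deleteSpills(List<Path> spills, Deleter deleter, Progressable reporter)
            throws IOException {
        int deleted = 0;
        for (Path spill : spills) {
            if (deleter.delete(spill)) {
                deleted++;
            }
            // Mark progress even if this particular delete returned false.
            reporter.progress();
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Create three empty "spill" files in a temporary directory.
        Path dir = Files.createTempDirectory("spills");
        List<Path> spills = List.of(
                Files.createFile(dir.resolve("spill0.out")),
                Files.createFile(dir.resolve("spill1.out")),
                Files.createFile(dir.resolve("spill2.out")));
        int[] progressCalls = {0};
        int n = deleteSpills(spills, Files::deleteIfExists, () -> progressCalls[0]++);
        System.out.println("deleted=" + n + " progressCalls=" + progressCalls[0]);
        Files.delete(dir); // the directory is empty now
    }
}
```

Calling `progress()` on every iteration assumes the call is cheap. In Hadoop's task reporter, it sets a flag that a separate communication thread later reports to the ApplicationMaster. This approach only helps when many deletes are each slow. A single delete that blocks for longer than the task timeout (`mapreduce.task.timeout`) would still cause the task to be killed.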