[ https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-7053:
----------------------------------
    Status: Patch Available  (was: Open)

Yeah, this is yet another latent bug that was exposed when the task attempt listener started rejecting status updates for tasks the AM no longer thinks are running. As such I'm proposing a fix where we do *not* immediately reject attempts that the AM thinks should not be running, but rather give them a grace period of sorts.

This patch adds the ability for the task heartbeat handler to track attempts that have unregistered recently. It uses the same grace period for unregistered tasks that is currently used for tasks that have unregistered via the umbilical and are shutting down gracefully. This keeps the AM from immediately rejecting a recently unregistered attempt, allowing that attempt to receive a stack dump signal and otherwise shut down cleanly by itself. After the grace period expires, the AM will reject status updates from that attempt.

> Timed out tasks can fail to produce thread dump
> -----------------------------------------------
>
>                 Key: MAPREDUCE-7053
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically
> recently. When the AM times out a task, it immediately removes it from the
> list of known tasks and then connects to the NM to request a thread dump
> followed by a kill. If the task heartbeats in after the task has been
> removed from the list of known tasks but before the thread dump signal
> arrives, then the task can exit with an "org.apache.hadoop.mapred.Task: Parent
> died." message and no thread dump.
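
For readers who want a concrete picture of the grace period described in the comment above, here is a minimal sketch. It is not the code in MAPREDUCE-7053.001.patch: the class name RecentlyUnregisteredTracker, the Verdict enum, the method names, and the use of plain strings for attempt IDs are all hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a grace period for recently unregistered attempts.
 * None of these names come from the Hadoop code base or the attached patch.
 */
class RecentlyUnregisteredTracker {
    enum Verdict { ACCEPT, TOLERATE, REJECT }

    private final long gracePeriodMs;
    // Attempts the AM currently considers running.
    private final Set<String> running = ConcurrentHashMap.newKeySet();
    // Attempts that were unregistered, mapped to the time that happened.
    private final Map<String, Long> unregisteredAt = new ConcurrentHashMap<>();

    RecentlyUnregisteredTracker(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    void register(String attemptId) {
        unregisteredAt.remove(attemptId);
        running.add(attemptId);
    }

    /** Called when the AM stops tracking an attempt, e.g. after a timeout. */
    void unregister(String attemptId, long nowMs) {
        if (running.remove(attemptId)) {
            unregisteredAt.put(attemptId, nowMs);
        }
    }

    /** Decides how to answer a status update that arrives at nowMs. */
    Verdict onStatusUpdate(String attemptId, long nowMs) {
        if (running.contains(attemptId)) {
            return Verdict.ACCEPT;
        }
        Long since = unregisteredAt.get(attemptId);
        if (since != null && nowMs - since < gracePeriodMs) {
            // Still inside the grace period: do not tell the task to die yet,
            // so a thread dump signal has a chance to arrive first.
            return Verdict.TOLERATE;
        }
        // Unknown attempt, or the grace period has expired.
        unregisteredAt.remove(attemptId);
        return Verdict.REJECT;
    }
}
```

In this sketch an attempt that heartbeats shortly after being timed out gets TOLERATE rather than REJECT, which corresponds to the window in which the NM can still deliver the thread dump request.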