[jira] [Commented] (MAPREDUCE-7084) MRAppmaster still running after hadoop job -kill

Jason Lowe (JIRA) Thu, 19 Apr 2018 09:30:12 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444347#comment-16444347 ]


Jason Lowe commented on MAPREDUCE-7084:
---------------------------------------

It looks like the AM was not able to unregister successfully with the RM, but 
there's no indication as to why it would be having issues (no exception 
logged).  There's also no indication in the RM log as to why it was not able to 
process the unregistration.  A few follow-up questions:

Is this reproducible?

Is the attached RM log filtered to just this application, or is it the entire 
log?  If it's filtered, I'm curious whether any exceptions were logged during 
this time period that did not contain the app ID.
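One way to check for such exceptions is to scan the unfiltered RM log for 
timestamped lines in the window of interest that mention an exception but not 
the application.  The following is only a rough sketch: the function names are 
made up for illustration, it assumes the log4j-style timestamp prefix seen in 
the excerpt below, and it skips stack-trace continuation lines because they 
carry no timestamp.

```python
from datetime import datetime

# Timestamp layout assumed from the RM log excerpt: "2018-04-18 10:10:39,350"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def unrelated_exceptions(lines, app_id, start, end):
    """Return timestamped lines in [start, end] that mention an exception
    but do not reference the given application (hypothetical helper)."""
    # Attempt and container IDs share the "<clusterTimestamp>_<sequence>"
    # part of the application ID, so match on that rather than the full ID.
    id_suffix = app_id.split("_", 1)[1]
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None or not (start <= ts <= end):
            continue
        if "Exception" in line and id_suffix not in line:
            hits.append(line.rstrip("\n"))
    return hits
```

For this issue the inputs would be the lines of the full RM log, 
application_1521543893962_10055214 as the app ID, and a window covering 
roughly 10:10 to 10:22 on 2018-04-18.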

Did the application transition out of the RUNNING state after the 10-minute 
expiration?  According to the RM log, the attempt lingered in the FINISHING 
state for a little over 10 minutes, which correlates with the AM's log 
indicating that it was having difficulty unregistering.
{noformat}
2018-04-18 10:10:39,350 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1521543893962_10055214 State change from RUNNING to KILLING
2018-04-18 10:10:39,389 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Updating application attempt appattempt_1521543893962_10055214_000001 with 
final state: FINISHING
2018-04-18 10:10:39,391 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_000001 State change from RUNNING to 
FINAL_SAVING
2018-04-18 10:10:39,471 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_000001 State change from FINAL_SAVING to 
FINISHING
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:appattempt_1521543893962_10055214_000001 Timed out after 600 secs
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Unregistering app attempt : appattempt_1521543893962_10055214_000001
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_000001 State change from FINISHING to FINISHED
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
master appattempt_1521543893962_10055214_000001
2018-04-18 10:21:56,095 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application appattempt_1521543893962_10055214_000001 is done. 
finalState=FINISHED
2018-04-18 10:21:56,095 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1521543893962_10055214_01_000004 Container Transitioned from ACQUIRED 
to KILLED
{noformat}
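The time spent in FINISHING can be read straight off the excerpt above.  As a 
rough sketch (the helper names are hypothetical, and it assumes log entries 
may be wrapped across several lines as they are in this mail), something like 
the following rejoins the wrapped entries and measures how long an attempt 
stayed in a given state:

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
CHANGE_RE = re.compile(r"State change from (\w+) to (\w+)")


def join_wrapped(text):
    """Rejoin log entries that were wrapped across several lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS_RE.match(line) or not entries:
            entries.append(line)       # a timestamp starts a new entry
        else:
            entries[-1] += " " + line  # continuation of the previous entry
    return entries


def seconds_in_state(text, attempt_id, state):
    """Seconds between the attempt entering and leaving `state`, or None."""
    entered = None
    for entry in join_wrapped(text):
        if attempt_id not in entry or not TS_RE.match(entry):
            continue
        change = CHANGE_RE.search(entry)
        if change is None:
            continue
        ts = datetime.strptime(entry[:23], "%Y-%m-%d %H:%M:%S,%f")
        if change.group(2) == state:
            entered = ts
        elif change.group(1) == state and entered is not None:
            return (ts - entered).total_seconds()
    return None
```

For appattempt_1521543893962_10055214_000001 the two relevant entries are 
10:10:39,471 (FINAL_SAVING to FINISHING) and 10:21:55,073 (FINISHING to 
FINISHED), which is about 675.6 seconds, somewhat longer than the 600-second 
timeout reported by AbstractLivelinessMonitor.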

Is this really against Apache Hadoop 2.4.0?  If so, that is a very old release, 
and I would highly recommend upgrading to at least 2.7 to see whether the 
problem occurs there.


> MRAppmaster still running after hadoop job -kill 
> -------------------------------------------------
>
>                 Key: MAPREDUCE-7084
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7084
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.4.0
>            Reporter: stefanlee
>            Priority: Major
>         Attachments: RM.log
>
>
> My scenario is as follows:
>  1. I kill an application with *hadoop job -kill*.
>  2. The *FinalStatus* is *KILLED*, but its *State* is *RUNNING*.
>  3. The MRAppmaster process has quit in the NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

