[ https://issues.apache.org/jira/browse/MAPREDUCE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444347#comment-16444347 ]
Jason Lowe commented on MAPREDUCE-7084:
---------------------------------------
Looks like the AM was not able to successfully unregister with the RM, but
there's no indication as to why it would be having issues (no exception
logged). There's also no indication in the RM log as to why it was not able to
process the unregistration. A few follow-up questions:
Is this reproducible?
Is the RM log filtered for just this application or the entire log? If it's
filtered, I'm curious if there were any exceptions logged that did not contain
the app ID during this time period.
Did the application transition out of the RUNNING state after the 10-minute
expiration? I see it lingered for a little over 10 minutes in the FINISHING
state according to the RM log, which correlates with the AM's log indicating it
is having difficulty unregistering.
{noformat}
2018-04-18 10:10:39,350 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1521543893962_10055214 State change from RUNNING to KILLING
2018-04-18 10:10:39,389 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Updating application attempt appattempt_1521543893962_10055214_000001 with
final state: FINISHING
2018-04-18 10:10:39,391 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from RUNNING to
FINAL_SAVING
2018-04-18 10:10:39,471 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from FINAL_SAVING to
FINISHING
2018-04-18 10:21:55,073 INFO
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
Expired:appattempt_1521543893962_10055214_000001 Timed out after 600 secs
2018-04-18 10:21:55,073 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Unregistering app attempt : appattempt_1521543893962_10055214_000001
2018-04-18 10:21:55,073 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from FINISHING to FINISHED
2018-04-18 10:21:55,073 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning
master appattempt_1521543893962_10055214_000001
2018-04-18 10:21:56,095 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
Application appattempt_1521543893962_10055214_000001 is done.
finalState=FINISHED
2018-04-18 10:21:56,095 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1521543893962_10055214_01_000004 Container Transitioned from ACQUIRED
to KILLED
{noformat}
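For anyone who wants to check the "a little over 10 minutes" figure against the excerpt, here is a minimal sketch that measures how long the attempt sat in FINISHING. The helper name seconds_in_state and the regular expression are my own for illustration and are not part of Hadoop; the log lines are copied from the excerpt above, wrapped as they appear in this mail.

```python
import re
from datetime import datetime

# Attempt state changes copied from the RM log excerpt above.
LOG = """
2018-04-18 10:10:39,391 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from RUNNING to
FINAL_SAVING
2018-04-18 10:10:39,471 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from FINAL_SAVING to
FINISHING
2018-04-18 10:21:55,073 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1521543893962_10055214_000001 State change from FINISHING to FINISHED
"""

# timestamp, log level, logger name, attempt ID, old state, new state
STATE_CHANGE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \S+ "
    r"(\S+) State change from (\w+) to (\w+)"
)


def seconds_in_state(log_text, state):
    """Hypothetical helper: seconds between entering and leaving `state`.

    Assumes the log contains exactly one entry into and one exit from
    `state`; it raises TypeError if either transition is missing.
    """
    flat = " ".join(log_text.split())  # undo the mail's line wrapping
    entered = left = None
    for stamp, _attempt, old, new in STATE_CHANGE.findall(flat):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if new == state:
            entered = when
        elif old == state:
            left = when
    return (left - entered).total_seconds()


print(seconds_in_state(LOG, "FINISHING"))
```

On this excerpt the result is roughly 675.6 seconds, a bit more than the 600 secs reported in the liveliness monitor's "Timed out" message.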
Is this really against Apache Hadoop 2.4.0? If so, that is a very old release,
and I would highly recommend upgrading to at least 2.7 to see if it occurs
there.
> MRAppmaster still running after hadoop job -kill
> -------------------------------------------------
>
> Key: MAPREDUCE-7084
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7084
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 2.4.0
> Reporter: stefanlee
> Priority: Major
> Attachments: RM.log
>
>
> My scenario is as follows:
> 1. I kill an application with *hadoop job -kill*.
> 2. the *FinalStatus* is *KILLED*, but its *State* is *RUNNING*
> 3. the MRAppmaster process has quit in NodeManager