Myroslav Papirkovskyi created AMBARI-15691:
----------------------------------------------
Summary: Express Upgrade hangs if ambari agent is restarted in the middle of EU
Key: AMBARI-15691
URL: https://issues.apache.org/jira/browse/AMBARI-15691
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.2.2
Reporter: Myroslav Papirkovskyi
Assignee: Myroslav Papirkovskyi
Priority: Blocker
Fix For: 2.2.2
Attachments: AMBARI-15691.patch
*Steps*
# Install HDP-2.4.0.0 with Ambari 2.2.2 (secure, non-HA cluster)
# Start EU to 2.4.2.0-127 and proceed until the "Backup Knox data" prompt
# Click Proceed at the "Backup Knox data" prompt
# Stop the ambari agent on two of the cluster hosts and wait for EU to fail with "HOLDING_TIMEDOUT" status (in my test, EU stopped at the "Snapshot HBase" task)
# Start the agents on both hosts and wait 90 seconds for the agents to heartbeat
# Retry the failed task
*Result*
EU hangs
From ambari-server log:
{code}
04 Apr 2016 08:20:14,729 WARN [ambari-action-scheduler] ActionScheduler:201 - Exception received
java.lang.NullPointerException
    at org.apache.ambari.server.actionmanager.ActionScheduler.wasAgentRestartedDuringOperation(ActionScheduler.java:887)
    at org.apache.ambari.server.actionmanager.ActionScheduler.processInProgressStage(ActionScheduler.java:691)
    at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:289)
    at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:196)
    at java.lang.Thread.run(Thread.java:745)
04 Apr 2016 08:30:29,451 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-d7-ngzvlu-ambari-se-eu-10-2.novalocal, role: ru_execute_tasks, actionId: 19-27
04 Apr 2016 08:30:30,581 WARN [ambari-action-scheduler] ActionScheduler:695 - Detected ambari-agent restart during command execution.The command has been aborted.Execution command details: host: os-d7-ngzvlu-ambari-se-eu-10-2.novalocal, role: ru_execute_tasks, actionId: 19-27
{code}
{code}
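The trace shows the scheduler thread throwing a NullPointerException inside wasAgentRestartedDuringOperation, but the log does not say which reference was null. As an illustration only, the sketch below shows one way a restart check can treat missing data as "not restarted" instead of throwing. Everything in it is hypothetical: the class AgentRestartCheck, the registration-time map, and the method signature are assumptions made for the example and are not taken from the Ambari source or from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a restart check; NOT Ambari's actual code.
 * It only demonstrates guarding against missing (null) inputs.
 */
public class AgentRestartCheck {

    // Assumed bookkeeping: last time each host registered, in epoch millis.
    private final Map<String, Long> lastRegistrationTime = new HashMap<>();

    /** Records that an agent on the given host registered at the given time. */
    public void recordRegistration(String host, long timeMillis) {
        lastRegistrationTime.put(host, timeMillis);
    }

    /**
     * Returns true only when the host is known to have registered after the
     * command started. Any missing piece of data yields false, so the caller's
     * loop is never interrupted by a NullPointerException.
     */
    public boolean wasAgentRestartedDuringOperation(String host, Long commandStartMillis) {
        if (host == null || commandStartMillis == null) {
            return false; // nothing to compare against
        }
        Long registered = lastRegistrationTime.get(host);
        if (registered == null) {
            return false; // host has never registered as far as we know
        }
        return registered > commandStartMillis;
    }
}
```

With a guard of this kind, a host that has no recorded registration is reported as not restarted, and the retry path can continue rather than failing on every scheduler pass.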