Jonathan Hurley created AMBARI-12867:
----------------------------------------
Summary: Do Not Automatically Abort Stack Repository Installation When A Host Timed Out
Key: AMBARI-12867
URL: https://issues.apache.org/jira/browse/AMBARI-12867
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.1.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Priority: Critical
Fix For: 2.1.2
On a 1000-node RU, I had 2.3.0.0-2557 installed, with about 20 hosts down in a
heartbeat-lost state. I then registered 2.3.2.0-2664, but every attempt to
install it was aborted, with nothing logged on the server or the agents.
It turns out that an installation always runs in stages of 100 hosts each. If
any host in a stage fails or times out, the remaining stages are aborted. In
this case, one host in the first stage timed out, which caused that stage and
all the other stages to be aborted.
As a result, I cannot install a version unless every host is alive. The
workaround seems to be to delete the lost hosts from Ambari.
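
To make the staging behavior described above concrete, here is a minimal
sketch in Python. It is not Ambari's code: the names split_into_stages,
run_install, install_on_host and abort_on_failure are hypothetical, and the
model is simplified (every host in the failing stage still runs, and only the
later stages are marked as aborted). The only detail taken from the report is
the stage size of 100 hosts.

```python
from typing import Callable, Dict, List

STAGE_SIZE = 100  # stage size quoted in the report


def split_into_stages(hosts: List[str], size: int = STAGE_SIZE) -> List[List[str]]:
    """Partition the host list into consecutive stages of at most `size` hosts."""
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def run_install(
    hosts: List[str],
    install_on_host: Callable[[str], bool],  # hypothetical: True on success
    abort_on_failure: bool,
) -> Dict[str, str]:
    """Run the install stage by stage and return a status per host.

    With abort_on_failure=True, one failed or timed-out host causes every
    later stage to be aborted (the behavior reported here). With
    abort_on_failure=False, later stages still run (the requested behavior).
    """
    results: Dict[str, str] = {}
    aborted = False
    for stage in split_into_stages(hosts):
        if aborted:
            # An earlier stage failed, so nothing in this stage is attempted.
            for host in stage:
                results[host] = "ABORTED"
            continue
        stage_failed = False
        for host in stage:
            ok = install_on_host(host)
            results[host] = "COMPLETED" if ok else "TIMEDOUT"
            stage_failed = stage_failed or not ok
        if stage_failed and abort_on_failure:
            aborted = True
    return results


def count(results: Dict[str, str], status: str) -> int:
    """Count the hosts that ended in the given status."""
    return sum(1 for s in results.values() if s == status)


# Simulated 1000-node cluster with a single unreachable host in the first stage.
hosts = [f"host{i}" for i in range(1000)]
lost = {"host5"}
strict = run_install(hosts, lambda h: h not in lost, abort_on_failure=True)
lenient = run_install(hosts, lambda h: h not in lost, abort_on_failure=False)
```

In this simulation, the strict run leaves the 900 hosts of the later stages
untouched because of one lost host, while the lenient run installs on every
reachable host and reports only the lost host as timed out.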
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)