[ https://issues.apache.org/jira/browse/AMBARI-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sumit Mohanty resolved AMBARI-10606.
------------------------------------
Resolution: Fixed
> Ambari Agent needs to retry failed install/start operations
> -----------------------------------------------------------
>
> Key: AMBARI-10606
> URL: https://issues.apache.org/jira/browse/AMBARI-10606
> Project: Ambari
> Issue Type: Task
> Affects Versions: 2.0.0
> Reporter: Sumit Mohanty
> Assignee: Sumit Mohanty
> Fix For: 2.1.0
>
> Attachments: AMBARI-10606.patch
>
>
> With the changes to cluster provisioning in Ambari 2.1, each host is
> provisioned independently in its own request. Additionally, users may make
> provisioning requests prior to hosts becoming available. This means that
> components that connect to other components in the cluster may start prior to
> the component that they are attempting to connect to. This connect behavior
> is outside of Ambari proper and differs significantly between
> services/components.
> An example of this is HISTORY_SERVER, which attempts to connect to NAMENODE;
> if it cannot connect, it retries a couple of times and then fails with a
> timeout after a few seconds.
> As a result, the Ambari agent in 2.1 needs to retry failed operations
> (especially start operations). The retry timeout should be a significant
> amount of time and could be configurable. This will allow hosts to join the
> cluster at different times without component connection timeouts causing the
> request to "fail".
> Currently, when a timeout occurs, it doesn't affect other component
> operations, but it does result in a "FAILED" response to the user, who then
> needs to manually start the failed component.
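
For illustration only, the retry behavior described above could be sketched
roughly as follows. This is not the code from AMBARI-10606.patch; the function
name run_with_retry and the parameters max_duration_sec and delay_sec are
hypothetical, and the defaults are arbitrary. The idea is to re-run a failed
operation until it succeeds or a configurable time window runs out.

```python
import time


def run_with_retry(operation, max_duration_sec=600, delay_sec=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it returns True or the retry window elapses.

    All names and defaults here are illustrative assumptions, not Ambari's
    actual API. Returns a tuple (succeeded, attempts).
    """
    deadline = clock() + max_duration_sec
    attempts = 0
    while True:
        attempts += 1
        if operation():
            return True, attempts
        # Give up if waiting once more would take us past the retry window.
        if clock() + delay_sec > deadline:
            return False, attempts
        sleep(delay_sec)
```

With something along these lines, a start command for a component such as
HISTORY_SERVER would be reported as FAILED only after the whole window has
passed, which gives NAMENODE on another host time to come up. The clock and
sleep parameters exist only so the loop can be exercised without real waiting.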
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)