[ https://issues.apache.org/jira/browse/AMBARI-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Mohanty resolved AMBARI-10606.
------------------------------------
    Resolution: Fixed

> Ambari Agent needs to retry failed install/start operations
> -----------------------------------------------------------
>
>                 Key: AMBARI-10606
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10606
>             Project: Ambari
>          Issue Type: Task
>    Affects Versions: 2.0.0
>            Reporter: Sumit Mohanty
>            Assignee: Sumit Mohanty
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-10606.patch
>
>
> With the changes to cluster provisioning in Ambari 2.1, each host is 
> provisioned independently in its own request. Additionally, users may make 
> provisioning requests before hosts become available. This means that 
> components that connect to other components in the cluster may start before 
> the components they are attempting to connect to. This connection behavior 
> is outside of Ambari proper and differs significantly between 
> services/components.
> An example of this is HISTORY_SERVER, which attempts to connect to NAMENODE; 
> if the connection fails, it retries a few times and then fails with a 
> timeout after only a few seconds.
> As a result, the Ambari Agent in 2.1 needs to retry failed operations 
> (especially start operations). The retry timeout should be substantial and 
> could be made configurable. This will allow hosts to join the cluster at 
> different times without component connection timeouts causing the request 
> to "fail".
> Currently, when a timeout occurs, it does not affect operations on other 
> components, but it does produce a "FAILED" response to the user, who then 
> has to start the failed component manually.
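
The retry behavior requested above can be sketched as follows. This is a minimal illustration, not the code in AMBARI-10606.patch: the function name run_with_retry and the parameters max_duration_sec and retry_interval_sec are hypothetical, and Python is used only for illustration. It assumes a command is a callable that returns True on success and False on failure (for example, a start script whose dependency such as NAMENODE is not up yet). The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Tuple


# Hypothetical sketch only; not taken from AMBARI-10606.patch.
def run_with_retry(
    command: Callable[[], bool],
    max_duration_sec: float = 600.0,   # assumed configurable retry window
    retry_interval_sec: float = 10.0,  # assumed pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Retry `command` until it succeeds or the retry window is used up.

    Returns the final status ("COMPLETED" or "FAILED") and the attempt count.
    """
    deadline = clock() + max_duration_sec
    attempts = 0
    while True:
        attempts += 1
        if command():
            return "COMPLETED", attempts
        # Report FAILED only once another wait would overrun the window.
        if clock() + retry_interval_sec > deadline:
            return "FAILED", attempts
        sleep(retry_interval_sec)


if __name__ == "__main__":
    # Simulated time so the demo does not actually sleep.
    now = [0.0]

    def fake_clock() -> float:
        return now[0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    # The dependency becomes reachable on the fourth attempt.
    outcomes = iter([False, False, False, True])
    status, attempts = run_with_retry(
        lambda: next(outcomes), max_duration_sec=60, retry_interval_sec=10,
        clock=fake_clock, sleep=fake_sleep)
    print(status, attempts)  # COMPLETED 4
```

With a short window (like the few seconds HISTORY_SERVER waits) the component gives up before a late-joining host is ready. A long window turns the same failure into a delayed success, and only a dependency that never appears ends in "FAILED".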



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
