[ 
https://issues.apache.org/jira/browse/AMBARI-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505970#comment-14505970
 ] 

Hudson commented on AMBARI-10606:
---------------------------------

SUCCESS: Integrated in Ambari-trunk-Commit #2392 (See 
[https://builds.apache.org/job/Ambari-trunk-Commit/2392/])
Ambari-10606. Ambari Agent needs to retry failed install/start operations 
(smohanty: 
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=c254db4b3592e910599ce25c7add8db2650ccfbb)
* ambari-agent/src/main/python/ambari_agent/ActionQueue.py
* ambari-agent/src/main/python/ambari_agent/CustomServiceOrchestrator.py
* ambari-agent/src/test/python/ambari_agent/TestCustomServiceOrchestrator.py
* ambari-server/src/main/java/org/apache/ambari/server/agent/ExecutionCommand.java
* ambari-agent/src/test/python/ambari_agent/TestActionQueue.py
* ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariCustomCommandExecutionHelper.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java


> Ambari Agent needs to retry failed install/start operations
> -----------------------------------------------------------
>
>                 Key: AMBARI-10606
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10606
>             Project: Ambari
>          Issue Type: Task
>    Affects Versions: 2.0.0
>            Reporter: Sumit Mohanty
>            Assignee: Sumit Mohanty
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-10606.patch
>
>
> With the changes to cluster provisioning in Ambari 2.1, each host is 
> provisioned independently in its own request. Additionally, users may make 
> provisioning requests prior to hosts becoming available. This means that 
> components that connect to other components in the cluster may start prior to 
> the component that they are attempting to connect to. This connect behavior 
> is outside of Ambari proper and differs significantly between 
> services/components.
> An example of this is HISTORY_SERVER, which attempts to connect to NAMENODE; 
> if it fails to connect, it retries a couple of times and then fails with a 
> timeout after a small number of seconds.
> As a result, the Ambari Agent in 2.1 needs to retry failed operations 
> (especially start operations). The retry timeout should be a significant 
> amount of time and could be configurable. This will allow hosts to join the 
> cluster at different times without component connection timeouts causing the 
> request to "fail".
> Currently, when a timeout occurs, it doesn't affect other component operations 
> but does result in a "FAILED" response to the user and the user will need to 
> manually start the failed component.
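
For illustration only, the retry behavior requested above could be sketched
roughly as follows. This is not the code from the attached patch: the function
name `run_with_retry` and the parameters `max_duration_secs` and `delay_secs`
are hypothetical, and the default values are arbitrary placeholders for the
"significant" and "configurable" retry window the description asks for.

```python
import time


def run_with_retry(operation, max_duration_secs=600, delay_secs=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or the retry window is used up.

    operation         -- hypothetical callable returning True on success
    max_duration_secs -- assumed configurable retry window, in seconds
    delay_secs        -- assumed pause between attempts, in seconds
    sleep, clock      -- injectable so the loop can be tested without waiting

    Returns a (succeeded, attempts) tuple.
    """
    deadline = clock() + max_duration_secs
    attempts = 0
    while True:
        attempts += 1
        if operation():
            return True, attempts
        # Stop if waiting for another attempt would overrun the window.
        if clock() + delay_secs > deadline:
            return False, attempts
        sleep(delay_secs)
```

With a window this long, a component such as HISTORY_SERVER that starts before
NAMENODE is reachable would be started again later instead of being reported
as "FAILED" after its own short internal timeout.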



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
