[ https://issues.apache.org/jira/browse/AMBARI-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505970#comment-14505970 ]
Hudson commented on AMBARI-10606:
---------------------------------
SUCCESS: Integrated in Ambari-trunk-Commit #2392 (See
[https://builds.apache.org/job/Ambari-trunk-Commit/2392/])
Ambari-10606. Ambari Agent needs to retry failed install/start operations (smohanty: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=c254db4b3592e910599ce25c7add8db2650ccfbb)
* ambari-agent/src/main/python/ambari_agent/ActionQueue.py
* ambari-agent/src/main/python/ambari_agent/CustomServiceOrchestrator.py
* ambari-agent/src/test/python/ambari_agent/TestCustomServiceOrchestrator.py
* ambari-server/src/main/java/org/apache/ambari/server/agent/ExecutionCommand.java
* ambari-agent/src/test/python/ambari_agent/TestActionQueue.py
* ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariCustomCommandExecutionHelper.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java
> Ambari Agent needs to retry failed install/start operations
> -----------------------------------------------------------
>
> Key: AMBARI-10606
> URL: https://issues.apache.org/jira/browse/AMBARI-10606
> Project: Ambari
> Issue Type: Task
> Affects Versions: 2.0.0
> Reporter: Sumit Mohanty
> Assignee: Sumit Mohanty
> Fix For: 2.1.0
>
> Attachments: AMBARI-10606.patch
>
>
> With the changes to cluster provisioning in Ambari 2.1, each host is
> provisioned independently in its own request. Additionally, users may make
> provisioning requests prior to hosts becoming available. This means that
> components that connect to other components in the cluster may start prior to
> the component that they are attempting to connect to. This connect behavior
> is outside of Ambari proper and differs significantly between
> services/components.
> An example of this is HISTORY_SERVER, which attempts to connect to NAMENODE;
> if it fails to connect, it retries a couple of times and then fails with a
> timeout after a few seconds.
> As a result, the Ambari Agent in 2.1 needs to retry failed operations
> (especially start operations). The retry timeout should be a significant
> amount of time and could be configurable. This will allow hosts to join the
> cluster at different times without component connection timeouts causing the
> request to "fail".
> Currently, when a timeout occurs, it doesn't affect other component
> operations, but it does result in a "FAILED" response to the user, and the
> user will need to manually start the failed component.