Sumit Mohanty created AMBARI-10606:
--------------------------------------

             Summary: Ambari Agent needs to retry failed install/start operations
                 Key: AMBARI-10606
                 URL: https://issues.apache.org/jira/browse/AMBARI-10606
             Project: Ambari
          Issue Type: Task
    Affects Versions: 2.0.0
            Reporter: Sumit Mohanty
            Assignee: Sumit Mohanty
             Fix For: 2.1.0


With the changes to cluster provisioning in Ambari 2.1, each host is 
provisioned independently in its own request. Additionally, users may make 
provisioning requests prior to hosts becoming available. This means that 
components that connect to other components in the cluster may start prior to 
the component that they are attempting to connect to. This connect behavior is 
outside of Ambari proper and differs significantly between services/components.
An example of this is HISTORY_SERVER, which attempts to connect to NAMENODE; 
if the connection fails, it retries a couple of times and then fails with a 
timeout after a small number of seconds.
As a result, the Ambari Agent in 2.1 needs to retry failed operations 
(especially start operations). The retry timeout should be a significant amount 
of time and could be configurable. This will allow hosts to join the cluster at 
different times without component connection timeouts causing the request to 
"fail".
Currently, when a timeout occurs, it doesn't affect other component operations, 
but it does result in a "FAILED" response to the user, who then needs to 
manually start the failed component.
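
The retry behavior proposed above could look roughly like the sketch below. It
is only an illustration of retrying a failed operation within a configurable
time window, not the Ambari Agent's actual code: the function name
run_with_retry and the parameters max_duration_secs and delay_secs are
hypothetical, as are their default values, and the operation is assumed to be
a callable that returns an exit code (0 for success).

```python
import time


def run_with_retry(command, max_duration_secs=600, delay_secs=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Run command() until it returns 0 or the retry window expires.

    All names and defaults here are hypothetical, chosen for illustration.
    Returns a tuple (exit_code, attempts) for the last attempt made.
    """
    deadline = clock() + max_duration_secs
    attempts = 0
    while True:
        attempts += 1
        exit_code = command()
        if exit_code == 0:
            # Success: report it along with how many attempts it took.
            return exit_code, attempts
        # Give up if the next attempt would start after the deadline.
        if clock() + delay_secs > deadline:
            return exit_code, attempts
        sleep(delay_secs)
```

With a wrapper of this kind, a start operation for a component such as
HISTORY_SERVER would only be reported as "FAILED" once the whole retry window
has elapsed, which gives the component it depends on time to come up. The
clock and sleep parameters exist only so the loop can be exercised without
real waiting.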



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
