Hi Bhuvnesh,

You are correct. The Blueprints deployment mechanism in Ambari no longer relies on Role Command Ordering (RCO) to install or start components across the cluster.
This change to Blueprints was actually implemented in Ambari 2.1.0, so it has been around for several releases now. The new approach was implemented to shorten cluster deployment times and to provide better support for dynamic scaling of clusters. That being said, the new deployment mechanism does indeed remove the guarantee of ordering, which can potentially cause problems for certain types of clusters.

There were also changes implemented on the Ambari Agent side to mitigate this ordering problem. The ambari-agent will now retry INSTALL and START operations if those operations happen to fail. The START operation is probably the most relevant in your case, and is also the operation that does show the ordering issues you’ve mentioned in some deployments. The idea is that the ambari-agent retries should help to resolve any issues with services starting in an unexpected order.

This ambari-agent feature is on by default, but can be configured in a more fine-grained fashion by setting some properties in “cluster-env” in your Blueprint or Cluster Creation Template. Unfortunately, this is not documented very well, but the three properties in question are set by default in the BlueprintConfigurationProcessor in the following method:

org.apache.ambari.server.controller.internal.BlueprintConfigurationProcessor#setRetryConfiguration

The properties set in this method allow control over the types of operations that are retried, the max number of retries attempted, and the maximum amount of time that the agent should attempt a retry.

We’ve seen many clusters using this new approach, and have not run into many problems with respect to ordering. One possible problem we’ve seen is in a small number of components that launch services as a background command. In that case, the ambari-agent cannot detect that a retry is required, and so cannot attempt a restart of a failed service. This problem can usually be resolved with component-specific retries.
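For illustration, here is a minimal sketch of how retry settings like these might be placed in “cluster-env” in a Blueprint. The property names (command_retry_enabled, commands_to_retry, command_retry_max_time_in_sec) and the values shown are assumptions and should be checked against setRetryConfiguration in your Ambari version. The helper functions are hypothetical and only build the JSON; they do not talk to Ambari.

```python
import copy
import json


def retry_cluster_env(enabled=True, commands=("INSTALL", "START"), max_time_sec=600):
    """Build a cluster-env fragment with agent retry settings.

    The property names and default values are assumptions; verify them
    against BlueprintConfigurationProcessor#setRetryConfiguration.
    """
    return {
        "cluster-env": {
            "properties": {
                "command_retry_enabled": "true" if enabled else "false",
                "commands_to_retry": ",".join(commands),
                "command_retry_max_time_in_sec": str(max_time_sec),
            }
        }
    }


def with_retry_settings(blueprint, **retry_kwargs):
    """Return a copy of `blueprint` with the retry settings merged in.

    Assumes any existing cluster-env entry uses the nested "properties" form.
    """
    updated = copy.deepcopy(blueprint)
    fragment = retry_cluster_env(**retry_kwargs)
    configs = updated.setdefault("configurations", [])
    for entry in configs:
        if "cluster-env" in entry:
            # Merge into the existing cluster-env entry instead of duplicating it.
            props = entry["cluster-env"].setdefault("properties", {})
            props.update(fragment["cluster-env"]["properties"])
            break
    else:
        configs.append(fragment)
    return updated


if __name__ == "__main__":
    # Hypothetical skeleton blueprint, used only to show the resulting JSON.
    blueprint = {"Blueprints": {"blueprint_name": "example"}, "host_groups": []}
    print(json.dumps(with_retry_settings(blueprint, max_time_sec=900), indent=2))
```

Raising the maximum retry time is the knob most likely to matter when one component has to wait for another to come up first, since it gives the failing START more time to be retried.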
I don’t know much about the HAWQ component, but I would expect that customizing the retry settings may help this problem. Do the HAWQ components implement retry attempts when booting up?

Hope this helps.

Thanks,
Bob

On Mar 11, 2016, at 7:18 PM, Alejandro Fernandez <[email protected]> wrote:

> +others who have more insight into BluePrints
>
> On 3/11/16, 3:24 PM, "Bhuvnesh Chaudhary" <[email protected]> wrote:
>
>> Hello Sebastian, Alejandro, Andrew,
>>
>> Referring to the discussion on RB: https://reviews.apache.org/r/43948
>> <https://reviews.apache.org/r/43948/#review120537>, it appears that while
>> deploying clusters using Blueprints, RCO is not honored. Please confirm if
>> this understanding is correct.
>>
>> While running internal test suites for HAWQ, we deploy the clusters using
>> BP, and we need a specific order in which the HAWQ components must be
>> initialized / started.
>>
>> "HAWQ Standby" component should be initialized after "HAWQ Master"
>> component as it has to copy the contents from HAWQ Master. However, since
>> RCO is not honored, we often come across issues as HAWQ Standby start /
>> initialization before HAWQ Master.
>>
>> Could you please let us know if there any work already going on for
>> bringing in RCO dependency for Blueprints, if not is there any other
>> alternative which can be used to enforce the dependency locally, or
>> something else which you suggest.
>>
>> Thanks,
>> Bhuvnesh Chaudhary
>> Email: bchau <[email protected]>
>> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
>
