Hi Bhuvnesh,

You are correct. The Blueprints deployment mechanism in Ambari no longer 
relies on role command ordering (RCO) to install or start components across 
the cluster.

This change to Blueprints was actually implemented in Ambari 2.1.0, so it has 
been around for several releases now. The new approach was introduced to 
speed up cluster deployments and to provide better support for dynamically 
scaling clusters.

That being said, the new deployment mechanism does remove the guarantee of 
ordering, which can cause problems for certain types of clusters. To mitigate 
this ordering problem, changes were also made on the Ambari Agent side: the 
ambari-agent will now retry INSTALL and START operations if those operations 
fail. The START operation is probably the most relevant in your case, and it 
is also the operation where the ordering issues you’ve mentioned show up in 
some deployments.

The idea is that the ambari-agent retries should help to resolve any issues 
with services starting in an unexpected order.  
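
To make the retry idea concrete, here is a minimal Python model of "retry a 
failed command until it succeeds or a time budget is spent". This is only an 
illustrative sketch, not the ambari-agent's actual code: the function name 
`run_with_retries`, the fixed delay between attempts, and the injectable 
clock/sleep parameters are assumptions made for the example.

```python
import time
from typing import Callable


def run_with_retries(
    command: Callable[[], bool],
    max_time_sec: float,
    delay_sec: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, int]:
    """Run `command` until it returns True or `max_time_sec` has elapsed.

    Hypothetical model of agent-side retries (not Ambari's implementation).
    Returns (succeeded, number_of_attempts).
    """
    deadline = clock() + max_time_sec
    attempts = 0
    while True:
        attempts += 1
        if command():
            return True, attempts
        # Give up if the next attempt would start after the time budget ends.
        if clock() + delay_sec > deadline:
            return False, attempts
        sleep(delay_sec)


if __name__ == "__main__":
    # Simulate a START that fails twice (dependency not up yet), then succeeds.
    results = iter([False, False, True])
    ok, n = run_with_retries(lambda: next(results), max_time_sec=10, delay_sec=0.01)
    print(ok, n)  # True 3
```

With a budget like this, a START on a dependent component (say, HAWQ Standby) 
that fails because its master is not up yet is simply re-run until the master 
is available or the time runs out.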

This ambari-agent feature is on by default, but can be configured in a more 
fine-grained fashion by setting some properties in “cluster-env” in your 
Blueprint or Cluster Creation Template. 

Unfortunately, this is not documented very well, but the defaults for the 
three properties in question are set by the BlueprintConfigurationProcessor 
in the following method:

org.apache.ambari.server.controller.internal.BlueprintConfigurationProcessor#setRetryConfiguration

The properties set in this method allow control over the types of operations 
that are retried, the max number of retries attempted, and the maximum amount 
of time that the agent should attempt a retry. 
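
As an illustration, a Cluster Creation Template could set these properties 
explicitly in "cluster-env". The Python sketch below adds them to a template 
dict. The property names (`command_retry_enabled`, `commands_to_retry`, 
`command_retry_max_time_in_sec`) are recalled from `setRetryConfiguration` 
and are an assumption here; please verify them against the 
BlueprintConfigurationProcessor in your Ambari version. The helper name, 
blueprint name, host names, and values are hypothetical placeholders.

```python
import json

# Assumed retry property names, as recalled from
# BlueprintConfigurationProcessor#setRetryConfiguration (verify per version).
RETRY_PROPERTIES = {
    "command_retry_enabled",
    "commands_to_retry",
    "command_retry_max_time_in_sec",
}


def with_retry_overrides(template: dict, **overrides: str) -> dict:
    """Return a copy of `template` with the given retry properties in cluster-env.

    Hypothetical helper; it does not modify the input template.
    """
    unknown = set(overrides) - RETRY_PROPERTIES
    if unknown:
        raise ValueError(f"unknown retry properties: {sorted(unknown)}")
    result = json.loads(json.dumps(template))  # deep copy of JSON-like data
    configs = result.setdefault("configurations", [])
    for entry in configs:
        if "cluster-env" in entry:
            env = entry["cluster-env"]
            # Handle both the nested {"properties": {...}} form and a flat map.
            target = env["properties"] if "properties" in env else env
            target.update(overrides)
            break
    else:
        # No cluster-env entry yet: add one holding only the overrides.
        configs.append({"cluster-env": {"properties": dict(overrides)}})
    return result


if __name__ == "__main__":
    template = {
        "blueprint": "hawq-test",  # hypothetical blueprint name
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        ],
    }
    print(json.dumps(
        with_retry_overrides(
            template,
            commands_to_retry="INSTALL,START",
            command_retry_max_time_in_sec="1200",
        ),
        indent=2,
    ))
```

Only the properties you pass are written; as far as I know, Ambari fills in 
defaults for any retry property the template leaves unset.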

We’ve seen many clusters deployed with this new approach, and have run into 
relatively few problems with respect to ordering.

One problem we’ve seen involves a small number of components that launch their 
services as background commands. In that case, the ambari-agent cannot detect 
that the service failed to start, so it cannot retry the start. This problem 
can usually be resolved by having the component implement its own retries.

I don’t know much about the HAWQ component, but I would expect that 
customizing the retry settings could help with this problem. Do the HAWQ 
components implement retry attempts when starting up?
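
One way a component can implement its own retry is for the standby's 
start/initialize step to wait until the master is reachable before 
proceeding. The Python sketch below polls a TCP port with a deadline. The 
function name, host, port, and timeouts are hypothetical, and a reachable 
port is only a rough proxy for "HAWQ Master is initialized"; a real check 
would likely need something HAWQ-specific.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_sec: float, interval_sec: float = 2.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout_sec elapses.

    Hypothetical helper. Returns True if the port became reachable, False on timeout.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Bound each connection attempt by the remaining time budget.
            with socket.create_connection((host, port), timeout=min(remaining, interval_sec)):
                return True
        except OSError:
            # Refused or timed out: the master is not accepting connections yet.
            pass
        time.sleep(min(interval_sec, max(0.0, deadline - time.monotonic())))
```

For example, a standby's start script could call 
`wait_for_port(master_host, master_port, timeout_sec=600)` and fail with a 
clear error if it returns False, rather than initializing against a master 
that is not up yet.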

Hope this helps.  

Thanks,
Bob

On Mar 11, 2016, at 7:18 PM, Alejandro Fernandez <[email protected]> 
wrote:

> +others who have more insight into Blueprints
> 
> On 3/11/16, 3:24 PM, "Bhuvnesh Chaudhary" <[email protected]> wrote:
> 
>> Hello Sebastian, Alejandro, Andrew,
>> 
>> Referring to the discussion on RB: https://reviews.apache.org/r/43948
>> <https://reviews.apache.org/r/43948/#review120537>, it appears that while
>> deploying clusters using Blueprints, RCO is not honored. Please confirm if
>> this understanding is correct.
>> 
>> While running internal test suites for HAWQ, we deploy the clusters using
>> BP, and we need a specific order in which the HAWQ components must be
>> initialized / started.
>> 
>> "HAWQ Standby" component should be initialized after "HAWQ Master"
>> component as it has to copy the contents from HAWQ Master. However, since
>> RCO is not honored, we often run into issues when HAWQ Standby is started /
>> initialized before HAWQ Master.
>> 
>> Could you please let us know if there is any work already going on to
>> bring RCO dependencies to Blueprints? If not, is there an alternative
>> that can be used to enforce the dependency locally, or something else
>> you would suggest?
>> 
>> Thanks,
>> Bhuvnesh Chaudhary
>> Email: [email protected]
>> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
> 
