[
https://issues.apache.org/jira/browse/AMBARI-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194733#comment-15194733
]
bhuvnesh chaudhary commented on AMBARI-15417:
---------------------------------------------
Pasting email discussion for reference
{code}
---------- Forwarded message ----------
From: Robert Nettleton <[email protected]>
Date: Mon, Mar 14, 2016 at 6:38 AM
Subject: Re: Blueprints - RCO - Related question.
To: "[email protected]" <[email protected]>
Cc: Sumit Mohanty <[email protected]>, Alejandro Fernandez
<[email protected]>
Hi Bhuvnesh,
You are correct. The Blueprints deployment mechanism in Ambari no longer
relies on Role-command ordering to install or start components across the
cluster.
This change to Blueprints was actually implemented in Ambari 2.1.0, so it has
been around for several releases now. The new approach was implemented to
improve the performance times of cluster deployments, and provide better
support for dynamic scaling of clusters.
That being said, the new deployment mechanism does indeed remove the guarantee
of ordering, which can potentially cause some problems for certain types of
clusters. There were also changes implemented on the Ambari Agent side to
mitigate this problem of ordering. The ambari-agent will now retry INSTALL and
START operations if those operations happen to fail. The START operation is
probably the most relevant in your case, and is also the operation that does
show the ordering issues you’ve mentioned in some deployments.
The idea is that the ambari-agent retries should help to resolve any issues
with services starting in an unexpected order.
This ambari-agent feature is on by default, but can be configured in a more
fine-grained fashion by setting some properties in “cluster-env” in your
Blueprint or Cluster Creation Template.
Unfortunately, this is not documented very well, but the three properties in
question are set by default in the BlueprintConfigurationProcessor in the
following method:
org.apache.ambari.server.controller.internal.BlueprintConfigurationProcessor#setRetryConfiguration
The properties set in this method allow control over the types of operations
that are retried, the max number of retries attempted, and the maximum amount
of time that the agent should attempt a retry.
We’ve seen many clusters using this new approach, and have not run into that
many problems with respect to ordering.
One possible problem we’ve seen is in a small number of components that launch
services as a background command. In that case, the ambari-agent cannot detect
that a retry is required, and so cannot attempt a restart of a failed service.
This problem can usually be resolved with component-specific retries.
I don’t know much about the HAWQ component, but I would expect that customizing
the retry settings may help this problem. Do the HAWQ components implement
retry attempts when booting up?
Hope this helps.
Thanks,
Bob
On Mar 11, 2016, at 7:18 PM, Alejandro Fernandez <[email protected]>
wrote:
> +others who have more insight into BluePrints
>
> On 3/11/16, 3:24 PM, "Bhuvnesh Chaudhary" <[email protected]> wrote:
>
>> Hello Sebastian, Alejandro, Andrew,
>>
>> Referring to the discussion on RB: https://reviews.apache.org/r/43948
>> <https://reviews.apache.org/r/43948/#review120537>, it appears that while
>> deploying clusters using Blueprints, RCO is not honored. Please confirm if
>> this understanding is correct.
>>
>> While running internal test suites for HAWQ, we deploy the clusters using
>> BP, and we need a specific order in which the HAWQ components must be
>> initialized / started.
>>
>> "HAWQ Standby" component should be initialized after "HAWQ Master"
>> component as it has to copy the contents from HAWQ Master. However, since
>> RCO is not honored, we often run into issues because HAWQ Standby is
>> started / initialized before HAWQ Master.
>>
>> Could you please let us know if there is any work already going on for
>> bringing in RCO dependency for Blueprints? If not, is there any other
>> alternative which can be used to enforce the dependency locally, or
>> something else which you suggest?
{code}
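For anyone wanting to try the retry tuning Bob mentions, here is a minimal sketch of adding the three "cluster-env" retry properties to a blueprint before submitting it. The property names and default values below are my recollection of what setRetryConfiguration sets and are an assumption, so please verify them against BlueprintConfigurationProcessor in your Ambari version. The helper name with_retry_settings is made up for this example.

```python
import copy
import json

# ASSUMPTION: property names and defaults as recalled from
# BlueprintConfigurationProcessor#setRetryConfiguration; verify before use.
RETRY_DEFAULTS = {
    "command_retry_enabled": "true",
    "commands_to_retry": "INSTALL,START",
    "command_retry_max_time_in_sec": "600",
}


def with_retry_settings(blueprint, overrides=None):
    """Return a copy of `blueprint` whose cluster-env carries retry settings.

    Hypothetical helper: existing values are kept, missing ones get the
    assumed defaults, and `overrides` always win.
    """
    result = copy.deepcopy(blueprint)
    configs = result.setdefault("configurations", [])

    # Find an existing cluster-env entry or create one.
    entry = next((c for c in configs if "cluster-env" in c), None)
    if entry is None:
        entry = {"cluster-env": {"properties": {}}}
        configs.append(entry)

    # Blueprints may nest values under "properties" or list them directly.
    section = entry["cluster-env"]
    props = section["properties"] if "properties" in section else section

    for key, value in RETRY_DEFAULTS.items():
        props.setdefault(key, value)
    props.update(overrides or {})
    return result


if __name__ == "__main__":
    blueprint = {"host_groups": []}
    tuned = with_retry_settings(
        blueprint, {"command_retry_max_time_in_sec": "900"}
    )
    print(json.dumps(tuned["configurations"], indent=2))
```

Raising the max retry time, as in the usage at the bottom, is the kind of change that might help a component like HAWQ Standby that has to wait for HAWQ Master, though whether it is sufficient depends on the component.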
> Blueprint should have a flag to allow configuring use of RCO vs Retry method
> ----------------------------------------------------------------------------
>
> Key: AMBARI-15417
> URL: https://issues.apache.org/jira/browse/AMBARI-15417
> Project: Ambari
> Issue Type: Bug
> Components: blueprints
> Affects Versions: trunk
> Reporter: bhuvnesh chaudhary
>
> With Blueprint deploys, role command order (RCO) is not honored.
> Currently, to mitigate service start failures caused by dependencies on
> other services, Blueprint deploys use a retry mechanism to ensure that
> services are started once their prerequisites are met.
> However, the retry mechanism can in some cases make install / start take
> longer, and may require additional component-specific installation logic
> to handle retries.
> To provide flexibility, we should add a flag to blueprints that selects
> the required behavior (use RCO vs. use retry).
> Say the flag name is use_rco (change it to whatever seems better).
> By default, the value of use_rco can be false, and anyone who wants to
> override it can specify it as true in the blueprint.
> Note: Keeping it false by default, as the retry behavior has been in place
> since Ambari 2.1.0. Hopefully, even if we set this to true by default, it
> should not impact customers except a few. But we can make this decision
> based on the community's opinion.
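
To make the proposal above concrete, here is a rough sketch of how a deploy could branch on the flag. Everything here is hypothetical: use_rco does not exist in Ambari today, its placement as a top-level blueprint key is only an assumption, and uses_rco / start_plan are made-up names. The dependency map stands in for role command order, and the component labels are taken from the discussion rather than from real stack definitions.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def uses_rco(blueprint):
    """Return True when the hypothetical `use_rco` flag is enabled.

    Defaults to False, matching the retry-based behavior in place
    since Ambari 2.1.0.
    """
    value = blueprint.get("use_rco", False)
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def start_plan(blueprint, components, must_start_after):
    """Return (mode, order) for starting `components`.

    `must_start_after` maps a component to the components it depends on,
    as a stand-in for role command order.
    """
    if not uses_rco(blueprint):
        # Retry mode: no ordering guarantee; agents retry failed STARTs.
        return "retry", list(components)

    # RCO mode: dependencies are always placed before their dependents.
    sorter = TopologicalSorter()
    for name in components:
        sorter.add(name, *must_start_after.get(name, ()))
    return "rco", list(sorter.static_order())


if __name__ == "__main__":
    deps = {"HAWQ Standby": ["HAWQ Master"]}
    components = ["HAWQ Standby", "HAWQ Master"]
    print(start_plan({"use_rco": "true"}, components, deps))
    print(start_plan({}, components, deps))
```

With the flag set, HAWQ Master is always placed before HAWQ Standby; without it, the list is returned as given and correctness relies on the agent retries.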
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)