[
https://issues.apache.org/jira/browse/AMBARI-22848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jayush Luniya updated AMBARI-22848:
-----------------------------------
Fix Version/s: (was: 2.7.1)
3.0.0
> Blueprint database inconsistency should be caught by Ambari DB consistency
> checker
> ----------------------------------------------------------------------------------
>
> Key: AMBARI-22848
> URL: https://issues.apache.org/jira/browse/AMBARI-22848
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.5.0
> Reporter: Robert Nettleton
> Assignee: Robert Nettleton
> Priority: Critical
> Fix For: 3.0.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> We've seen some Blueprint deployments fail after an upgrade to Ambari
> (2.5.2/2.6) causes older configuration to be reset.
> 1. User deploys cluster via Blueprints with an older version of Ambari
> (older than Ambari 2.5/2.6).
> 2. Cluster deployment fails, and either the user doesn't realize the
> deployment has failed, or works through the manual configuration changes
> required to get failed services up and running.
> 3. Things run fine, sometimes for quite a while.
> 4. User upgrades ambari-server to Ambari 2.5 or Ambari 2.6.
> 5. Upon the restart of ambari-server, some services seem to be failing due
> to invalid or old configuration.
> The root cause of this problem is that the Blueprints TopologyManager class
> will attempt to "replay" any failed requests, which was originally
> implemented to allow a Blueprints install to continue working even if
> ambari-server is stopped and restarted.
> Since the original Blueprint deployment failed, the Ambari Server database is
> in an inconsistent state, which causes the Blueprints TopologyManager to
> attempt a replay of various configuration tasks. As a result, the
> TopologyManager sends configuration updates from the Blueprint's
> configuration sections, which by now may be quite out of date, as the cluster
> may have changed over time while being administered.
> This in turn causes some services to fail, as older configuration may not
> match the current environment.
>
> The ambari-server upgrade mechanism should be modified to include integrity
> checks on the Blueprint-related tables in the database. In particular, if a
> Blueprint deployment is detected, at the very least the "clusterconfig" table
> needs to be checked, to ensure that at least one configuration type's version
> has a {{version_tag}} of "TOPOLOGY_RESOLVED". If no configuration versions are
> found to have a tag of "TOPOLOGY_RESOLVED", then the ambari-server upgrade
> should fail with an appropriate message, to allow the user to make the manual
> changes required in order to resolve the problem, usually by applying a
> workaround.
> Having this check at ambari-server upgrade time seems like the correct way
> to move forward, as it will detect this problem earlier, and will keep users
> from accidentally moving forward with an upgrade that would overwrite the
> cluster's configuration with older configuration items.
>
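The check proposed in the description could be sketched roughly as follows.
This is only an illustration, not Ambari's actual consistency checker: the
function name, the use of a "topology_request" row to detect a Blueprint
deployment, the "cluster_id" and "type_name" columns, and the in-memory SQLite
database are all assumptions made for the example. Only the "clusterconfig"
table, the version_tag column, and the "TOPOLOGY_RESOLVED" value come from
the issue text.

```python
import sqlite3

TOPOLOGY_RESOLVED = "TOPOLOGY_RESOLVED"


def blueprint_config_is_consistent(conn, cluster_id):
    """Return True if the cluster passes the proposed check.

    Hypothetical helper: a cluster fails only when it looks like a Blueprint
    deployment and has no clusterconfig row tagged TOPOLOGY_RESOLVED.
    """
    cur = conn.cursor()
    # Assumption: a row in topology_request marks a Blueprint deployment.
    cur.execute(
        "SELECT COUNT(*) FROM topology_request WHERE cluster_id = ?",
        (cluster_id,),
    )
    if cur.fetchone()[0] == 0:
        return True  # not deployed via Blueprints, nothing to check

    # At least one configuration version must carry the resolved tag.
    cur.execute(
        "SELECT COUNT(*) FROM clusterconfig "
        "WHERE cluster_id = ? AND version_tag = ?",
        (cluster_id, TOPOLOGY_RESOLVED),
    )
    return cur.fetchone()[0] > 0


def build_demo_db():
    """Create a tiny in-memory database with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE topology_request (
            id INTEGER PRIMARY KEY,
            cluster_id INTEGER
        );
        CREATE TABLE clusterconfig (
            config_id INTEGER PRIMARY KEY,
            cluster_id INTEGER,
            type_name TEXT,
            version_tag TEXT
        );
        -- Cluster 1: Blueprint deployment that reached topology resolution.
        INSERT INTO topology_request (cluster_id) VALUES (1);
        INSERT INTO clusterconfig (cluster_id, type_name, version_tag)
            VALUES (1, 'core-site', 'version1');
        INSERT INTO clusterconfig (cluster_id, type_name, version_tag)
            VALUES (1, 'core-site', 'TOPOLOGY_RESOLVED');
        -- Cluster 2: Blueprint deployment that failed before resolution.
        INSERT INTO topology_request (cluster_id) VALUES (2);
        INSERT INTO clusterconfig (cluster_id, type_name, version_tag)
            VALUES (2, 'core-site', 'version1');
        -- Cluster 3: not deployed via Blueprints at all.
        INSERT INTO clusterconfig (cluster_id, type_name, version_tag)
            VALUES (3, 'core-site', 'version1');
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for cid in (1, 2, 3):
        ok = blueprint_config_is_consistent(demo, cid)
        print(cid, "consistent" if ok else "INCONSISTENT: stop the upgrade")
```

In a real upgrade path the same two queries would run against the Ambari
Server database, and a False result would abort the upgrade with a message
pointing the user at the manual workaround. The "?" placeholders are SQLite's
parameter style; other database drivers may use a different one.
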
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)