[ https://issues.apache.org/jira/browse/AMBARI-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aravindan Vijayan updated AMBARI-21593:
---------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Pushed to branch-2.5 and trunk.
> RU: AMS stopped after RU [AMS distributed mode]
> -----------------------------------------------
>
> Key: AMBARI-21593
> URL: https://issues.apache.org/jira/browse/AMBARI-21593
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.5.2
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Blocker
> Fix For: 2.5.2
>
> Attachments: AMBARI-21593.patch
>
>
> *PROBLEM*
> When two Metric Collectors are started simultaneously, both of them fail to
> start.
> *BUG*
> There is a race condition in the Metric Collector HA controller
> initialization, introduced by AMBARI-20179. When a Helix controller
> instance finds that the /ambari-metrics-collector znode exists but a child
> node does not exist, it deletes the entire znode and recreates it. If
> another controller instance is initializing at the same time, each instance
> can end up undoing the other's work.
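To make the race described above concrete, here is a toy model in Python, not Ambari code. A Python set stands in for ZooKeeper, each controller's pre-fix initialization is a generator that yields wherever the other controller could be scheduled, and the child znode name `/ambari-metrics-collector/cluster` is hypothetical. In the interleaving shown, controller B deletes the half-built tree that A has just created, so A's initialization fails. The issue reports that both real collectors failed to start; this simplified model only shows how one instance's delete-and-recreate wipes out the other's progress.

```python
ROOT = "/ambari-metrics-collector"
CHILD = ROOT + "/cluster"  # hypothetical child znode; the real layout is not given in the issue


def delete_subtree(store, path):
    """Remove path and all of its descendants from the set-based store."""
    store.difference_update({p for p in store if p == path or p.startswith(path + "/")})


def old_init(store, log, name):
    """Pre-fix logic as described in the issue. Each yield marks a point
    where the other controller may run."""
    if ROOT in store and CHILD not in store:
        delete_subtree(store, ROOT)  # a half-built tree is treated as garbage
        log.append(f"{name}: deleted {ROOT}")
    yield
    store.add(ROOT)
    yield
    if ROOT not in store:  # the other controller wiped our work
        log.append(f"{name}: failed")
        return
    store.add(CHILD)
    log.append(f"{name}: initialized")


def run(schedule):
    """Advance controllers A and B one step at a time in the given order."""
    store, log = set(), []
    gens = {n: old_init(store, log, n) for n in "AB"}
    for n in schedule:
        next(gens[n], None)  # None once a controller has finished
    return log


if __name__ == "__main__":
    print(run("AAABBB"))  # sequential start: both controllers succeed
    print(run("AABABB"))  # interleaved start: B deletes A's tree, A fails
```

The second schedule yields `['B: deleted /ambari-metrics-collector', 'A: failed', 'B: initialized']`.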
> *FIX*
> Do not delete and recreate the znode. Instead, wait and retry for a few
> seconds to check whether /ambari-metrics-collector has been fully initialized.
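The fix can be sketched as a bounded poll that only reads and never deletes. This is a minimal Python sketch under stated assumptions: `is_initialized` is a hypothetical caller-supplied check (for example, that the expected child znodes exist), the timeout and interval defaults are illustrative rather than Ambari's actual values, and what a controller does after a timeout is left to the caller because the issue does not say.

```python
import time
from typing import Callable


def wait_for_initialized(
    is_initialized: Callable[[], bool],
    timeout_s: float = 10.0,  # illustrative, not Ambari's actual value
    interval_s: float = 1.0,  # illustrative polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_initialized() until it returns True or the attempts run out.

    Unlike the pre-fix logic, this never deletes or recreates anything, so a
    controller that is still building the tree is left alone.
    Returns True if the tree was seen fully initialized, False on timeout.
    """
    attempts = max(1, int(timeout_s / interval_s))
    for attempt in range(attempts):
        if is_initialized():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last check
            sleep(interval_s)
    return False


if __name__ == "__main__":
    checks = iter([False, False, True])  # tree completes on the third check
    print(wait_for_initialized(lambda: next(checks), sleep=lambda s: None))  # True
```

Injecting `sleep` keeps the function testable without real delays; in production the default `time.sleep` would be used.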
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)