[
https://issues.apache.org/jira/browse/AMBARI-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aravindan Vijayan updated AMBARI-21593:
---------------------------------------
Summary: RU: AMS stopped after RU [AMS distributed mode] (was: cp
/tmp/ambari-metrics-timelineservice-2.5.1.0.0.jar
/usr/lib/ambari-metrics-collector/ambari-metrics-timelineservice-2.5.2.0.191.jar)
> RU: AMS stopped after RU [AMS distributed mode]
> -----------------------------------------------
>
> Key: AMBARI-21593
> URL: https://issues.apache.org/jira/browse/AMBARI-21593
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.5.2
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Blocker
> Fix For: 2.5.2
>
>
> PROBLEM
> When 2 metric collectors are started up simultaneously, both of them fail to
> start.
> BUG
> There exists a race condition in the Metric Collector HA controller
> initialization which was introduced through AMBARI-20179. When a Helix
> controller instance finds that the /ambari-metrics-collector znode exists but
> a child node does not exist, it deletes the entire znode and recreates it. If
> another controller instance initializes at the same time, a race condition
> can occur wherein each instance ends up cancelling the effort of the
> other.
> FIX
> Do not delete and recreate the znode. Instead, wait and retry for a few
> seconds to check whether /ambari-metrics-collector has been fully initialized.
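
The wait-and-retry approach described under FIX can be sketched roughly as below. This is an illustration only, not the actual Ambari patch: the function name wait_for_child, the exists callable, the child path in the usage note, and the timeout and interval values are all assumptions, since the report only says "a few seconds".

```python
import time


def wait_for_child(exists, child_path, timeout_s=5.0, interval_s=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until child_path exists or the timeout elapses.

    `exists` is a hypothetical callable (path -> bool) standing in for
    whatever ZooKeeper client check the controller uses. Nothing is
    deleted here: if the parent znode is only partially initialized,
    another controller instance is assumed to be finishing the job.
    """
    deadline = clock() + timeout_s
    while True:
        if exists(child_path):
            return True   # the other instance completed initialization
        if clock() >= deadline:
            return False  # give up; the caller decides what to do next
        sleep(interval_s)
```

A caller might use it as wait_for_child(zk_exists, "/ambari-metrics-collector/CONTROLLER"), where both zk_exists and the child name are placeholders. Because no instance ever deletes the shared znode, two controllers starting at the same time can no longer undo each other's work.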
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)