[
https://issues.apache.org/jira/browse/AMBARI-20179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883384#comment-15883384
]
Aravindan Vijayan commented on AMBARI-20179:
--------------------------------------------
Relevant unit tests passed.
[~swagle], [~dsen], [~sumitmohanty] Can you folks review this simple change?
> AMS Collector shuts down with Helix-Zk related exception if partial /ambari-metrics-cluster znode exists.
> ---------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-20179
> URL: https://issues.apache.org/jira/browse/AMBARI-20179
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.5.0
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Critical
> Fix For: 2.5.0
>
> Attachments: AMBARI-20179.patch
>
>
> Exception Trace
> {code}
> 2017-02-23 23:13:54,298 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ambari-metrics-cluster/INSTANCES
> 2017-02-23 23:13:54,299 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore failed in state INITED; cause: org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
> org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.MetricsSystemInitializationException: Unable to initialize HA controller
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:114)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.serviceInit(HBaseTimelineMetricStore.java:93)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:84)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:137)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:147)
> Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ambari-metrics-cluster/INSTANCES
>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>     at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:208)
>     at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:672)
>     at org.apache.helix.manager.zk.ZKHelixAdmin.getInstancesInCluster(ZKHelixAdmin.java:575)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.availability.MetricCollectorHAController.initializeHAController(MetricCollectorHAController.java:126)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.HBaseTimelineMetricStore.initializeSubsystem(HBaseTimelineMetricStore.java:111)
>     ... 7 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ambari-metrics-cluster/INSTANCES
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1560)
>     at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:119)
>     at org.apache.helix.manager.zk.ZkClient$3.call(ZkClient.java:211)
>     at org.apache.helix.manager.zk.ZkClient$3.call(ZkClient.java:208)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
>     ... 12 more
> {code}
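The trace shows Helix failing in ZKHelixAdmin.getInstancesInCluster because the /ambari-metrics-cluster root znode exists while its INSTANCES child does not, so a check on the root alone treats the cluster as already set up. Below is a minimal sketch of a stricter check, assuming the znode tree is modelled as a plain set of paths. The class name PartialClusterCheck, the method isClusterFullySetUp, and the REQUIRED_CHILDREN list are hypothetical illustrations; they are not taken from AMBARI-20179.patch or from the Helix API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: detect a partially created Helix cluster znode by
 * checking for required child znodes, not just the cluster root.
 * The znode tree is simulated as a set of absolute paths.
 */
public class PartialClusterCheck {

  // Assumed subset of child znodes a fully set-up Helix cluster carries.
  // INSTANCES is the one missing in the reported trace; the others are illustrative.
  static final List<String> REQUIRED_CHILDREN =
      Arrays.asList("INSTANCES", "CONFIGS", "IDEALSTATES", "LIVEINSTANCES", "CONTROLLER");

  /** Returns true only if the root and every required child path exist. */
  static boolean isClusterFullySetUp(Set<String> existingPaths, String clusterRoot) {
    if (!existingPaths.contains(clusterRoot)) {
      return false; // nothing created yet
    }
    for (String child : REQUIRED_CHILDREN) {
      if (!existingPaths.contains(clusterRoot + "/" + child)) {
        return false; // partial cluster: root exists but a child is missing
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulates the reported state: root exists, /INSTANCES does not.
    Set<String> partial = new HashSet<>(Arrays.asList(
        "/ambari-metrics-cluster", "/ambari-metrics-cluster/CONFIGS"));
    // A caller would treat false as "recreate the cluster" rather than
    // going on to list instances and hitting NoNodeException.
    System.out.println(isClusterFullySetUp(partial, "/ambari-metrics-cluster"));
  }
}
```

Against a real ZooKeeper ensemble the same check would go through Helix's own utilities rather than a path set; whether the actual patch does this, or instead recreates the cluster unconditionally, is not stated in this issue.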
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)