[
https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136319#comment-14136319
]
Siddharth Wagle edited comment on AMBARI-5707 at 9/17/14 5:01 PM:
------------------------------------------------------------------
*Revised architecture overview*:
*Problems with current system*:
- Ganglia has limited capabilities for analyzing historical data, and new
plugins are not easy to write.
- Horizontal scale-out for large clusters is difficult.
- No support for ad-hoc queries.
- It is not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time series databases like OpenTSDB to
store raw data indefinitely.
*Solution*:
- Replace Ganglia with a bespoke solution based on embedded HBase that fits all
of the needs above.
- Ability to store fine-grained data for a configurable amount of time.
- Ability to write SQL-like queries (via Phoenix) on aggregated metric data
sets and visualize the results.
- Provide a pluggable storage API with the ability to forward metric data to
external long-term storage.
- Ability to add user-defined metrics and visualize them through Ambari
Views.
*Component description*:
- *Host metrics monitor*:
A lightweight Python process running on every managed host, collecting
metrics for the managed processes running on that host as well as aggregate
metrics for the entire host. The collected metrics are pushed to a
pre-configured metrics collector to be stored for consumption by the Ambari API.
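A minimal sketch of the kind of payload such a monitor might push to the collector; the field names ({{metricname}}, {{appid}}, {{starttime}}) and the JSON shape are illustrative assumptions, not the actual wire format:

```python
import json
import time

def build_payload(hostname, app_id, samples):
    """Assemble a push payload. samples: metric name -> {epoch_millis: value}."""
    return {
        "metrics": [
            {
                "metricname": name,        # hypothetical field names
                "hostname": hostname,
                "appid": app_id,
                "starttime": min(points),  # earliest timestamp in the series
                "metrics": points,         # timestamp -> value map
            }
            for name, points in samples.items()
        ]
    }

now = int(time.time() * 1000)
payload = build_payload("host1.example.com", "HOST",
                        {"cpu_user": {now: 12.5, now + 10_000: 13.1}})
print(json.dumps(payload)[:60])
```

In a real monitor the payload would be POSTed to the collector endpoint on a timer; here it is only constructed so the shape is visible.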
- *Hadoop Metrics Sink*:
Implementation of the Hadoop Metrics Sink interface that pushes data to a
configured collector. As part of the implementation, allow a periodic flush of
collected metrics data: _putMetric()_ should write data into a bounded buffer
cache of fixed size, configurable through hadoop-metrics2.properties.
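The bounded-buffer-with-periodic-flush idea can be sketched as follows (in Python for brevity; the real sink is Java). The buffer size and flush wiring stand in for the hadoop-metrics2.properties settings, and oldest-entry eviction on overflow is an assumption about the desired behavior:

```python
import threading
from collections import deque

class BoundedMetricBuffer:
    """Fixed-size buffer: put_metric() appends, flush() drains to a sender."""

    def __init__(self, max_size, send_fn):
        self._buf = deque(maxlen=max_size)  # oldest entries drop when full
        self._lock = threading.Lock()
        self._send = send_fn

    def put_metric(self, record):
        with self._lock:
            self._buf.append(record)

    def flush(self):
        # Swap the buffer under the lock, send the batch outside it.
        with self._lock:
            batch = list(self._buf)
            self._buf = deque(maxlen=self._buf.maxlen)
        if batch:
            self._send(batch)

sent = []
buf = BoundedMetricBuffer(max_size=3, send_fn=sent.extend)
for i in range(5):
    buf.put_metric(i)   # 0 and 1 are evicted once the buffer is full
buf.flush()
# sent is now [2, 3, 4]
```

A periodic flush would simply call {{flush()}} from a timer thread at the configured interval.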
- *Timeline Metrics Collector*:
The metrics collector is a daemon that receives data from registered
publishers and provides the ability to push the metrics data to an external
metrics store like OpenTSDB or HDFS, in addition to writing it to a local
metrics store. Additionally, the collector provides the ability to plug in
aggregators for the collected metric data. Aggregation is performed post-write
by aggregator threads that run at a configured time interval and aggregate the
data collected within that interval.
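The post-write aggregation step amounts to bucketing raw points by interval and computing summary statistics per bucket, roughly like this (the one-minute interval and the min/max/avg/count statistics are illustrative assumptions):

```python
def aggregate(points, interval_ms):
    """Group (timestamp_ms, value) points into fixed buckets and summarize."""
    buckets = {}
    for ts, value in points:
        bucket_start = ts - ts % interval_ms  # align to interval boundary
        buckets.setdefault(bucket_start, []).append(value)
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
            "count": len(vals),
        }
        for start, vals in buckets.items()
    }

raw = [(1000, 1.0), (2000, 3.0), (61000, 5.0)]
agg = aggregate(raw, interval_ms=60_000)
# two one-minute buckets: starts 0 and 60000
```

An aggregator thread would run this over the rows written since its last pass and persist the results to a coarser-grained table.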
- *Timeline Metrics Store*:
A time series database is ideal for storing metrics data. Its main advantage
is variable time buckets: for example, a row key indicating a metric id
followed by an arbitrary number of key-value pairs that fit into the time range
identified by a part of the key. This storage model allows simple time-based
aggregation and avoids sparse rows. The deployment modes of HBase allow for
scaling up and down based on cluster size, and the choice of HBase as the
default storage allows storage to scale independently and seamlessly from the
metrics collectors. Phoenix provides JDBC APIs instead of the regular HBase
client APIs to create tables, insert data, and query HBase data.
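The row-key scheme described above can be sketched as a metric id prefix plus a coarse time bucket, so a time-range read becomes a contiguous key-range scan. The separator, one-hour bucket width, and the table/column names in the Phoenix-style query are all illustrative assumptions:

```python
def row_key(metric_name, app_id, hostname, ts_millis, bucket_ms=3_600_000):
    """Build a row key: metric id parts, then the start of the time bucket."""
    bucket_start = ts_millis - ts_millis % bucket_ms
    return f"{metric_name}|{app_id}|{hostname}|{bucket_start}"

key = row_key("cpu_user", "HOST", "host1", 1_400_000_123_456)

# A Phoenix-style query over a hypothetical aggregate table; with Phoenix the
# same data is reachable through plain JDBC instead of HBase client calls.
QUERY = (
    "SELECT METRIC_NAME, SERVER_TIME, METRIC_AVG "
    "FROM METRIC_RECORD "
    "WHERE METRIC_NAME = ? AND SERVER_TIME BETWEEN ? AND ?"
)
```

Because all cells for one metric/host/hour share a key prefix, rows stay dense and interval aggregation reduces to scanning one key range.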
- *Ambari Metrics Service*:
The API design for the Metrics Service should support a GET API using a key
and a time range, similar to what exists on the HBase cluster.
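What a GET-by-key-and-time-range lookup could do, sketched over an in-memory stand-in for the store; the parameter names mirror the description above and are not a finalized API:

```python
def get_metrics(store, metric_name, hostname, start_ms, end_ms):
    """Return the points for one metric/host pair within [start_ms, end_ms]."""
    series = store.get((metric_name, hostname), {})
    return {ts: v for ts, v in series.items() if start_ms <= ts <= end_ms}

# In-memory stand-in for the metrics store: key -> {timestamp: value}.
store = {("cpu_user", "host1"): {1000: 1.0, 2000: 2.0, 99000: 9.9}}
window = get_metrics(store, "cpu_user", "host1", 0, 5000)
# window == {1000: 1.0, 2000: 2.0}
```

The REST layer would expose the same parameters (metric name, host, start, end) and serve both point-in-time and temporal queries from the Ambari API.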
- *Ambari Views*:
Ambari Views on top of Phoenix provide ad-hoc query capability to the user,
along with a View to replace Ganglia Web.
> Replace Ganglia with high performant and pluggable Metrics System
> -----------------------------------------------------------------
>
> Key: AMBARI-5707
> URL: https://issues.apache.org/jira/browse/AMBARI-5707
> Project: Ambari
> Issue Type: Epic
> Components: ambari-agent, ambari-server
> Affects Versions: 1.6.0
> Reporter: Siddharth Wagle
> Assignee: Siddharth Wagle
> Priority: Critical
> Attachments: MetricsArchLatest.png, Revised archtecture diagram.png
>
>
> *Ambari Metrics System*
> - Ability to collect metrics from Hadoop and other Stack services
> - Ability to retain metrics at a high precision for a configurable time
> period (say 5 days)
> - Ability to automatically purge metrics after retention period
> - At collection time, provide clear integration point for external system
> (such as TSDB)
> - At purge time, provide clear integration point for metrics retention by
> external system
> - Should provide default options for external metrics retention (say “HDFS”)
> - Provide tools / utilities for analyzing metrics in retention system (say
> “Hive schema, Pig scripts, etc” that can be used with the default retention
> store “HDFS”)
> *System Requirements*
> - Must be portable and platform independent
> - Must not conflict with any existing metrics system (such as Ganglia)
> - Must not conflict with existing SNMP infra
> - Must not run as root
> - Must have HA story (no SPOF)
> *Usage*
> - Ability to obtain metrics from Ambari REST API (point in time and temporal)
> - Ability to view metric graphs in Ambari Web (currently, fixed)
> - Ability to configure custom metric graphs in Ambari Web (currently, we have
> metric graphs “fixed” into the UI)
> - Need to improve metric graph “navigation” in Ambari Web (currently, metric
> graphs do not allow navigation at arbitrary timeframes, but only at ganglia
> aggregation intervals)
> - Ability to “view cluster” at point in time (i.e. see all metrics at that
> point)
> - Ability to define metrics (and how + where to obtain) in Stack Definitions
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)