[
https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136319#comment-14136319
]
Siddharth Wagle edited comment on AMBARI-5707 at 9/17/14 5:01 PM:
------------------------------------------------------------------
*Revised architecture overview*:
*Problems with current system*:
- Ganglia has limited capabilities for analyzing historical data, and new
plugins are not easy to write.
- Horizontal scale-out for large clusters is difficult.
- No support for ad-hoc queries.
- It is not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time series databases like OpenTSDB to
store raw data indefinitely.
*Solution*:
- Replace Ganglia with a bespoke solution based on embedded HBase that fits all
of the needs above.
- Ability to store fine-grained data for a configurable amount of time.
- Ability to write SQL-like queries (via Phoenix) on aggregated metric data
sets and visualize the results.
- Provide a pluggable storage API with the ability to forward metric data to
external long-term storage.
- Ability to add user-defined metrics and visualize them through Ambari
Views.
*Component description*:
- *Host metrics monitor*:
A lightweight Python process running on every managed host, collecting
metrics for the managed processes running on that host as well as aggregate
metrics for the entire host. The collected metrics are pushed to a
pre-configured metrics collector to be stored for consumption by the Ambari API.
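A minimal sketch of the kind of payload such a monitor might push to the collector; the field names ({{metricname}}, {{appid}}, {{starttime}}) and the JSON shape are illustrative assumptions, not the actual wire format:

```python
import json
import time

def build_payload(hostname, app_id, samples):
    """Assemble a push payload. samples: metric name -> {epoch_millis: value}."""
    return {
        "metrics": [
            {
                "metricname": name,        # hypothetical field names
                "hostname": hostname,
                "appid": app_id,
                "starttime": min(points),  # earliest timestamp in the series
                "metrics": points,         # timestamp -> value map
            }
            for name, points in samples.items()
        ]
    }

now = int(time.time() * 1000)
payload = build_payload("host1.example.com", "HOST",
                        {"cpu_user": {now: 12.5, now + 10_000: 13.1}})
print(json.dumps(payload)[:60])
```

In a real monitor the payload would be POSTed to the collector endpoint on a timer; here it is only constructed so the shape is visible.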
- *Hadoop Metrics Sink*:
Implementation of the Hadoop Metrics Sink interface that pushes data to a
configured collector. As part of the implementation, allow a periodic flush of
collected metrics data: _putMetric()_ should write data into a bounded buffer
cache of fixed size, configurable through hadoop-metrics2.properties.
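The bounded-buffer-with-periodic-flush idea can be sketched as follows (in Python for brevity; the real sink is Java). The buffer size and flush wiring stand in for the hadoop-metrics2.properties settings, and oldest-entry eviction on overflow is an assumption about the desired behavior:

```python
import threading
from collections import deque

class BoundedMetricBuffer:
    """Fixed-size buffer: put_metric() appends, flush() drains to a sender."""

    def __init__(self, max_size, send_fn):
        self._buf = deque(maxlen=max_size)  # oldest entries drop when full
        self._lock = threading.Lock()
        self._send = send_fn

    def put_metric(self, record):
        with self._lock:
            self._buf.append(record)

    def flush(self):
        # Swap the buffer under the lock, send the batch outside it.
        with self._lock:
            batch = list(self._buf)
            self._buf = deque(maxlen=self._buf.maxlen)
        if batch:
            self._send(batch)

sent = []
buf = BoundedMetricBuffer(max_size=3, send_fn=sent.extend)
for i in range(5):
    buf.put_metric(i)   # 0 and 1 are evicted once the buffer is full
buf.flush()
# sent is now [2, 3, 4]
```

A periodic flush would simply call {{flush()}} from a timer thread at the configured interval.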
- *Timeline Metrics Collector*:
The metrics collector is a daemon that receives data from registered
publishers and provides the ability to push the metrics data to an external
metrics store like OpenTSDB or HDFS, in addition to writing it to a local
metrics store. Additionally, the collector provides the ability to plug in
aggregators for the collected metric data. Aggregation is performed post-write
by aggregator threads that run at a configured time interval and aggregate the
data collected within that interval.
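The post-write aggregation step amounts to bucketing raw points by interval and computing summary statistics per bucket, roughly like this (the one-minute interval and the min/max/avg/count statistics are illustrative assumptions):

```python
def aggregate(points, interval_ms):
    """Group (timestamp_ms, value) points into fixed buckets and summarize."""
    buckets = {}
    for ts, value in points:
        bucket_start = ts - ts % interval_ms  # align to interval boundary
        buckets.setdefault(bucket_start, []).append(value)
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
            "count": len(vals),
        }
        for start, vals in buckets.items()
    }

raw = [(1000, 1.0), (2000, 3.0), (61000, 5.0)]
agg = aggregate(raw, interval_ms=60_000)
# two one-minute buckets: starts 0 and 60000
```

An aggregator thread would run this over the rows written since its last pass and persist the results to a coarser-grained table.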
- *Timeline Metrics Store*:
A time series database is ideal for storing metrics data. Its main advantage
is variable time buckets: for example, a row key indicating a metric id
followed by an arbitrary number of key-value pairs that fit into the time range
identified by a part of the key. This storage model allows simple time-based
aggregation and avoids sparse rows. The deployment modes of HBase allow for
scaling up and down based on cluster size, and the choice of HBase as the
default storage allows storage to scale independently and seamlessly from the
metrics collectors. Phoenix provides JDBC APIs instead of the regular HBase
client APIs to create tables, insert data, and query HBase data.
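The row-key scheme described above can be sketched as a metric id prefix plus a coarse time bucket, so a time-range read becomes a contiguous key-range scan. The separator, one-hour bucket width, and the table/column names in the Phoenix-style query are all illustrative assumptions:

```python
def row_key(metric_name, app_id, hostname, ts_millis, bucket_ms=3_600_000):
    """Build a row key: metric id parts, then the start of the time bucket."""
    bucket_start = ts_millis - ts_millis % bucket_ms
    return f"{metric_name}|{app_id}|{hostname}|{bucket_start}"

key = row_key("cpu_user", "HOST", "host1", 1_400_000_123_456)

# A Phoenix-style query over a hypothetical aggregate table; with Phoenix the
# same data is reachable through plain JDBC instead of HBase client calls.
QUERY = (
    "SELECT METRIC_NAME, SERVER_TIME, METRIC_AVG "
    "FROM METRIC_RECORD "
    "WHERE METRIC_NAME = ? AND SERVER_TIME BETWEEN ? AND ?"
)
```

Because all cells for one metric/host/hour share a key prefix, rows stay dense and interval aggregation reduces to scanning one key range.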
- *Ambari Metrics Service*:
The API design for the Metrics Service should support a GET API using a key
and a time range, similar to what exists on the HBase cluster.
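What a GET-by-key-and-time-range lookup could do, sketched over an in-memory stand-in for the store; the parameter names mirror the description above and are not a finalized API:

```python
def get_metrics(store, metric_name, hostname, start_ms, end_ms):
    """Return the points for one metric/host pair within [start_ms, end_ms]."""
    series = store.get((metric_name, hostname), {})
    return {ts: v for ts, v in series.items() if start_ms <= ts <= end_ms}

# In-memory stand-in for the metrics store: key -> {timestamp: value}.
store = {("cpu_user", "host1"): {1000: 1.0, 2000: 2.0, 99000: 9.9}}
window = get_metrics(store, "cpu_user", "host1", 0, 5000)
# window == {1000: 1.0, 2000: 2.0}
```

The REST layer would expose the same parameters (metric name, host, start, end) and serve both point-in-time and temporal queries from the Ambari API.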
- *Ambari Views*:
Ambari Views on top of Phoenix provide ad-hoc query capability to the user,
along with a View to replace Ganglia Web.
> Replace Ganglia with high performant and pluggable Metrics System
> -----------------------------------------------------------------
>
> Key: AMBARI-5707
> URL: https://issues.apache.org/jira/browse/AMBARI-5707
> Project: Ambari
> Issue Type: Epic
> Components: ambari-agent, ambari-server
> Affects Versions: 1.6.0
> Reporter: Siddharth Wagle
> Assignee: Siddharth Wagle
> Priority: Critical
> Attachments: MetricsArchLatest.png, Revised archtecture diagram.png
>
>
> *Ambari Metrics System*
> - Ability to collect metrics from Hadoop and other Stack services
> - Ability to retain metrics at a high precision for a configurable time
> period (say 5 days)
> - Ability to automatically purge metrics after retention period
> - At collection time, provide clear integration point for external system
> (such as TSDB)
> - At purge time, provide clear integration point for metrics retention by
> external system
> - Should provide default options for external metrics retention (say “HDFS”)
> - Provide tools / utilities for analyzing metrics in retention system (say
> “Hive schema, Pig scripts, etc” that can be used with the default retention
> store “HDFS”)
> *System Requirements*
> - Must be portable and platform independent
> - Must not conflict with any existing metrics system (such as Ganglia)
> - Must not conflict with existing SNMP infra
> - Must not run as root
> - Must have HA story (no SPOF)
> *Usage*
> - Ability to obtain metrics from Ambari REST API (point in time and temporal)
> - Ability to view metric graphs in Ambari Web (currently, fixed)
> - Ability to configure custom metric graphs in Ambari Web (currently, we have
> metric graphs “fixed” into the UI)
> - Need to improve metric graph “navigation” in Ambari Web (currently, metric
> graphs do not allow navigation at arbitrary timeframes, but only at ganglia
> aggregation intervals)
> - Ability to “view cluster” at point in time (i.e. see all metrics at that
> point)
> - Ability to define metrics (and how + where to obtain) in Stack Definitions
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)