[
https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Yang updated CHUKWA-667:
-----------------------------
Attachment: CHUKWA-667.patch
Updated the Chukwa HBase schema to the following format:
Table: CHUKWA_META
RowKey: metricGroup
Column Family: k
Column Name: metric name or primary key
Cell Value: { "sig": "md5 signature", "type": "source|metric|cluster" }
Table: CHUKWA
RowKey: [DAY][METRIC_KEY_MD5][PRIMARY_KEY_MD5]
Column Family: t
Column Name: [timestamp]
Cell Value: [STRING]
Column Family: a
Column Name: [timestamp]
Cell Value: [TAGS]
This provides round-robin partitioning and efficient hash lookup of time-range values.
Please keep in mind that this storage format works for metrics data, not logs.
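To make the key layout above concrete, here is a minimal sketch of how a CHUKWA row key and a CHUKWA_META cell value could be assembled. It is not taken from the attached patch. The function names, the encoding of [DAY] as a 4-byte big-endian count of days since the Unix epoch, the use of full 16-byte MD5 digests, and the assumption that "sig" is the MD5 of the column name are all illustrative guesses; the patch may encode these differently.

```python
import hashlib
import json
import struct

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def md5_bytes(text: str) -> bytes:
    """Return the raw 16-byte MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).digest()


def build_row_key(timestamp_ms: int, metric_key: str, primary_key: str) -> bytes:
    """Build [DAY][METRIC_KEY_MD5][PRIMARY_KEY_MD5] (hypothetical encoding).

    Assumption: DAY is days since the Unix epoch, packed as 4 bytes.
    """
    day = timestamp_ms // MILLIS_PER_DAY
    return struct.pack(">I", day) + md5_bytes(metric_key) + md5_bytes(primary_key)


def build_meta_cell(name: str, kind: str) -> str:
    """Build a CHUKWA_META cell value: {"sig": ..., "type": ...}.

    Assumption: "sig" is the hex MD5 of the metric name or primary key.
    """
    if kind not in ("source", "metric", "cluster"):
        raise ValueError("type must be source, metric, or cluster")
    sig = hashlib.md5(name.encode("utf-8")).hexdigest()
    return json.dumps({"sig": sig, "type": kind})


if __name__ == "__main__":
    # Hypothetical metric and host names, for illustration only.
    key = build_row_key(1_400_000_000_000, "SystemMetrics.cpu.user", "host1.example.com")
    print(key.hex())
    print(build_meta_cell("host1.example.com", "source"))
```

Under these assumptions, all values written for one metric and one source on the same day share a single row, with the timestamp as the column name, so a time-range query only needs the rows for the days it covers.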
> Optimize the HBase schema for Ganglia queries
> --------------------------------------------
>
> Key: CHUKWA-667
> URL: https://issues.apache.org/jira/browse/CHUKWA-667
> Project: Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Affects Versions: 0.6.0
> Reporter: Saisai Shao
> Fix For: 0.7.0
>
> Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema is designed for HICC; it cannot be fully adapted
> to the Ganglia web frontend for several reasons:
> (1) It cannot quickly retrieve all cluster names and their related host names.
> (2) System metrics have no attributes, such as type or unit, so it is hard to
> interpret the collected metrics in code.
> (3) It lacks a data consolidation function: choosing a metric over a large time
> range (like 30 days) fetches all the data to draw the graph, which greatly
> degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia web
> frontend queries.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)