[ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-667:
-----------------------------
    Attachment: CHUKWA-667.patch

Updated the Chukwa HBase schema to the following format:

Table: CHUKWA_META
RowKey: metricGroup
Column Family: k
Column Name: metric name or primary key
Cell Value: { "sig": "md5 signature", "type": "source|metric|cluster" }
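To make the CHUKWA_META layout concrete, here is a minimal Python sketch. It models cells as plain tuples and does not call an HBase client. It assumes that the "sig" field is the hex MD5 of the column name. The helper names (`meta_cell`, `names_of_type`) and the sample group `SystemMetrics` are hypothetical. The sketch shows how a single row per metric group can answer "which hosts report this group?" by filtering on the "type" field.

```python
import hashlib
import json


def md5_hex(text: str) -> str:
    # Assumption: "sig" is the hex MD5 of the column name (metric or primary key).
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def meta_cell(metric_group: str, name: str, kind: str):
    """Build one CHUKWA_META cell as (row_key, family, qualifier, value)."""
    if kind not in ("source", "metric", "cluster"):
        raise ValueError("type must be source, metric or cluster")
    value = json.dumps({"sig": md5_hex(name), "type": kind})
    return (metric_group, "k", name, value)


def names_of_type(cells, metric_group: str, kind: str):
    """Return the sorted column names in one group's row whose type matches kind."""
    return sorted(
        qualifier
        for (row, family, qualifier, value) in cells
        if row == metric_group and family == "k" and json.loads(value)["type"] == kind
    )


# Hypothetical usage: one row holds metric names, sources and clusters together.
cells = [
    meta_cell("SystemMetrics", "cpu.user", "metric"),
    meta_cell("SystemMetrics", "host1", "source"),
    meta_cell("SystemMetrics", "cluster1", "cluster"),
]
print(names_of_type(cells, "SystemMetrics", "source"))
```

Keeping the signature next to each name means a frontend can list hosts or metrics from one small row and reuse the stored MD5 values when building row keys for the data table.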

Table: CHUKWA
RowKey: [DAY][METRIC_KEY_MD5][PRIMARY_KEY_MD5]
Column Family: t
Column Name: [timestamp]
Cell Value: [STRING]
Column Family: a
Column Name: [timestamp]
Cell Value: [TAGS]
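The row key layout for the CHUKWA table can be sketched as follows. This is not Chukwa's implementation. Several details are assumptions:

- [DAY] is encoded as the UTC day of year in a 2-byte big-endian integer.
- Full 16-byte MD5 digests are used.
- The column qualifier is the timestamp as an 8-byte big-endian long.
- The metric key takes the form `group.metric`.

The helper names are hypothetical.

```python
import hashlib
import struct
from datetime import datetime, timezone


def md5(text: str) -> bytes:
    """Raw 16-byte MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).digest()


def day_prefix(ts_ms: int) -> bytes:
    # Assumed [DAY] encoding: UTC day of year (1-366) as a 2-byte big-endian int.
    day = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc).timetuple().tm_yday
    return struct.pack(">H", day)


def row_key(ts_ms: int, metric_key: str, primary_key: str) -> bytes:
    """[DAY][METRIC_KEY_MD5][PRIMARY_KEY_MD5]: 2 + 16 + 16 = 34 bytes."""
    return day_prefix(ts_ms) + md5(metric_key) + md5(primary_key)


def put_cells(ts_ms: int, metric_key: str, primary_key: str, value, tags: str):
    """Return the two cells for one sample: the value in 't', the tags in 'a'."""
    row = row_key(ts_ms, metric_key, primary_key)
    qualifier = struct.pack(">q", ts_ms)  # assumed: timestamp as an 8-byte long
    return [
        (row, b"t", qualifier, str(value).encode("utf-8")),
        (row, b"a", qualifier, tags.encode("utf-8")),
    ]


# Hypothetical sample: one cpu.user reading from host1 at 2015-01-01T00:00:00Z.
cells = put_cells(1420070400000, "SystemMetrics.cpu.user", "host1", 12.5, "cluster=c1")
```

Because the MD5 parts have a fixed width, every key has the same length. All samples for one metric on one source during one day share a row, with one column per timestamp.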

This layout provides round-robin partitions and fast hash-based lookup of values within a time range.
Please keep in mind that this storage format works for metrics data, not logs.
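The following sketch illustrates why this layout suits time-range lookups. It turns a query for one metric on one source into a list of per-day prefix scans, each with an inclusive start row and an exclusive stop row. It assumes [DAY] is the UTC day of year in 2 big-endian bytes, followed by full MD5 digests of the metric key and primary key. This encoding is an assumption, and the helper names are hypothetical. The day prefix only narrows a scan to whole days. Under this assumption, the same day in different years also shares a prefix, so a reader would still need to filter timestamp column qualifiers to the exact range.

```python
import hashlib
import struct
from datetime import datetime, timedelta, timezone


def md5(text: str) -> bytes:
    return hashlib.md5(text.encode("utf-8")).digest()


def day_prefix(day: datetime) -> bytes:
    # Assumed [DAY] encoding: UTC day of year as a 2-byte big-endian int.
    return struct.pack(">H", day.timetuple().tm_yday)


def next_prefix(prefix: bytes) -> bytes:
    """Smallest key greater than every key starting with prefix (exclusive stop row)."""
    b = bytearray(prefix)
    for i in range(len(b) - 1, -1, -1):
        if b[i] < 0xFF:
            b[i] += 1
            return bytes(b[: i + 1])
    return b""  # prefix is all 0xFF bytes: scan to the end of the table


def scan_ranges(start_ms: int, end_ms: int, metric_key: str, primary_key: str):
    """One (start_row, stop_row) pair per UTC day touched by [start_ms, end_ms]."""
    suffix = md5(metric_key) + md5(primary_key)
    day = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    last = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc)
    ranges = []
    while day <= last:
        prefix = day_prefix(day) + suffix
        ranges.append((prefix, next_prefix(prefix)))
        day += timedelta(days=1)
    return ranges


# Hypothetical 30-day query starting 2015-01-01T00:00:00Z.
start = 1420070400000
ranges = scan_ranges(start, start + 30 * 86400000 - 1, "SystemMetrics.cpu.user", "host1")
```

Under these assumptions, a 30-day query becomes 30 short prefix scans instead of a scan over the whole table.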

> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>             Fix For: 0.7.0
>
>         Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema was designed for HICC and cannot be fully
> adapted to the Ganglia web frontend, for several reasons:
> (1) It cannot quickly retrieve all clusters and their related host names.
> (2) System metrics have no attributes, such as type or unit, so it is hard
> to interpret the collected metrics programmatically.
> (3) There is no data consolidation function: selecting a metric over a large
> time range (such as 30 days) fetches every raw data point to draw the graph,
> which severely degrades performance.
> We will redesign the table schema so that it is better suited to Ganglia web
> frontend queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)