[ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491375#comment-14491375 ]

Sreepathi Prasanna commented on CHUKWA-667:
-------------------------------------------

I like the idea of separating the metadata and the metrics data itself into two
different tables. This saves a lot of space.

Regarding row key:

day+6 digits of md5(metricgroup+metric)+6 digits of md5(host) 
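A minimal sketch of how such a row key could be built, to make the layout concrete. It assumes "day" is a UTC yyyyMMdd string, "6 digits of md5" means the first 6 hex characters of the digest, and the three parts are concatenated with no separator; the metric group, metric, and host names below are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def md5_prefix(text: str, digits: int = 6) -> str:
    # Assumption: "6 digits of md5" = first 6 hex characters of the MD5 digest.
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:digits]


def row_key(ts: datetime, metric_group: str, metric: str, host: str) -> str:
    # Assumption: day is rendered as a UTC yyyyMMdd string and the parts are
    # concatenated without separators: 8 + 6 + 6 = 20 characters.
    # ts should be timezone-aware so the UTC conversion is unambiguous.
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return day + md5_prefix(metric_group + metric) + md5_prefix(host)


# Hypothetical metric and host names, two samples on the same day.
early = datetime(2015, 4, 11, 0, 1, tzinfo=timezone.utc)
late = datetime(2015, 4, 11, 23, 59, tzinfo=timezone.utc)
print(row_key(early, "SystemMetrics", "cpu.user", "host1.example.com")
      == row_key(late, "SystemMetrics", "cpu.user", "host1.example.com"))  # True
```

Under these assumptions the timestamp only contributes its day, so every sample for one metric on one host within a day lands in the same row; the time of day would have to live in the column qualifier.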

I mostly agree with this design, but I have a question. Does that mean all metrics
collected per minute on the same day would hit the same row? Is that performant?
Also, if you are aggregating the data every 15 minutes, wouldn't that put extra load on
the same rows that are receiving writes?
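To put numbers on the concern, here is a rough sketch that counts how many per-minute writes one metric on one host sends to each row, and how many 15-minute aggregation passes read back from that row per day. It assumes, as above, that only the UTC yyyyMMdd day prefix varies between rows once metric and host are fixed; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def day_prefix(ts: datetime) -> str:
    # Assumption: the "day" part of the row key is a UTC yyyyMMdd string.
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def cells_per_row(start: datetime, minutes: int) -> dict:
    """Count per-minute writes per row for ONE metric on ONE host.

    With metric and host fixed, the md5 suffixes of the row key are constant,
    so only the day prefix distinguishes rows.
    """
    counts = defaultdict(int)
    for i in range(minutes):
        counts[day_prefix(start + timedelta(minutes=i))] += 1
    return dict(counts)


def rollup_windows(minutes_per_day: int = 24 * 60, window: int = 15) -> int:
    # Number of 15-minute aggregation passes per day that read from the same row.
    return minutes_per_day // window


start = datetime(2015, 4, 11, tzinfo=timezone.utc)
print(cells_per_row(start, 24 * 60))  # {'20150411': 1440}
print(rollup_windows())               # 96
```

So each row would absorb 1440 writes per day per metric and host, while 96 aggregation reads per day target that same row.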

> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>             Fix For: 0.7.0
>
>         Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema is designed for HICC, so it cannot be fully
> adapted to the Ganglia web frontend, for several reasons:
> (1) It cannot quickly retrieve all cluster names and their related host names.
> (2) System metrics have no attributes such as type or unit, so it is hard to
> interpret the collected metrics in code.
> (3) There is no data consolidation function, so choosing a metric over a large
> time range (such as 30 days) fetches all of the raw data to draw the graph,
> which severely hurts performance.
> We will redesign the table schema so that it is better adapted to Ganglia web
> frontend queries.
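Item (3) asks for a data consolidation function. Below is a minimal sketch of one possible approach: average consolidation into a bounded number of equal-width buckets, similar in spirit to RRDtool's AVERAGE consolidation function. It assumes samples are (epoch seconds, value) pairs; the function name `consolidate` and the `max_points` parameter are hypothetical, not part of Chukwa.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (epoch seconds, value)


def consolidate(samples: List[Sample], start: int, end: int,
                max_points: int) -> List[Sample]:
    """Average raw samples into at most max_points buckets over [start, end).

    Returns (bucket start time, mean value) for each non-empty bucket.
    """
    if end <= start or max_points <= 0:
        return []
    # Bucket width in seconds, rounded up so the range fits in max_points buckets.
    step = math.ceil((end - start) / max_points)
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for ts, value in samples:
        if start <= ts < end:
            bucket = (ts - start) // step
            sums[bucket] = sums.get(bucket, 0.0) + value
            counts[bucket] = counts.get(bucket, 0) + 1
    return [(start + b * step, sums[b] / counts[b]) for b in sorted(sums)]


# 30 days of per-minute samples reduced to 720 hourly averages.
samples = [(i * 60, float(i)) for i in range(30 * 24 * 60)]
points = consolidate(samples, 0, 30 * 24 * 3600, max_points=720)
print(len(points))  # 720
```

A frontend would then draw 720 points instead of 43200 for a 30-day view; precomputing such buckets at write or aggregation time would avoid scanning the raw data at query time.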



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
