[jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queris

Eric Yang (JIRA) Sun, 11 Jan 2015 14:57:06 -0800

    [ 
https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273083#comment-14273083
 ]


Eric Yang commented on CHUKWA-667:
----------------------------------

The above schema is optimized for a single-pass scan with a datetime prefix 
that fits in one day.  For historical trending, scanning such a table would 
be inefficient.  To summarize the raw data into monthly and yearly tables, 
we can run a map reduce job daily to populate hourly and daily averages into 
secondary tables.  Hence, the Row Key "day" prefix may be adjusted to "01-12" 
for the monthly table, and "1-9" for the yearly table.  Based on the time 
range selected for the query, the query API can switch between the monthly 
and yearly tables.  This would have a similar effect to rrdtool, in which 
data from the more distant past is kept as lower-resolution aggregates, but 
it would use HBase-provided timestamp versions and compaction to achieve 
data retention.
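
As a rough illustration of the rollup and table switching described above, 
here is a minimal Python sketch.  The table names, the 31-day cutoff between 
the monthly and yearly tables, and both function names are assumptions made 
for this example; none of them come from the Chukwa code base, and the real 
rollup would run as a map reduce job rather than in memory.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical table names; the actual Chukwa table names may differ.
RAW_TABLE = "chukwa_raw"
MONTHLY_TABLE = "chukwa_monthly"
YEARLY_TABLE = "chukwa_yearly"


def hourly_averages(samples):
    """Average (epoch_seconds, value) samples into one value per hour.

    Returns a dict mapping the start of each hour (epoch seconds)
    to the mean of the values that fall inside that hour.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 3600].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


def select_table(start, end):
    """Pick the table to scan based on the length of the query range.

    Ranges that fit in one day use the raw table, as in the comment.
    The 31-day cutoff for the monthly table is an assumed threshold.
    """
    span = end - start
    if span <= timedelta(days=1):
        return RAW_TABLE
    if span <= timedelta(days=31):
        return MONTHLY_TABLE
    return YEARLY_TABLE
```

With these assumptions, a 30-day query such as 
select_table(datetime(2015, 1, 1), datetime(2015, 1, 31)) would read the 
monthly table instead of scanning every raw row.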

We may want to apply some sort of hashing algorithm to primary_key to ensure 
the primary key has a fixed length and an even distribution.
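
One possible way to do this, shown only as a sketch: hash the primary key 
with MD5 and use the 16-byte digest as the key component.  The choice of MD5 
and the function name hashed_primary_key are assumptions for illustration; 
the comment does not name a specific algorithm.

```python
import hashlib


def hashed_primary_key(primary_key: str) -> bytes:
    """Map a variable-length primary key to a fixed 16-byte value.

    MD5 is used here only for its fixed output size and roughly
    uniform distribution, not for any security property.
    """
    return hashlib.md5(primary_key.encode("utf-8")).digest()
```

Every input yields exactly 16 bytes, so row keys built from it have a 
predictable length.  The trade-off is that the original primary key cannot 
be recovered from the row key and would need to be stored in a column if it 
is needed at read time.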

> Optimize the HBase schema for Ganglia queris
> --------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: Sub-task
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>
> Chukwa HBase table schema is designed for HICC; it cannot be fully adapted to 
> Ganglia web frontend for several reasons:
> (1) it cannot quickly retrieve all the cluster and related host names.
> (2) system metrics have no attributes, like type or unit, so it is hard to 
> interpret the collected metrics in code.
> (3) it lacks a data consolidation function; choosing a metric over a large 
> time range (like 30 days) will fetch all the data to draw the graph, which 
> will severely degrade performance.
> We will redesign the table schema that will be better adapted to Ganglia web 
> frontend queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
