[
https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395958#comment-14395958
]
Eric Yang commented on CHUKWA-667:
----------------------------------
A further enhancement to the primary key section:
The first 6 digits are the prefix of the MD5 hash of the group name, followed by
6 digits from hashing the primary key name.
An example of this would be:
Hadoop.dfs.datanode.byteRead:host1.example.com
Here Hadoop.dfs.datanode.byteRead and host1.example.com are collocated primary
keys, but when aggregating metrics by host, Hadoop.dfs.datanode.byteRead is
what the aggregate is computed over, so it is the more significant part of the
key in the computation.
Hence, we generate the primary key as:
Hadoop.dfs.datanode.byteRead = 21da46
host1.example.com = a026db
For day 269 of the year, the row key would appear as:
26921da46a026db
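A minimal sketch of how such a row key could be assembled, in Python for
brevity. The function names, the UTF-8 encoding, the use of hex digits, and the
zero-padding of the day of year to three digits are all assumptions made for
illustration, so the prefixes it prints may differ from the 21da46 and a026db
values quoted above.

```python
import hashlib


def hash_prefix(name: str, digits: int = 6) -> str:
    """Return the first `digits` hex characters of the MD5 digest of `name`.

    Assumption: the name is hashed as UTF-8 and rendered as lowercase hex.
    """
    return hashlib.md5(name.encode("utf-8")).hexdigest()[:digits]


def row_key(day_of_year: int, group: str, primary_key: str) -> str:
    """Concatenate day of year, group hash prefix and primary key hash prefix.

    Assumption: the day of year is zero-padded to three digits so that every
    row key has the same length.
    """
    return f"{day_of_year:03d}{hash_prefix(group)}{hash_prefix(primary_key)}"


key = row_key(269, "Hadoop.dfs.datanode.byteRead", "host1.example.com")
print(key)
```

With this layout every key is 15 characters long: 3 for the day, 6 for the
group, and 6 for the primary key name.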
This enables the programmer to customize a rowFilter to select on either the
more significant part of the primary key or the less significant part of the
primary key. Thoughts?
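To illustrate the filtering idea, here is a hedged sketch in Python that
matches row keys with regular expressions. It only models the key layout (3 day
digits, 6 group digits, 6 primary key digits) and is not HBase client code; a
regex-based row filter on the HBase side would be the rough equivalent. The
helper names and every sample key other than 26921da46a026db are made up.

```python
import re


def group_pattern(group_hash: str) -> re.Pattern:
    """Match any day, then the given group hash (the more significant part)."""
    return re.compile(r"^\d{3}" + re.escape(group_hash))


def host_pattern(host_hash: str) -> re.Pattern:
    """Match any day and any group, then the given primary key hash at the end."""
    return re.compile(r"^\d{3}[0-9a-f]{6}" + re.escape(host_hash) + r"$")


# Sample row keys; only the first one comes from the example above.
keys = [
    "26921da46a026db",  # day 269, byteRead group, host1
    "26921da46ffffff",  # same group, hypothetical other host
    "269abcdefa026db",  # hypothetical other group, host1
    "27021da46123456",  # next day, same group, hypothetical other host
]

by_group = [k for k in keys if group_pattern("21da46").match(k)]
by_host = [k for k in keys if host_pattern("a026db").match(k)]

print(by_group)
print(by_host)
```

Selecting by group keeps the three keys that carry 21da46 after the day digits;
selecting by host keeps the two keys that end in a026db.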
> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
> Key: CHUKWA-667
> URL: https://issues.apache.org/jira/browse/CHUKWA-667
> Project: Chukwa
> Issue Type: Sub-task
> Components: Data Processors
> Affects Versions: 0.6.0
> Reporter: Saisai Shao
>
> The Chukwa HBase table schema is designed for HICC; it cannot be fully
> adapted to the Ganglia web frontend for several reasons:
> (1) it cannot quickly retrieve all the cluster and related host names.
> (2) system metrics have no attributes, such as type or unit, so it is hard to
> interpret the collected metrics in code.
> (3) it lacks a data consolidation function; choosing a metric over a large
> time range (like 30 days) fetches all the data to draw the graph, which
> greatly degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia web
> frontend queries.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)