[
https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274764#comment-14274764
]
Sreepathi Prasanna commented on CHUKWA-667:
-------------------------------------------
Hello Eric,
I have read in the HBase guide that HBase is not optimized for multiple column
families.
Using the schema below would result in many column families.
Row Key: 13:host1.example.com
Column Family: HDFS
Column: datanode_bytes_read
Also, all queries, both reads and writes, might hit the same row at the same
time.
How about adding the metric name at the start or end of the row key? For example:
Row Key: 13|host1.example.com|datanode_bytes_read
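To make the proposed layout concrete, here is a minimal Python sketch of building and parsing such a composite row key. It only illustrates the string format from the example above; the function names are hypothetical, the leading field is called `prefix` because this comment does not say what the `13` stands for, and a real implementation would go through the HBase client rather than plain strings.

```python
SEP = "|"  # separator used in the example row key above


def make_row_key(prefix: int, host: str, metric: str) -> bytes:
    """Build a composite key of the form <prefix>|<host>|<metric>.

    `prefix` is the leading number from the example (13); its meaning is
    not specified here, so it is treated as an opaque integer.
    """
    for part in (host, metric):
        if SEP in part:
            # A separator inside a field would make the key ambiguous.
            raise ValueError(f"field must not contain {SEP!r}: {part!r}")
    return f"{prefix}{SEP}{host}{SEP}{metric}".encode("utf-8")


def parse_row_key(key: bytes) -> tuple[int, str, str]:
    """Split a composite key back into (prefix, host, metric)."""
    prefix, host, metric = key.decode("utf-8").split(SEP)
    return int(prefix), host, metric


key = make_row_key(13, "host1.example.com", "datanode_bytes_read")
print(key.decode("utf-8"))
```

With the metric in the row key, each metric of a host lands in its own row, so reads and writes for different metrics no longer contend for a single row.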
Further, we can save some space by assigning a number to each metric. For
example, if datanode_bytes_read is assigned the number 100, then the row key
would become:
Row Key: 13|host1.example.com|100
The mapping between metric names and numbers could be maintained in a
separate table. Because the row key is stored with every cell, the shorter
key saves space in HBase.
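The metric-number idea can be sketched as follows. This is an illustrative Python example under stated assumptions: the `MetricIdRegistry` class is hypothetical and keeps the mapping in memory, standing in for the separate mapping table, and the starting number 100 is taken from the example above. In a real deployment, number allocation would need to be atomic across writers.

```python
class MetricIdRegistry:
    """In-memory stand-in for the separate metric-name/number table."""

    def __init__(self, first_id: int = 100):
        self._next_id = first_id
        self._id_by_name: dict[str, int] = {}
        self._name_by_id: dict[int, str] = {}

    def id_for(self, name: str) -> int:
        """Return the number for a metric, assigning a new one if needed."""
        if name not in self._id_by_name:
            self._id_by_name[name] = self._next_id
            self._name_by_id[self._next_id] = name
            self._next_id += 1
        return self._id_by_name[name]

    def name_for(self, metric_id: int) -> str:
        """Reverse lookup, needed when presenting query results."""
        return self._name_by_id[metric_id]


def compact_row_key(prefix: int, host: str, metric_id: int) -> bytes:
    """Build <prefix>|<host>|<metric number>, as in the example above."""
    return f"{prefix}|{host}|{metric_id}".encode("utf-8")


registry = MetricIdRegistry()
metric_id = registry.id_for("datanode_bytes_read")
key = compact_row_key(13, "host1.example.com", metric_id)
print(key.decode("utf-8"))
```

The trade-off is an extra lookup to translate between names and numbers, which a client could cache since the mapping rarely changes.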
> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
> Key: CHUKWA-667
> URL: https://issues.apache.org/jira/browse/CHUKWA-667
> Project: Chukwa
> Issue Type: Sub-task
> Components: Data Processors
> Affects Versions: 0.6.0
> Reporter: Saisai Shao
>
> The Chukwa HBase table schema is designed for HICC and cannot be fully adapted
> to the Ganglia web frontend, for several reasons:
> (1) It cannot quickly retrieve all the cluster names and related host names.
> (2) System metrics have no attributes such as type or unit, so it is hard for
> code to interpret the collected metrics.
> (3) There is no data consolidation function: choosing a metric over a large
> time range (like 30 days) fetches all the data to draw the graph, which
> severely degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia web
> frontend queries.