[
https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491643#comment-14491643
]
Eric Yang commented on CHUKWA-667:
----------------------------------
Hi Sreepathi,
Metrics for the whole day update the same row. However, the row is just a
reference pointer to the actual data block, which reduces the number of
lookups to the data block. Each cell appends new data in memory or to the WAL,
and the data spills to disk during compaction. This design reduces the stress
point of a monotonically increasing index. Regions will reach an optimal
balance after 1 year of running because we partition by day. Partitioning by
a number is better than partitioning by metric group prefix, because a metric
group prefix can produce regions of uneven size: some metric groups contain
more metrics than others. For this reason, the design adds the day as the
prefix of the row key.
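
To illustrate the idea of a day-prefixed row key, here is a minimal sketch.
It is not the code from the patch. The function name, the 2-byte day-of-year
prefix, the "/" separator, and the metric/source suffix are all assumptions
made for the example; the comment above only states that the day is the
prefix of the row key.

```python
import struct
from datetime import datetime, timezone


def day_prefixed_row_key(timestamp_ms: int, metric: str, source: str) -> bytes:
    """Build a hypothetical row key: day prefix, then metric and source.

    Assumed layout (illustration only, not the actual Chukwa layout):
      2-byte big-endian day of year (UTC) + "/" + metric + "/" + source
    """
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    day_of_year = moment.timetuple().tm_yday  # 1..366
    prefix = struct.pack(">H", day_of_year)
    return prefix + b"/" + metric.encode("utf-8") + b"/" + source.encode("utf-8")


# Two samples taken on the same UTC day map to the same row.
morning = day_prefixed_row_key(1428883200000, "SystemMetrics.cpu", "host1")
later = day_prefixed_row_key(1428883200000 + 3_600_000, "SystemMetrics.cpu", "host1")
# A sample from the following day maps to a different row.
next_day = day_prefixed_row_key(1428883200000 + 86_400_000, "SystemMetrics.cpu", "host1")
```

With a fixed-width numeric prefix like this, every day gets its own key range
regardless of how many metrics a metric group contains, which is the property
the comment relies on for evenly sized regions.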
> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
> Key: CHUKWA-667
> URL: https://issues.apache.org/jira/browse/CHUKWA-667
> Project: Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Affects Versions: 0.6.0
> Reporter: Saisai Shao
> Fix For: 0.7.0
>
> Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema is designed for HICC; it cannot be fully
> adapted to the Ganglia web frontend for several reasons:
> (1) it cannot quickly retrieve all the cluster and related host names.
> (2) system metrics have no attributes, such as type or unit, so it is hard
> to interpret the collected metrics in code.
> (3) it lacks a data consolidation function: choosing a metric over a large
> time range (like 30 days) fetches all the data to draw the graph, which
> greatly degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia
> web frontend queries.