[
https://issues.apache.org/jira/browse/CHUKWA-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883271#action_12883271
]
Ari Rabkin commented on CHUKWA-444:
-----------------------------------
Overall, the HBase-based approach seems to make sense.
I think this might be a good time to do something we've been talking about for
a while -- unifying the writer and sender APIs. This unification would also
advance Jerome's SDK vision, I think, by making the architecture more flexible.
> Redefine Chukwa time series storage
> -----------------------------------
>
> Key: CHUKWA-444
> URL: https://issues.apache.org/jira/browse/CHUKWA-444
> Project: Hadoop Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Environment: Redhat EL 5.1, Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
> Attachments: CHUKWA-444.patch
>
>
> The current Chukwa Record format is not suitable for data visualization. It
> is more like an archive format that combines data from multiple sources
> (hosts) and groups it into a sorted, time-partitioned sequence file. Most
> people collect data for two reasons: archiving and data analysis. The
> current Chukwa Record format is fine for archiving, but it is not well
> suited to data analysis. Data for analysis can be further broken down into
> two types: 1) data that can be aggregated and summarized, such as metrics,
> and 2) data that cannot be summarized, such as job history. Type 1 data is
> best visualized as graphs; type 2 data is best viewed as plain text or
> searched for a particular event.
> By the above rationale, it probably makes sense to restructure Chukwa Records
> for data analysis. Outside the Hadoop world, RRDtool is great for time
> series data storage and is optimized for metrics from a single source, i.e. a
> host. RRD data files fragment badly when there are hundreds of thousands of
> sources. Chukwa time series data storage should be able to combine multiple
> data sources into one Chukwa file to combat the file fragmentation problem.