[
https://issues.apache.org/jira/browse/CHUKWA-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883293#action_12883293
]
Bill Graham commented on CHUKWA-444:
------------------------------------
I agree with Jerome that Chukwa should still be usable without HBase. If you
have an HBase install and want real-time, it can be enabled. Ideally we would
have the ability to configure which data pipeline to follow. I like the idea
of adding an HBase component to the mix, though.
> Redefine Chukwa time series storage
> -----------------------------------
>
> Key: CHUKWA-444
> URL: https://issues.apache.org/jira/browse/CHUKWA-444
> Project: Hadoop Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Environment: Redhat EL 5.1, Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
> Attachments: CHUKWA-444.patch
>
>
> The current Chukwa Record format is not suitable for data visualization. It
> is more like an archive format, which combines data from multiple sources
> (hosts) and groups them into a sorted, time-partitioned sequence file. Most
> people collect data for two reasons: archiving and data analysis. The
> current Chukwa Record format is fine for archiving, but it is not so great
> for data analysis. Data analysis can be further broken down into two types:
> 1) data that can be aggregated and summarized, such as metrics, and 2) data
> that cannot be summarized, like job history. Type 1 data is useful for
> visualization by graph, and type 2 data is useful for plain-text viewing or
> searching for a particular event.
> By the above rationale, it probably makes sense to restructure Chukwa Records
> for data analysis. Outside of the Hadoop world, rrdtools is great for time
> series data storage and is optimized for metrics from a single source, i.e. a
> host. RRD data files fragment badly when there are hundreds of thousands of
> sources. Chukwa time series data storage should be able to combine multiple
> data sources into one Chukwa file to combat the file fragmentation problem.