[ https://issues.apache.org/jira/browse/CHUKWA-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Yang updated CHUKWA-444:
-----------------------------
Status: Patch Available (was: Open)
The current patch is ready to check in if people are fine with the [Time
partition]-[primary key] approach.
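
For anyone reviewing the key layout, here is a minimal sketch of how a
[Time partition]-[primary key] row key could be built. It is an
illustration only and is not taken from the patch: the class name, the
one-day partition width, the "-" separator, and the use of the source host
as the primary key are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names come from the CHUKWA-444 patch.
public class TimePartitionKeySketch {
    // Assumed partition width: one day, in milliseconds.
    static final long PARTITION_MILLIS = 24L * 60L * 60L * 1000L;

    // Builds "<partition start>-<primary key>" for a non-negative epoch timestamp.
    static String rowKey(long timestampMillis, String primaryKey) {
        long partitionStart = (timestampMillis / PARTITION_MILLIS) * PARTITION_MILLIS;
        return partitionStart + "-" + primaryKey;
    }

    public static void main(String[] args) {
        // One store holds samples from many sources: row key -> (timestamp -> value).
        Map<String, Map<Long, Double>> store = new TreeMap<String, Map<Long, Double>>();
        long[] times = {0L, 60000L, PARTITION_MILLIS + 5L};
        String[] hosts = {"host-a", "host-b"};
        for (String host : hosts) {
            for (long t : times) {
                String key = rowKey(t, host);
                Map<Long, Double> row = store.get(key);
                if (row == null) {
                    row = new TreeMap<Long, Double>();
                    store.put(key, row);
                }
                row.put(t, 1.0); // placeholder metric value
            }
        }
        System.out.println(store.keySet());
    }
}
```

In this sketch, samples from the same host that fall in the same day share
one row, and all hosts live in a single store rather than one file per
source.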
> Redefine Chukwa time series storage
> -----------------------------------
>
> Key: CHUKWA-444
> URL: https://issues.apache.org/jira/browse/CHUKWA-444
> Project: Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Environment: Redhat EL 5.1, Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
> Attachments: CHUKWA-444-2.patch
>
>
> The current Chukwa Record format is not suitable for data visualization. It
> is more like an archive format, which combines data from multiple sources
> (hosts) and groups them into a sorted, time-partitioned sequence file. Most
> people collect data for two reasons: archiving and data analysis. The
> current Chukwa Record format is fine for archiving, but it is not so great
> for data analysis. Data analysis can be further broken down into two
> types: 1) Data that can be aggregated and summarized, such as metrics. 2)
> Data that cannot be summarized, like job history. Type 1 data is useful for
> visualization by graph, and type 2 data is useful for plain-text viewing or
> searching for a particular event.
> By the above rationale, it probably makes sense to restructure Chukwa Records
> for data analysis. Outside of the Hadoop world, RRDtool is great for time
> series data storage and is optimized for metrics from a single source, i.e. a
> host. RRD data files fragment badly when there are hundreds of thousands of
> sources. Chukwa time series data storage should be able to combine multiple
> data sources into one Chukwa file to combat the file fragmentation problem.