[jira] Updated: (CHUKWA-444) Redefine Chukwa time series storage

Eric Yang (JIRA) Sun, 08 Aug 2010 11:54:44 -0700

     [ 
https://issues.apache.org/jira/browse/CHUKWA-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eric Yang updated CHUKWA-444:
-----------------------------

    Status: Patch Available  (was: Open)

The current patch is ready to check in if people are fine with the [Time 
partition]-[primary key] approach.
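
For readers unfamiliar with the idea, here is a minimal sketch of what a
[Time partition]-[primary key] layout could look like. It is not taken from the
patch: the one-day partition width, the "-" separator, the use of a host name
as the primary key, and the names time_partition and row_key are all
assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumed partition width: one partition per UTC day (not from the patch).
PARTITION_SECONDS = 24 * 60 * 60


def time_partition(epoch_seconds: int, width: int = PARTITION_SECONDS) -> int:
    """Round a timestamp down to the start of its time partition."""
    return epoch_seconds - (epoch_seconds % width)


def row_key(epoch_seconds: int, primary_key: str) -> str:
    """Build a '[time partition]-[primary key]' string (hypothetical layout)."""
    return f"{time_partition(epoch_seconds)}-{primary_key}"


# Two samples taken a minute apart on different (made-up) hosts.
ts = int(datetime(2010, 8, 8, 18, 54, 44, tzinfo=timezone.utc).timestamp())
key_a = row_key(ts, "host-a.example.com")
key_b = row_key(ts + 60, "host-b.example.com")

# Both keys start with the same partition value, so records from many
# sources sort next to each other instead of living in one file per source.
shared_prefix = f"{time_partition(ts)}-"
```

Under these assumptions, keys from the same day share a prefix, which is what
lets many sources be grouped into one file rather than one file per host.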


> Redefine Chukwa time series storage
> -----------------------------------
>
>                 Key: CHUKWA-444
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-444
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>         Attachments: CHUKWA-444-2.patch
>
>
> The current Chukwa Record format is not suitable for data visualization.  It 
> is more like an archive format, which combines data from multiple sources 
> (hosts) and groups them into a sorted, time-partitioned sequence file.  Most 
> people collect data for two reasons: archiving and data analysis.  The 
> current Chukwa Record format is fine for archiving, but it is not so great 
> for data analysis.  Data analysis can be further broken down into two 
> types: 1) data that can be aggregated and summarized, such as metrics, and 
> 2) data that cannot be summarized, such as job history.  Type 1 data is 
> useful for visualization as graphs, and type 2 data is useful for plain-text 
> viewing or for searching for a particular event.
> Given the above rationale, it probably makes sense to restructure Chukwa 
> Records for data analysis.  Outside of the Hadoop world, RRDtool is great 
> for time series data storage and is optimized for metrics from a single 
> source, i.e. a host.  RRD data files fragment badly when there are hundreds 
> of thousands of sources.  Chukwa time series data storage should be able to 
> combine multiple data sources into one Chukwa file to combat the file 
> fragmentation problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
