[
https://issues.apache.org/jira/browse/HADOOP-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ari Rabkin updated HADOOP-3719:
-------------------------------
Attachment: chukwa_08.pdf
Chukwa is designed to collect monitoring data (especially log files) and to get
that data into HDFS as quickly as possible. Data is initially collected by a
Local Agent running on each monitored machine. The Local Agent has a pluggable
architecture, allowing many different adaptors to be used, each of which
produces a particular stream of data. Local Agents send their data via HTTP to
Collectors, which write the data out into "sink files" in HDFS. Map-Reduce jobs
run periodically to analyze these sink files and drain their contents into
structured storage.
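The agent/adaptor split described above can be sketched roughly as follows. This is an illustrative sketch only, not Chukwa's actual API: the names Adaptor, ChunkReceiver, and LocalAgent are all assumptions, and the in-memory list stands in for the HTTP posts a real agent would make to a Collector.

```java
import java.util.ArrayList;
import java.util.List;

// Stands in for a Collector endpoint; a real agent would POST over HTTP,
// and the Collector would append the chunks to a sink file in HDFS.
interface ChunkReceiver {
    void add(String stream, byte[] chunk);
}

// One pluggable data source; real adaptors might tail log files,
// run commands, or read system metrics.
interface Adaptor {
    void start(ChunkReceiver dest);
}

class LocalAgent implements ChunkReceiver {
    private final List<Adaptor> adaptors = new ArrayList<>();
    // Records what would have been forwarded to a Collector.
    final List<String> forwarded = new ArrayList<>();

    void register(Adaptor a) {
        adaptors.add(a);
        a.start(this); // the adaptor begins emitting chunks to this agent
    }

    @Override
    public void add(String stream, byte[] chunk) {
        forwarded.add(stream + ":" + new String(chunk));
    }
}

public class AgentSketch {
    public static void main(String[] args) {
        LocalAgent agent = new LocalAgent();
        // A toy adaptor emitting a single log line.
        agent.register(dest -> dest.add("syslog", "boot ok".getBytes()));
        System.out.println(agent.forwarded);
    }
}
```

The point of the pattern is that the agent needs no knowledge of where data comes from; each adaptor only needs a destination to hand chunks to.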
Chukwa provides a natural solution to the log collection problem posed in
HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some
patches to Hadoop to trigger log collection appropriately.
We expect this work ultimately to be complementary to HADOOP-3585, the failure
analysis system. We want to collect similar data, and our framework is flexible
enough to accommodate the structure proposed there, with only modest code
changes on each side.
The attached document introduces Chukwa, and describes the data collection
architecture. We do not present our analytics and visualization in detail in
this document. We intend to describe them in a second document in the near
future.
> Chukwa
> ------
>
> Key: HADOOP-3719
> URL: https://issues.apache.org/jira/browse/HADOOP-3719
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Ari Rabkin
> Attachments: chukwa_08.pdf
>
>
> We'd like to contribute Chukwa, a data collection and analysis framework
> being developed at Yahoo!. Chukwa is a natural complement to Hadoop, since
> it is built on top of HDFS and Map-Reduce, and since Hadoop clusters are a
> key use case.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.