[
https://issues.apache.org/jira/browse/HADOOP-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ari Rabkin updated HADOOP-3719:
-------------------------------
Attachment: chukwa_08.pdf
Chukwa is designed to collect monitoring data (especially log files) and to get
that data into HDFS as quickly as possible. Data is initially collected by a
Local Agent running on each monitored machine. The Local Agent has a pluggable
architecture, allowing many different adaptors to be used, each of which
produces a particular stream of data. Local Agents send their data via HTTP to
Collectors, which write the data out into "sink files" in HDFS. Map-Reduce jobs
run periodically to analyze these sink files and drain their contents into
structured storage.
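The agent/adaptor split described above can be sketched roughly as follows. This is an illustrative sketch only, not Chukwa's actual API: the names Adaptor, ChunkReceiver, and LocalAgent are all assumptions, and the in-memory list stands in for the HTTP posts a real agent would make to a Collector.

```java
import java.util.ArrayList;
import java.util.List;

// Stands in for a Collector endpoint; a real agent would POST over HTTP,
// and the Collector would append the chunks to a sink file in HDFS.
interface ChunkReceiver {
    void add(String stream, byte[] chunk);
}

// One pluggable data source; real adaptors might tail log files,
// run commands, or read system metrics.
interface Adaptor {
    void start(ChunkReceiver dest);
}

class LocalAgent implements ChunkReceiver {
    private final List<Adaptor> adaptors = new ArrayList<>();
    // Records what would have been forwarded to a Collector.
    final List<String> forwarded = new ArrayList<>();

    void register(Adaptor a) {
        adaptors.add(a);
        a.start(this); // the adaptor begins emitting chunks to this agent
    }

    @Override
    public void add(String stream, byte[] chunk) {
        forwarded.add(stream + ":" + new String(chunk));
    }
}

public class AgentSketch {
    public static void main(String[] args) {
        LocalAgent agent = new LocalAgent();
        // A toy adaptor emitting a single log line.
        agent.register(dest -> dest.add("syslog", "boot ok".getBytes()));
        System.out.println(agent.forwarded);
    }
}
```

The point of the pattern is that the agent needs no knowledge of where data comes from; each adaptor only needs a destination to hand chunks to.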
Chukwa provides a natural solution to the log collection problem posed in
HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some
patches to Hadoop to trigger log collection appropriately.
We expect this work ultimately to be complementary to HADOOP-3585, the failure
analysis system. We want to collect similar data, and our framework is flexible
enough to accommodate the structure proposed there, with only modest code
changes on each side.
The attached document introduces Chukwa, and describes the data collection
architecture. We do not present our analytics and visualization in detail in
this document. We intend to describe them in a second document in the near
future.
> Chukwa
> ------
>
> Key: HADOOP-3719
> URL: https://issues.apache.org/jira/browse/HADOOP-3719
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Ari Rabkin
> Attachments: chukwa_08.pdf
>
>
> We'd like to contribute Chukwa, a data collection and analysis framework
> being developed at Yahoo!. Chukwa is a natural complement to Hadoop, since
> it is built on top of HDFS and Map-Reduce, and since Hadoop clusters are a
> key use case.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.