Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by JiaqiTan:
http://wiki.apache.org/hadoop/Anomaly_Detection_Framework_with_Chukwa

------------------------------------------------------------------------------
- Describe Anomaly Detection Framework with Chukwa here.
- [[TableOfContents]]
== Introduction ==
- Hadoop is a great computation platform for map reduce job, but trouble shooting faulty compute node in the cluster is not an easy task. Chukwa Anomaly Detection System, is a system for detecting computer failure and misuse by monitoring system activity and classifying it as either normal or anomalous. The classification is based on heuristics, rules, and patterns, and will detect any type of misuse that falls out of normal system operation.
+ We describe a general framework for implementing anomaly detection algorithms over the data collected by Chukwa from monitored systems (Hadoop or otherwise), and for visualizing the outcomes of these algorithms. We envision that anomaly detection algorithms for Chukwa-monitored clusters will most naturally be implemented as described here.
- In order to determine what is failure, the system must be taught to recognize normal system activity. This can be accomplished in several ways, most often with artificial intelligence type techniques. Systems using neural networks have been used to great effect. Another method is to define what normal usage of the system comprises using a strict mathematical model, and flag any deviation from this as an system problem. This is known as strict anomaly detection. For the prototyping phase, Chukwa will use strict mathematical model as the skeleton.
+ The types of operations that this framework would enable fall into these broad categories:
+  1. Performing anomaly detection on collected system data (metrics, logs) to identify system elements (nodes, jobs, tasks) that are anomalous,
+  1. Applying higher-level processing to collected system data to generate abstract views of the monitored system that synthesize multiple viewpoints,
+  1. Applying higher-level processing to anomaly detection output to perform secondary anomaly detection, and
+  1. Presenting and/or visualizing the outcomes of the above steps.
== Design ==
- A new processing pipeline has been introduced to post demux processor. This enables Chukwa to run ping/mr job based aggregation and anomaly detection framework.
+ The tasks described above will be performed in a PostProcess stage that runs after the Demux stage. These tasks take the output of the Demux stage as their input, and generate as their output (i) anomalous system elements, (ii) abstract system views, or (iii) visualizable data (e.g. raw datapoints to be fed into visualization widgets). These tasks will be implemented as MapReduce or Pig jobs; Chukwa will manage them by accepting a list of MapReduce and/or Pig jobs, which together form the anomaly detection workflow.
+ For consistency with the Chukwa architecture, jobs in the anomaly detection workflow must accept SequenceFiles of ChukwaRecords as their inputs, and must generate SequenceFiles of ChukwaRecords as their outputs.
+ Finally, the outputs of these tasks are fed into HICC for visualization. The current approach is to use the MDL (Metrics Data Loader) to load the data into an RDBMS of choice, from which HICC widgets can read.
+
+ Hence, the overall anomaly detection workflow is as follows:
+
+  1. A MapReduce/Pig job processes post-Demux output to generate abstract views and/or anomaly detection output,
+  1. (Optional) Additional MapReduce/Pig jobs process the abstract views/anomaly detection output to generate secondary anomaly detection output,
+  1. The data is fed into HICC via an RDBMS,
+  1. A HICC widget loads the anomaly detection/abstract view data from the RDBMS for visualization.
== Implementation ==
+ === Hadoop anomaly detection and behavioral visualization ===
+
+ Current active development on the Chukwa Anomaly Detection Framework targets detecting anomalies in Hadoop, based on the following tools/concepts:
+
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for state-machine extraction of Hadoop's behavior from its logs
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for Hadoop task-based anomaly detection using Hadoop's logs
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/mochi_tan_hotcloud09_abs.html Mochi] for visualization of Hadoop's behavior (Swimlanes plots, MIROS heatmaps of aggregate data-flow) and extraction of causal job-centric data-flows (JCDF)
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/CMU-PDL-08-112_abs.html Ganesha] for node-based anomaly detection using OS-collected black-box metrics
+
+ The workflow is as follows (class names and status, where available, are listed in parentheses):
+
+  1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's execution; consumes post-Demux output: {{{JobData/JobHistory}}}, {{{ClientTrace}}}, and TaskTracker-generated {{{userlogs}}}
+  1. An anomaly detection MapReduce program reads the state-machine data generated by {{{FSMBuilder}}} and generates anomaly alerts.
+  1. (CHUKWA-279) State-machine data from {{{FSMBuilder}}} is loaded into the RDBMS using MDL.
+  1. (CHUKWA-279) Raw state-machine views are visualized using the Swimlanes HICC widget, which reads its data from the RDBMS.
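To make the PostProcess stage concrete, here is a minimal Python sketch (not Chukwa code; the record layout, the {{{cpu_util}}} metric, and the 25% tolerance threshold are invented for illustration) of the map/reduce shape such a job would take: group per-record metrics by node into an abstract per-node view, then flag nodes whose mean deviates from the peer median, in the strict-model style described in the introduction.

```python
# Hypothetical sketch of a PostProcess anomaly detection job (not Chukwa
# code): the record dicts stand in for ChukwaRecords, and the threshold is
# an assumption made for illustration.
from collections import defaultdict
from statistics import median

def map_phase(records):
    """Group metric samples by node, like a Mapper keying on hostname."""
    by_node = defaultdict(list)
    for rec in records:
        by_node[rec["host"]].append(rec["cpu_util"])
    return by_node

def reduce_phase(by_node, tolerance=0.25):
    """Emit an abstract view (per-node mean) plus anomaly flags: a node is
    anomalous if its mean deviates from the peer median by more than
    `tolerance` (strict-model style detection)."""
    view = {host: sum(v) / len(v) for host, v in by_node.items()}
    peer = median(view.values())
    anomalies = [h for h, m in view.items() if abs(m - peer) > tolerance * peer]
    return view, anomalies

records = [
    {"host": "node1", "cpu_util": 0.42}, {"host": "node1", "cpu_util": 0.40},
    {"host": "node2", "cpu_util": 0.44}, {"host": "node2", "cpu_util": 0.43},
    {"host": "node3", "cpu_util": 0.95}, {"host": "node3", "cpu_util": 0.91},
]
view, anomalies = reduce_phase(map_phase(records))
print(anomalies)  # -> ['node3']
```

In the real workflow the two functions would be the map and reduce tasks of a MapReduce (or Pig) job consuming SequenceFiles of ChukwaRecords, and {{{view}}} would be emitted as the abstract-view output for a downstream job or for MDL.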
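In the same spirit as SALSA's task-based detection, the sketch below (the log format, task names, and the 3x-median cutoff are invented; the real {{{FSMBuilder}}} consumes Hadoop's actual logs and produces far richer state-machine views) extracts a tiny state view, task START/END events, and flags tasks whose duration deviates strongly from their peers:

```python
# Hypothetical sketch of state-machine extraction plus task-based anomaly
# detection (not FSMBuilder itself): log lines and threshold are invented.
from statistics import median

LOG = """\
10:00:00 START task_0001
10:00:01 START task_0002
10:00:02 START task_0003
10:00:10 END task_0001
10:00:12 END task_0002
10:01:40 END task_0003
"""

def to_seconds(ts):
    h, m, s = map(int, ts.split(":"))
    return h * 3600 + m * 60 + s

def task_durations(log):
    """Build {task: duration} from START/END events (a tiny state machine)."""
    start, duration = {}, {}
    for line in log.splitlines():
        ts, event, task = line.split()
        if event == "START":
            start[task] = to_seconds(ts)
        else:
            duration[task] = to_seconds(ts) - start[task]
    return duration

def slow_tasks(durations, factor=3):
    """Flag tasks taking more than `factor` times the median duration."""
    med = median(durations.values())
    return sorted(t for t, d in durations.items() if d > factor * med)

print(slow_tasks(task_durations(LOG)))  # -> ['task_0003']
```

The second step of the implementation workflow plays the role of {{{slow_tasks}}} here: a MapReduce program reading {{{FSMBuilder}}} output and emitting anomaly alerts.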
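The final RDBMS/HICC steps can also be sketched. Below, Python's {{{sqlite3}}} stands in for the RDBMS of choice that MDL would load; the table schema and the rows are invented, and the query mimics what a time-windowed HICC widget might issue:

```python
# Hypothetical sketch of the MDL-load and HICC-read steps (not Chukwa code):
# sqlite3 is a self-contained stand-in for the RDBMS, and the schema is an
# assumption made for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anomalies (ts INTEGER, host TEXT, score REAL)")

# Rows as an anomaly detection job might emit them via the loader.
conn.executemany(
    "INSERT INTO anomalies VALUES (?, ?, ?)",
    [(1000, "node3", 0.93), (1000, "node1", 0.41), (1060, "node3", 0.95)],
)

# A widget-style query: anomalies in a time window, strongest first.
rows = conn.execute(
    "SELECT host, score FROM anomalies WHERE ts >= ? ORDER BY score DESC",
    (1000,),
).fetchall()
print(rows[0])  # -> ('node3', 0.95)
```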
