Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by JiaqiTan:
http://wiki.apache.org/hadoop/Anomaly_Detection_Framework_with_Chukwa

------------------------------------------------------------------------------
- Describe Anomaly Detection Framework with Chukwa here.
- [[TableOfContents]]
== Introduction ==
- Hadoop is a great computation platform for map reduce job, but trouble shooting faulty compute node in the cluster is not an easy task. Chukwa Anomaly Detection System, is a system for detecting computer failure and misuse by monitoring system activity and classifying it as either normal or anomalous. The classification is based on heuristics, rules, and patterns, and will detect any type of misuse that falls out of normal system operation.
+ We describe a general framework for implementing anomaly detection algorithms over the data collected by Chukwa from monitored systems (Hadoop or otherwise), and for visualizing the outcomes of these algorithms. We envision that anomaly detection algorithms for Chukwa-monitored clusters will most naturally be implemented as described here.
- In order to determine what is failure, the system must be taught to recognize normal system activity. This can be accomplished in several ways, most often with artificial intelligence type techniques. Systems using neural networks have been used to great effect. Another method is to define what normal usage of the system comprises using a strict mathematical model, and flag any deviation from this as an system problem. This is known as strict anomaly detection. For the prototyping phase, Chukwa will use strict mathematical model as the skeleton.
+ The types of operations that this framework would enable fall into these broad categories:
+  1. Performing anomaly detection on collected system data (metrics, logs) to identify system elements (nodes, jobs, tasks) that are anomalous,
+  1. Applying higher-level processing to collected system data to generate abstract views of the monitored system that synthesize multiple viewpoints,
+  1. Applying higher-level processing to anomaly detection output to perform secondary anomaly detection, and
+  1. Presenting and/or visualizing the outcomes of the above steps.
== Design ==
- A new processing pipeline has been introduced to post demux processor. This enables Chukwa to run ping/mr job based aggregation and anomaly detection framework.
+ The tasks described above will be performed in a PostProcess stage that runs after the Demux stage. These tasks take the output of the Demux stage as their input, and generate as their output (i) anomalous system elements, (ii) abstract system views, or (iii) visualizable data (e.g. raw datapoints to be fed into visualization widgets). These tasks will be implemented as MapReduce or Pig jobs; Chukwa will manage them by accepting a list of MapReduce and/or Pig jobs, which together form the anomaly detection workflow.
+ For consistency with the Chukwa architecture, jobs in the anomaly detection workflow must accept SequenceFiles of ChukwaRecords as their inputs, and must generate SequenceFiles of ChukwaRecords as their outputs.
+ Finally, the outputs of these tasks are fed into HICC for visualization. The current approach is to use the MDL (Metrics Data Loader) to load the data into an RDBMS of choice, from which HICC widgets can read.
+
+ Hence, the overall anomaly detection workflow is as follows:
+
+  1. A MapReduce/Pig job processes post-Demux output to generate abstract views and/or anomaly detection output,
+  1. (Optional) Additional MapReduce/Pig jobs process the abstract views/anomaly detection output to generate secondary anomaly detection output,
+  1. The data is fed into HICC via an RDBMS,
+  1. A HICC widget loads the anomaly detection/abstract view data from the RDBMS for visualization.
== Implementation ==
+ === Hadoop anomaly detection and behavioral visualization ===
+
+ Current active development on the Chukwa Anomaly Detection Framework targets detecting anomalies in Hadoop, based on the following tools/concepts:
+
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for state-machine extraction of Hadoop's behavior from its logs
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for Hadoop task-based anomaly detection using Hadoop's logs
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/mochi_tan_hotcloud09_abs.html Mochi] for visualization of Hadoop's behavior (Swimlanes plots, MIROS heatmaps of aggregate data-flow) and extraction of causal job-centric data-flows (JCDF)
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/CMU-PDL-08-112_abs.html Ganesha] for node-based anomaly detection using OS-collected black-box metrics
+
+ The workflow is as follows (class names and status, where available, are listed in parentheses):
+
+  1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's execution; consumes post-Demux output: {{{JobData/JobHistory}}}, {{{ClientTrace}}}, and TaskTracker-generated {{{userlogs}}}
+  1. An anomaly detection MapReduce program reads the state-machine data generated by {{{FSMBuilder}}} and generates anomaly alerts.
+  1. (CHUKWA-279) State-machine data from {{{FSMBuilder}}} is loaded into the RDBMS using MDL.
+  1. (CHUKWA-279) Raw state-machine views are visualized using the Swimlanes HICC widget, which reads its data from the RDBMS.
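To make the PostProcess stage concrete, here is a minimal Python sketch (not Chukwa code; the record layout, the {{{cpu_util}}} metric, and the 25% tolerance threshold are invented for illustration) of the map/reduce shape such a job would take: group per-record metrics by node into an abstract per-node view, then flag nodes whose mean deviates from the peer median, in the strict-model style described in the introduction.

```python
# Hypothetical sketch of a PostProcess anomaly detection job (not Chukwa
# code): the record dicts stand in for ChukwaRecords, and the threshold is
# an assumption made for illustration.
from collections import defaultdict
from statistics import median

def map_phase(records):
    """Group metric samples by node, like a Mapper keying on hostname."""
    by_node = defaultdict(list)
    for rec in records:
        by_node[rec["host"]].append(rec["cpu_util"])
    return by_node

def reduce_phase(by_node, tolerance=0.25):
    """Emit an abstract view (per-node mean) plus anomaly flags: a node is
    anomalous if its mean deviates from the peer median by more than
    `tolerance` (strict-model style detection)."""
    view = {host: sum(v) / len(v) for host, v in by_node.items()}
    peer = median(view.values())
    anomalies = [h for h, m in view.items() if abs(m - peer) > tolerance * peer]
    return view, anomalies

records = [
    {"host": "node1", "cpu_util": 0.42}, {"host": "node1", "cpu_util": 0.40},
    {"host": "node2", "cpu_util": 0.44}, {"host": "node2", "cpu_util": 0.43},
    {"host": "node3", "cpu_util": 0.95}, {"host": "node3", "cpu_util": 0.91},
]
view, anomalies = reduce_phase(map_phase(records))
print(anomalies)  # -> ['node3']
```

In the real workflow the two functions would be the map and reduce tasks of a MapReduce (or Pig) job consuming SequenceFiles of ChukwaRecords, and {{{view}}} would be emitted as the abstract-view output for a downstream job or for MDL.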
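In the same spirit as SALSA's task-based detection, the sketch below (the log format, task names, and the 3x-median cutoff are invented; the real {{{FSMBuilder}}} consumes Hadoop's actual logs and produces far richer state-machine views) extracts a tiny state view, task START/END events, and flags tasks whose duration deviates strongly from their peers:

```python
# Hypothetical sketch of state-machine extraction plus task-based anomaly
# detection (not FSMBuilder itself): log lines and threshold are invented.
from statistics import median

LOG = """\
10:00:00 START task_0001
10:00:01 START task_0002
10:00:02 START task_0003
10:00:10 END task_0001
10:00:12 END task_0002
10:01:40 END task_0003
"""

def to_seconds(ts):
    h, m, s = map(int, ts.split(":"))
    return h * 3600 + m * 60 + s

def task_durations(log):
    """Build {task: duration} from START/END events (a tiny state machine)."""
    start, duration = {}, {}
    for line in log.splitlines():
        ts, event, task = line.split()
        if event == "START":
            start[task] = to_seconds(ts)
        else:
            duration[task] = to_seconds(ts) - start[task]
    return duration

def slow_tasks(durations, factor=3):
    """Flag tasks taking more than `factor` times the median duration."""
    med = median(durations.values())
    return sorted(t for t, d in durations.items() if d > factor * med)

print(slow_tasks(task_durations(LOG)))  # -> ['task_0003']
```

The second step of the implementation workflow plays the role of {{{slow_tasks}}} here: a MapReduce program reading {{{FSMBuilder}}} output and emitting anomaly alerts.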
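The final RDBMS/HICC steps can also be sketched. Below, Python's {{{sqlite3}}} stands in for the RDBMS of choice that MDL would load; the table schema and the rows are invented, and the query mimics what a time-windowed HICC widget might issue:

```python
# Hypothetical sketch of the MDL-load and HICC-read steps (not Chukwa code):
# sqlite3 is a self-contained stand-in for the RDBMS, and the schema is an
# assumption made for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anomalies (ts INTEGER, host TEXT, score REAL)")

# Rows as an anomaly detection job might emit them via the loader.
conn.executemany(
    "INSERT INTO anomalies VALUES (?, ?, ?)",
    [(1000, "node3", 0.93), (1000, "node1", 0.41), (1060, "node3", 0.95)],
)

# A widget-style query: anomalies in a time window, strongest first.
rows = conn.execute(
    "SELECT host, score FROM anomalies WHERE ts >= ? ORDER BY score DESC",
    (1000,),
).fetchall()
print(rows[0])  # -> ('node3', 0.95)
```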
