Added: eagle/site/docs/v0.5.0/mkdocs/search_index.json
URL: 
http://svn.apache.org/viewvc/eagle/site/docs/v0.5.0/mkdocs/search_index.json?rev=1789961&view=auto
==============================================================================
--- eagle/site/docs/v0.5.0/mkdocs/search_index.json (added)
+++ eagle/site/docs/v0.5.0/mkdocs/search_index.json Mon Apr  3 11:29:31 2017
@@ -0,0 +1,739 @@
+{
+    "docs": [
+        {
+            "location": "/", 
+            "text": "What is Eagle\n\n\n Apache Eagle \n (incubating) is a 
highly extensible, scalable monitoring and alerting platform, designed with its 
flexible application framework and proven big data technologies, such as Kafka, 
Spark and Storm. It ships a rich set of applications for big data platform 
monitoring, e.g. HDFS/HBase/YARN service health check, JMX metrics, daemon 
logs, audit logs and YARN applications. External Eagle developers can define applications to monitor their NoSQL databases or web servers, and publish them to the Eagle application repository at their own discretion. It also provides a state-of-the-art alert engine to report security breaches, service failures, and application anomalies, highly customizable through alert policy definitions. 
\n\n\n\n\nTerminology\n\n\nSite\n\n\n\n\nA virtual concept in Apache Eagle. You 
can use it to manage a group of application instances, and to distinguish between applications if you have a certain application installed multiple times.\n\n\n\n\nApplication\n\n\n\n\nApplication (or Monitoring Application) is the first-class citizen in Apache Eagle; it stands for an end-to-end 
monitoring/alerting solution, which usually contains the monitoring source 
onboarding, source schema specification, alerting policy and dashboard 
definition.\n\n\n\n\nStream\n\n\n\n\nStream is the input for the Alert Engine; each Application should have its own stream, defined by the developer. Usually, it will have a POJO-like structure included in the stream definition. Once it is defined, the Application should have the logic to write data into 
Kafka.\n\n\n\n\nData Activity Monitoring\n\n\n\n\nA built-in monitoring 
application to monitor HDFS/HBase/Hive operations, allowing users to define certain policies to detect sensitive data access and malicious data operations in real time.\n\n\n\n\nAlert Engine\n\n\n\n\nA specific built-in application shared by all other monitoring applications; it reads data from Kafka, processes the data by applying the policies in a real-time manner, and generates alert notifications. So we call this application the Alert Engine.\n\n\n\n\nPolicy\n\n\n\n\nA rule used by the 
Alert Engine to match the data input from Kafka. Policy is defined in 
\nSiddhiQL\n format.\n\n\n\n\nAlert\n\n\n\n\nIf any data input to the Alert Engine meets the policy, the Alert Engine will generate a message and publish it through the alert publisher. We call such messages alerts.\n\n\n\n\nAlert 
Publisher\n\n\n\n\nIt will publish the alert to external channels which can be 
the SMTP channel, the Kafka channel, Slack channel or other storage 
systems.\n\n\n\n\nKey Qualities\n\n\nExtensible\n\n\n\n\nApache Eagle built its core framework around the application concept; an application itself includes the logic for monitoring source data collection, pre-processing and normalization. Developers can easily develop their own out-of-the-box monitoring applications using Eagle's application framework, and deploy them into 
Eagle.\n\n\n\n\nScalable\n\n\n\n\n
 The Eagle core team has chosen the proven big data technologies to build its 
fundamental runtime, and apply a scalable core to make it adaptive according to 
the throughput of data stream as well as the number of monitored 
applications.\n\n\n\n\nReal-time\n\n\n\n\nThe Storm or Spark Streaming based computing engine allows us to apply policies to data streams and generate alerts in a real-time manner.\n\n\n\n\nDynamic\n\n\n\n\nThe user can freely 
enable or disable a monitoring application without restarting the service. 
Eagle users can dynamically add/delete/change their alert policies without any impact to the underlying runtime.\n\n\n\n\nEasy-of-Use\n\n\n\n\nUsers can enable monitoring for a service within minutes by just choosing the corresponding monitoring application and configuring a few parameters for the service.\n\n\n\n\nNon-Invasive\n\n\n\n\nApache Eagle uses out-of-the-box applications to monitor services; you don't need to change your existing services.\n\n\n\n\n\n\nUse Case Examples\n\n\nData Activity Monitoring\n\n\n\n\n\n\nData activity 
represents how users explore data provided by big data platforms. Analyzing data activity and alerting on insecure access are fundamental requirements for securing enterprise data. As data volume increases exponentially with Hadoop, Hive and Spark technology, understanding data activities for every user becomes extremely hard, let alone alerting on a single malicious event in real time among petabytes of streaming data per day.\n\n\n\n\n\n\nSecuring enterprise 
data starts from understanding data activities for every user. Apache Eagle 
(incubating, called Eagle in the following) has integrated with many popular 
big data platforms, e.g. Hadoop, Hive, Spark, Cassandra etc. With Eagle, users can browse the data hierarchy, mark sensitive data and then create comprehensive policies to alert on insecure data access.\n\n\n\n\n\n\nJob Performance 
Analysis\n\n\n\n\n\n\nRunning MapReduce jobs is the most popular way people use to analyze data in a Hadoop system. Analyzing job performance and providing tuning suggestions are critical for Hadoop system stability, job SLA, resource usage etc.\n\n\n\n\n\n\nEagle analyzes job performance with two complementary approaches. First, Eagle periodically takes snapshots of all running jobs with the YARN API; second, Eagle continuously reads job lifecycle events immediately after a job is completed. With these two approaches, Eagle can analyze a single job's trend, data skew problems, failure reasons etc. More interestingly, Eagle can analyze the whole Hadoop cluster's performance by taking all jobs into account.\n\n\n\n\n\n\nCluster Performance 
Analytics\n\n\n\n\n\n\nIt is critical to understand why a cluster performs badly. Is it because of some crazy jobs recently on-boarded, a huge number of tiny files, or NameNode performance degradation?\n\n\n\n\n\n\nEagle calculates, in real time, the resource usage per minute of individual jobs, e.g. CPU, memory, HDFS IO bytes, HDFS IO numOps etc., and also collects NameNode JMX metrics. Correlating them together will easily help system administrators find the root cause of cluster slowness.\n\n\n\n\n\n\n\n\nDisclaimer\n\n\n\n\nApache Eagle is currently being incubated; therefore, across the whole documentation site, all appearances of the case-insensitive words \neagle\n and \napache eagle\n represent \nApache Eagle (incubating)\n. This can be seen as part of the disclaimer.", 
+            "title": "Home"
+        }, 
+        {
+            "location": "/#what-is-eagle", 
+            "text": "Apache Eagle   (incubating) is a highly extensible, 
scalable monitoring and alerting platform, designed with its flexible 
application framework and proven big data technologies, such as Kafka, Spark 
and Storm. It ships a rich set of applications for big data platform 
monitoring, e.g. HDFS/HBase/YARN service health check, JMX metrics, daemon 
logs, audit logs and YARN applications. External Eagle developers can define applications to monitor their NoSQL databases or web servers, and publish them to the Eagle application repository at their own discretion. It also provides a state-of-the-art alert engine to report security breaches, service failures, and application anomalies, highly customizable through alert policy definitions.", 
+            "title": "What is Eagle"
+        }, 
+        {
+            "location": "/#terminology", 
+            "text": "", 
+            "title": "Terminology"
+        }, 
+        {
+            "location": "/#site", 
+            "text": "A virtual concept in Apache Eagle. You can use it to 
manage a group of application instances, and to distinguish between applications if you have a certain application installed multiple times.", 
+            "title": "Site"
+        }, 
+        {
+            "location": "/#application", 
+            "text": "Application(or Monitoring Application) is the first-class 
citizen in Apache Eagle, it stands for an end-to-end monitoring/alerting 
solution, which usually contains the monitoring source onboarding, source 
schema specification, alerting policy and dashboard definition.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/#stream", 
+            "text": "Stream is the input for Alert Engine, each Application 
should have its own stream to be defined by the developer. Usually, it will 
have a POJO-like structure included in the stream definition. Once it's 
defined, Application should have the logic to write data into Kafka.", 
+            "title": "Stream"
+        }, 
+        {
+            "location": "/#data-activity-monitoring", 
+            "text": "A built-in monitoring application to monitor 
HDFS/HBase/Hive operations, allowing users to define certain policies to detect sensitive data access and malicious data operations in real time.", 
+            "title": "Data Activity Monitoring"
+        }, 
+        {
+            "location": "/#alert-engine", 
+            "text": "A specific built-in application shared for all other 
monitoring applications, it reads data from Kafka, and processes the data by 
applying the policy in real-time manner, and generates alert notification. So 
we call this application as the Alert Engine.", 
+            "title": "Alert Engine"
+        }, 
+        {
+            "location": "/#policy", 
+            "text": "A rule used by Alert Engine to match the data input from 
Kafka. Policy is defined in  SiddhiQL  format.", 
+            "title": "Policy"
+        }, 
+        {
+            "location": "/#alert", 
+            "text": "If any data input to Alert Engine meets the policy, the 
Alert Engine will generate a message and publish it through alert publisher. We 
call such messages as the alerts.", 
+            "title": "Alert"
+        }, 
+        {
+            "location": "/#alert-publisher", 
+            "text": "It will publish the alert to external channels which can 
be the SMTP channel, the Kafka channel, Slack channel or other storage 
systems.", 
+            "title": "Alert Publisher"
+        }, 
+        {
+            "location": "/#key-qualities", 
+            "text": "", 
+            "title": "Key Qualities"
+        }, 
+        {
+            "location": "/#extensible", 
+            "text": "Apache Eagle built its core framework around the 
application concept, application itself includes the logic for monitoring 
source data collection, pre-processing and normalization. Developer can easily 
develop his own out-of-box monitoring applications using Eagle's application 
framework, and deploy into Eagle.", 
+            "title": "Extensible"
+        }, 
+        {
+            "location": "/#scalable", 
+            "text": "The Eagle core team has chosen the proven big data 
technologies to build its fundamental runtime, and apply a scalable core to 
make it adaptive according to the throughput of data stream as well as the 
number of monitored applications.", 
+            "title": "Scalable"
+        }, 
+        {
+            "location": "/#real-time", 
+            "text": "Storm or Spark Streaming based computing engine allow us 
to apply the policy to data stream and generate alerts in real-time manner.", 
+            "title": "Real-time"
+        }, 
+        {
+            "location": "/#dynamic", 
+            "text": "The user can freely enable or disable a monitoring 
application without restarting the service. Eagle users can dynamically add/delete/change their alert policies without any impact to the underlying 
runtime.", 
+            "title": "Dynamic"
+        }, 
+        {
+            "location": "/#easy-of-use", 
+            "text": "User can enable the monitoring for a service within 
minutes effort by just choosing the corresponding monitoring application and 
configuring few parameters for the service.", 
+            "title": "Easy-of-Use"
+        }, 
+        {
+            "location": "/#non-invasive", 
+            "text": "Apache Eagle uses the out-of-box applications to monitor 
services, you don't need any change to your existing services.", 
+            "title": "Non-Invasive"
+        }, 
+        {
+            "location": "/#use-case-examples", 
+            "text": "", 
+            "title": "Use Case Examples"
+        }, 
+        {
+            "location": "/#data-activity-monitoring_1", 
+            "text": "Data activity represents how user explores data provided 
by big data platforms. Analyzing data activity and alerting for insecure access 
are fundamental requirements for securing enterprise data. As data volume is 
increasing exponentially with Hadoop, Hive, Spark technology, understanding 
data activities for every user becomes extremely hard, let alone to alert for a 
single malicious event in real time among petabytes streaming data per day.    
Securing enterprise data starts from understanding data activities for every 
user. Apache Eagle (incubating, called Eagle in the following) has integrated 
with many popular big data platforms e.g. Hadoop, Hive, Spark, Cassandra etc. 
With Eagle, users can browse the data hierarchy, mark sensitive data and then create comprehensive policies to alert on insecure data access.", 
+            "title": "Data Activity Monitoring"
+        }, 
+        {
+            "location": "/#job-performance-analysis", 
+            "text": "Running map/reduce job is the most popular way people use 
to analyze data in Hadoop system. Analyzing job performance and providing 
tuning suggestions are critical for Hadoop system stability, job SLA and 
resource usage etc.    Eagle analyzes job performance with two complementing 
approaches. First Eagle periodically takes snapshots for all running jobs with 
YARN API, secondly Eagle continuously reads job lifecycle events immediately 
after the job is completed. With the two approaches, Eagle can analyze single 
job's trend, data skew problem, failure reasons etc. More interestingly, Eagle 
can analyze whole Hadoop cluster's performance by taking into account all 
jobs.", 
+            "title": "Job Performance Analysis"
+        }, 
+        {
+            "location": "/#cluster-performance-analytics", 
+            "text": "It is critical to understand why a cluster performs bad. 
Is that because of some crazy jobs recently on-boarded, or huge amount of tiny 
files, or namenode performance degrading?    Eagle in realtime calculates 
resource usage per minute out of individual jobs, e.g. CPU, memory, HDFS IO 
bytes, HDFS IO numOps etc. and also collects namenode JMX metrics. Correlating 
them together will easily help system administrator find root cause for cluster 
slowness.", 
+            "title": "Cluster Performance Analytics"
+        }, 
+        {
+            "location": "/#disclaimer", 
+            "text": "Apache Eagle now is being incubated, and therefore, 
across the whole documentation site, all appearances of case-insensitive word  
eagle  and  apache eagle  represent  Apache Eagle (incubating) . This could be 
seen as a part of disclaimer.", 
+            "title": "Disclaimer"
+        }, 
+        {
+            "location": "/getting-started/", 
+            "text": "Architecture\n\n\n\n\nEagle 
Apps\n\n\n\n\nSecurity\n\n\nHadoop\n\n\nOperational Intelligence\n\n\n\n\nFor 
more applications, see \nApplications\n.\n\n\nEagle Interface\n\n\n\n\nREST 
Service\n\n\nManagement UI\n\n\nCustomizable Analytics 
Visualization\n\n\n\n\nEagle Integration\n\n\n\n\nApache 
Ambari\n\n\nDocker\n\n\nApache Ranger\n\n\nDataguise\n\n\n\n\nEagle 
Framework\n\n\nEagle has multiple distributed real-time frameworks for 
efficiently developing highly scalable monitoring applications.\n\n\nAlert 
Engine\n\n\n\n\n\n\nReal-time: Apache Storm (Execution Engine) + Kafka (Message 
Bus)\n\n\n\n\nDeclarative Policy: SQL (CEP) on Streaming\n        from 
hadoopJmxMetricEventStream\n        [metric == 
\"hadoop.namenode.fsnamesystemstate.capacityused\" and value \n 0.9] \n        
select metric, host, value, timestamp, component, site \n        insert into 
alertStream;\n\n\n\n\n\n\nDynamical onboarding \n correlation\n\n\n\n\nNo 
downtime migration and upgrading\n\n\n\n
 \nStorage Engine\n\n\n\n\n\n\n\n\nLight-weight ORM Framework for 
HBase/RDBMS\n\n\n@Table(\"HbaseTableName\")\n@ColumnFamily(\"ColumnFamily\")\n@Prefix(\"RowkeyPrefix\")\n@Service(\"UniqueEntitytServiceName\")\n@JsonIgnoreProperties(ignoreUnknown
 = true)\n@TimeSeries(false)\n@Indexes({\n    
@Index(name=\"Index_1_alertExecutorId\", columns = { \"alertExecutorID\" }, 
unique = true)})\npublic class AlertDefinitionAPIEntity extends 
TaggedLogAPIEntity{\n@Column(\"a\")\nprivate String 
desc;\n\n\n\n\n\n\n\nFull-function SQL-Like REST Query 
\n\n\nQuery=UniqueEntitytServiceName[@site=\"sandbox\"]{*}\n\n\n\n\n\n\n\nOptimized
 Rowkey design for time-series data, optimized for metric/entity/log, etc. 
different storage types\n\n\nRowkey ::= Prefix | Partition Keys | timestamp | 
tagName | tagValue | \u2026\n\n\n\n\n\n\n\nSecondary Index Support\n        
@Indexes(, unique = true/false)})\n\n\n\n\n\n\nNative HBase Coprocessor\n       
 org.apache.eagle.storage.hbase.query.coprocessor.AggregateProtocolEndPoint\n\n\n\n\n\n\nUI Framework\n\n\nEagle UI consists of the following 
parts:\n\n\n\n\nEagle Main UI\n\n\nEagle App 
Portal/Dashboard/Widgets\n\n\nEagle Customized Dashboard \n\n\n\n\nApplication 
Framework\n\n\nApplication\n\n\nAn \"Application\" or \"App\" is composed of 
data integration, policies and insights for one data source.\n\n\nApplication 
Descriptor\n\n\nAn \"Application Descriptor\" is static packaged metadata consisting of basic information like type, name, version, description, and application process, configuration, streams, docs, policies and so on. 
\n\n\nHere is an example ApplicationDesc of \nJPM_WEB_APP\n\n\n    {\n    type: 
\"JPM_WEB_APP\",\n    name: \"Job Performance Monitoring Web \",\n    version: 
\"0.5.0-incubating\",\n    description: null,\n    appClass: 
\"org.apache.eagle.app.StaticApplication\",\n    jarPath: 
\"/opt/eagle/0.5.0-incubating-SNAPSHOT-build-20161103T0332/eagle-0.5.0-incubating-SNAPSHOT/lib/eagle-topology-0.5.0-incubating-SNAPSHOT-
 hadoop-2.4.1-11-assembly.jar\",\n    viewPath: \"/apps/jpm\",\n    
providerClass: \"org.apache.eagle.app.jpm.JPMWebApplicationProvider\",\n    
configuration: {\n        properties: [{\n            name: \"service.host\",\n 
           displayName: \"Eagle Service Host\",\n            value: 
\"localhost\",\n            description: \"Eagle Service Host, default: 
localhost\",\n            required: false\n        }, {\n            name: 
\"service.port\",\n            displayName: \"Eagle Service Port\",\n           
 value: \"8080\",\n            description: \"Eagle Service Port, default: 
8080\",\n            required: false\n        }]\n    },\n    streams: null,\n  
  docs: null,\n    executable: false,\n    dependencies: [{\n        type: 
\"MR_RUNNING_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        
required: true\n    }, {\n        type: \"MR_HISTORY_JOB_APP\",\n        
version: \"0.5.0-incubating\",\n        required: true\n    }]\n    
}\n\n\n\nApplication Provider\n\n\n
Application Provider is a package management and loading mechanism leveraging 
\nJava SPI\n.\n\n\nFor example, in file 
\nMETA-INF/services/org.apache.eagle.app.spi.ApplicationProvider\n, place the 
full class name of an application 
provider:\n\n\norg.apache.eagle.app.jpm.JPMWebApplicationProvider\n\n\n\n\n\nConcepts\n\n\n\n\nHere
 are some terms we are using in Apache Eagle (incubating, called Eagle in the 
following); please check them for your reference. They are basic Eagle concepts which will also help you understand Eagle well.\n\n\n\n\nSite\n\n\n\n\nA 
site can be considered as a physical data center. Big data platform e.g. Hadoop 
may be deployed to multiple data centers in an 
enterprise.\n\n\n\n\nApplication\n\n\n\n\nAn \"Application\" or \"App\" is 
composed of data integration, policies and insights for one data 
source.\n\n\n\n\nPolicy\n\n\n\n\nA \"Policy\" defines the rule to alert. Policy 
can be simply a filter expression or a complex window-based aggregation rule etc.\n\n\n\n\nAlerts\n\n\n\n\nAn \"Alert\" is a real-time event detected with a certain alert policy or correlation logic, with different severity levels like INFO/WARNING/DANGER.\n\n\n\n\nData Source\n\n\n\n\nA \"Data Source\" is the data from a monitored target. Eagle supports many data sources: HDFS audit logs, Hive2 queries, MapReduce jobs etc.\n\n\n\n\nStream\n\n\n\n\nA \"Stream\" is the 
streaming data from a data source. Each data source has its own 
stream.\n\n\n\n\n\n\nQuick Start\n\n\nDeployment\n\n\nPrerequisites\n\n\nEagle 
requires the following dependencies:\n\n\n\n\nFor streaming platform 
dependencies\n\n\nStorm: 0.9.3 or later\n\n\nHadoop: 2.6.x or later\n\n\nHbase: 
0.98.x or later\n\n\nKafka: 0.8.x or later\n\n\nZookeeper: 3.4.6 or 
later\n\n\nJava: 1.8.x\n\n\n\n\n\n\nFor metadata database dependencies (Choose 
one of them)\n\n\nMongoDB 3.2.2 or later\n\n\nInstallation is required\n\n\n\n\n\n\nMySQL 5.1.x or later\n\n\nInstallation is 
required\n\n\n\n\n\n\n\n\n\n\n\n\nNotice:  \n\n\n\n\nStorm
  0.9.x does NOT support JDK8. You can replace asm-4.0.jar with asm-all-5.0.jar 
in the storm lib directory. \nThen restart other 
services(nimbus/ui/supervisor).\n\n\n\n\n\nInstallation\n\n\nBuild 
Eagle\n\n\n\n\n\n\nDownload the latest version of Eagle source code.\n\n\ngit 
clone https://github.com/apache/incubator-eagle.git\n\n\n\n\n\n\n\nBuild the 
source code, and a tar.gz package will be generated under 
eagle-server-assembly/target\n\n\nmvn clean install 
-DskipTests\n\n\n\n\n\n\n\nDeploy Eagle\n\n\n\n\nCopy binary package to your 
server machine. In the package, you should find:\n\n\nbin/\n: scripts used to start the Eagle server\n\n\nconf/\n: default configurations for the Eagle server 
setup.\n\n\nlib/\n : all included software packages for eagle 
server\n\n\n\n\n\n\nChange configurations under 
\nconf/\n\n\neagle.conf\n\n\nserver.yml\n\n\n\n\n\n\n\n\nRun 
eagle-server.sh\n\n\n./bin/eagle-server.sh start\n\n\n\n\n\n\n\nCheck eagle 
server\n\n\n\n\nVisit http://host:port/ in your web browser.\n\n\n\n\n\n\n\n\nSetup Your Monitoring Case\n\n\nPlaceholder for topic: Setup 
Your Monitoring Case", 
+            "title": "Getting Started"
+        }, 
+        {
+            "location": "/getting-started/#architecture", 
+            "text": "", 
+            "title": "Architecture"
+        }, 
+        {
+            "location": "/getting-started/#eagle-apps", 
+            "text": "Security  Hadoop  Operational Intelligence   For more 
applications, see  Applications .", 
+            "title": "Eagle Apps"
+        }, 
+        {
+            "location": "/getting-started/#eagle-interface", 
+            "text": "REST Service  Management UI  Customizable Analytics 
Visualization", 
+            "title": "Eagle Interface"
+        }, 
+        {
+            "location": "/getting-started/#eagle-integration", 
+            "text": "Apache Ambari  Docker  Apache Ranger  Dataguise", 
+            "title": "Eagle Integration"
+        }, 
+        {
+            "location": "/getting-started/#eagle-framework", 
+            "text": "Eagle has multiple distributed real-time frameworks for 
efficiently developing highly scalable monitoring applications.", 
+            "title": "Eagle Framework"
+        }, 
+        {
+            "location": "/getting-started/#alert-engine", 
+            "text": "Real-time: Apache Storm (Execution Engine) + Kafka 
(Message Bus)   Declarative Policy: SQL (CEP) on Streaming\n        from 
hadoopJmxMetricEventStream\n        [metric == 
\"hadoop.namenode.fsnamesystemstate.capacityused\" and value   0.9] \n        
select metric, host, value, timestamp, component, site \n        insert into 
alertStream;    Dynamical onboarding   correlation   No downtime migration and 
upgrading", 
+            "title": "Alert Engine"
+        }, 
+        {
+            "location": "/getting-started/#storage-engine", 
+            "text": "Light-weight ORM Framework for HBase/RDMBS  
@Table(\"HbaseTableName\")\n@ColumnFamily(\"ColumnFamily\")\n@Prefix(\"RowkeyPrefix\")\n@Service(\"UniqueEntitytServiceName\")\n@JsonIgnoreProperties(ignoreUnknown
 = true)\n@TimeSeries(false)\n@Indexes({\n    
@Index(name=\"Index_1_alertExecutorId\", columns = { \"alertExecutorID\" }, 
unique = true)})\npublic class AlertDefinitionAPIEntity extends 
TaggedLogAPIEntity{\n@Column(\"a\")\nprivate String desc;    Full-function 
SQL-Like REST Query   Query=UniqueEntitytServiceName[@site=\"sandbox\"]{*}    
Optimized Rowkey design for time-series data, optimized for metric/entity/log, 
etc. different storage types  Rowkey ::= Prefix | Partition Keys | timestamp | 
tagName | tagValue | \u2026    Secondary Index Support\n        @Indexes(, 
unique = true/false)})    Native HBase Coprocessor\n        
org.apache.eagle.storage.hbase.query.coprocessor.AggregateProtocolEndPoint", 
+            "title": "Storage Engine"
+        }, 
+        {
+            "location": "/getting-started/#ui-framework", 
+            "text": "Eagle UI is consist of following parts:   Eagle Main UI  
Eagle App Portal/Dashboard/Widgets  Eagle Customized Dashboard", 
+            "title": "UI Framework"
+        }, 
+        {
+            "location": "/getting-started/#application-framework", 
+            "text": "", 
+            "title": "Application Framework"
+        }, 
+        {
+            "location": "/getting-started/#application", 
+            "text": "An \"Application\" or \"App\" is composed of data 
integration, policies and insights for one data source.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/getting-started/#application-descriptor", 
+            "text": "An \"Application Descriptor\" is a static packaged 
metadata information consist of basic information like type, name, version, 
description, and application process, configuration, streams, docs, policies 
and so on.   Here is an example ApplicationDesc of  JPM_WEB_APP      {\n    
type: \"JPM_WEB_APP\",\n    name: \"Job Performance Monitoring Web \",\n    
version: \"0.5.0-incubating\",\n    description: null,\n    appClass: 
\"org.apache.eagle.app.StaticApplication\",\n    jarPath: 
\"/opt/eagle/0.5.0-incubating-SNAPSHOT-build-20161103T0332/eagle-0.5.0-incubating-SNAPSHOT/lib/eagle-topology-0.5.0-incubating-SNAPSHOT-hadoop-2.4.1-11-assembly.jar\",\n
    viewPath: \"/apps/jpm\",\n    providerClass: 
\"org.apache.eagle.app.jpm.JPMWebApplicationProvider\",\n    configuration: {\n 
       properties: [{\n            name: \"service.host\",\n            
displayName: \"Eagle Service Host\",\n            value: \"localhost\",\n       
     description: \"Eagle Service Host, de
 fault: localhost\",\n            required: false\n        }, {\n            
name: \"service.port\",\n            displayName: \"Eagle Service Port\",\n     
       value: \"8080\",\n            description: \"Eagle Service Port, 
default: 8080\",\n            required: false\n        }]\n    },\n    streams: 
null,\n    docs: null,\n    executable: false,\n    dependencies: [{\n        
type: \"MR_RUNNING_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        
required: true\n    }, {\n        type: \"MR_HISTORY_JOB_APP\",\n        
version: \"0.5.0-incubating\",\n        required: true\n    }]\n    }", 
+            "title": "Application Descriptor"
+        }, 
+        {
+            "location": "/getting-started/#application-provider", 
+            "text": "Appilcation Provider is a package management and loading 
mechanism leveraging  Java SPI .  For example, in file  
META-INF/services/org.apache.eagle.app.spi.ApplicationProvider , place the full 
class name of an application provider:  
org.apache.eagle.app.jpm.JPMWebApplicationProvider", 
+            "title": "Application Provider"
+        }, 
+        {
+            "location": "/getting-started/#concepts", 
+            "text": "Here are some terms we are using in Apache Eagle 
(incubating, called Eagle in the following); please check them for your reference. They are basic Eagle concepts which will also help you understand Eagle well.", 
+            "title": "Concepts"
+        }, 
+        {
+            "location": "/getting-started/#site", 
+            "text": "A site can be considered as a physical data center. Big 
data platform e.g. Hadoop may be deployed to multiple data centers in an 
enterprise.", 
+            "title": "Site"
+        }, 
+        {
+            "location": "/getting-started/#application_1", 
+            "text": "An \"Application\" or \"App\" is composed of data 
integration, policies and insights for one data source.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/getting-started/#policy", 
+            "text": "A \"Policy\" defines the rule to alert. Policy can be 
simply a filter expression or a complex window-based aggregation rule etc.", 
+            "title": "Policy"
+        }, 
+        {
+            "location": "/getting-started/#alerts", 
+            "text": "An \"Alert\" is an real-time event detected with certain 
alert policy or correlation logic, with different severity levels like 
INFO/WARNING/DANGER.", 
+            "title": "Alerts"
+        }, 
+        {
+            "location": "/getting-started/#data-source", 
+            "text": "A \"Data Source\" is a monitoring target data. Eagle 
supports many data sources HDFS audit logs, Hive2 query, MapReduce job etc.", 
+            "title": "Data Source"
+        }, 
+        {
+            "location": "/getting-started/#stream", 
+            "text": "A \"Stream\" is the streaming data from a data source. 
Each data source has its own stream.", 
+            "title": "Stream"
+        }, 
+        {
+            "location": "/getting-started/#quick-start", 
+            "text": "", 
+            "title": "Quick Start"
+        }, 
+        {
+            "location": "/getting-started/#deployment", 
+            "text": "", 
+            "title": "Deployment"
+        }, 
+        {
+            "location": "/getting-started/#prerequisites", 
+            "text": "Eagle requires the following dependencies:   For 
streaming platform dependencies  Storm: 0.9.3 or later  Hadoop: 2.6.x or later  
Hbase: 0.98.x or later  Kafka: 0.8.x or later  Zookeeper: 3.4.6 or later  Java: 
1.8.x    For metadata database dependencies (Choose one of them)  MongoDB 3.2.2 or later  Installation is required    MySQL 5.1.x or later  Installation is 
required       Notice:     Storm 0.9.x does NOT support JDK8. You can replace 
asm-4.0.jar with asm-all-5.0.jar in the storm lib directory. \nThen restart 
other services(nimbus/ui/supervisor).", 
+            "title": "Prerequisites"
+        }, 
+        {
+            "location": "/getting-started/#installation", 
+            "text": "", 
+            "title": "Installation"
+        }, 
+        {
+            "location": "/getting-started/#build-eagle", 
+            "text": "Download the latest version of Eagle source code.  git 
clone https://github.com/apache/incubator-eagle.git    Build the source code, 
and a tar.gz package will be generated under eagle-server-assembly/target  mvn 
clean install -DskipTests", 
+            "title": "Build Eagle"
+        }, 
+        {
+            "location": "/getting-started/#deploy-eagle", 
+            "text": "Copy binary package to your server machine. In the 
package, you should find:  bin/ : scripts used to start the Eagle server  conf/ : 
default configurations for eagle server setup.  lib/  : all included software 
packages for eagle server    Change configurations under  conf/  eagle.conf  
server.yml     Run eagle-server.sh  ./bin/eagle-server.sh start    Check eagle 
server   Visit http://host:port/ in your web browser.", 
+            "title": "Deploy Eagle"
+        }, 
+        {
+            "location": "/getting-started/#setup-your-monitoring-case", 
+            "text": "Placeholder for topic: Setup Your Monitoring Case", 
+            "title": "Setup Your Monitoring Case"
+        }, 
+        {
+            "location": "/using-eagle/", 
+            "text": "Manage Eagle and Services\n\n\n\n\n\n\nAfter Apache Eagle 
has been deployed (please reference \ndeployment\n), you can enter the deployment directory and use the commands below to control Apache Eagle 
Server.\n\n\n./bin/eagle-server.sh start|stop|status\n\n\n\n\n\n\n\nAfter 
starting the Eagle server, please visit http://host:port/ to open the web UI of 
Eagle.\n\n\n\n\n\n\n\n\nUse Eagle Web Interface\n\n\n\n\n\n\nThis is the 
typical Web Interface (short for WI) after setting up your Eagle monitoring 
environment. The WI mainly contains the main panel on the right and the function menu on the left.\n\n\n\n\nHome\n\n\n\n\n\n\nThis is the aggregated UI for 
configured sites and applications. It shows the created sites, how many applications are installed for each site, and the alerts generated from that cluster. You can click the \u201cMore info\u201d link to view the details for a particular site.\n\n\n\n\n\n\nThe \u201c\nWidgets\n\u201d section is customizable; if the application developer has registered the application to the Home page, you can find it in the \u201c\nWidgets\n\u201d section. Please check the application developer guide about how to register applications to home widgets. It gives you a shortcut to go directly to the application home.\n\n\n\n\n\n\nAlert\n\n\n\n\nIn the Alert menu, 
you can define the policies, list the policies and check your alerts there. 
\n\n\n\n\nIntegration\n\n\n\n\nThe integration page provides the management 
functionality for Eagle. You can list the built-in applications there, create 
sites, and manage the applications in your site.\n\n\n\n\nSites\n\n\n\n\nIt 
also gives you a shortcut to a particular site.\n\n\n\n\n\n\nSetup The Monitoring 
Application\n\n\nMonitoring Applications\n\n\n\n\n\n\nEagle has an extensible 
framework to dynamically add new monitoring applications in the Eagle environment. It also ships some built-in big data monitoring applications.\n\n\n\n\n\n\nGo to \u201c\nIntegration\n\u201d -\n \u201c\nApplications\n\u201d; it will list a set of available monitoring applications which you can choose to monitor your services.\n\n\n\n\n\n\n\n\nThe \u201c\nApplication\n\u201d column is the display name for an application; \u201c\nStreams\n\u201d is a logical name for the data stream from the monitored source after pre-processing, which will be consumed by the Alert Engine.\n\n\n\n\n\n\nAt the moment, we have the below 
built-in applications shipped with Apache Eagle. You can refer to the 
application documentation to understand how to do the configuration for each 
monitoring 
application.\n\n\n\n\n\n\n\n\nApplication\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\nTopology
 Health Check\n\n\nThis application can be used to monitor the service 
healthiness for HDFS, HBase and YARN. You can get alerted once the master role 
or the slave role crashes.\n\n\n\n\n\n\nHadoop JMX Metrics 
Monitoring\n\n\nThis application can be used to monitor the JMX metrics data 
from the master nodes of HDFS, HBase and YARN, e.g. NameNode, HBase Master
  and YARN Resource Manager.\n\n\n\n\n\n\nHDFS Audit Log Monitor\n\n\nThis 
application can be used to monitor the data operations in HDFS, to detect 
sensitive data access and malicious operations; to protect from data leak or 
data loss.\n\n\n\n\n\n\nHBase Audit Log Monitor\n\n\nSame as HDFS Audit Log 
Monitor, this application is used to monitor the data operations in 
HBase.\n\n\n\n\n\n\nMap Reduce History Job\n\n\nThis application is used to get 
the MapReduce history job counters from YARN history server and job running 
history from HDFS log directory.\n\n\n\n\n\n\nMap Reduce Running Job\n\n\nThis 
application is used to get the MapReduce running job counter information using 
YARN Rest API.\n\n\n\n\n\n\nHadoop Queue Monitor\n\n\nThis application is used 
to get the resource scheduling and utilization info from YARN.\n\n\n\n\n\n\nMR 
Metrics Aggregation\n\n\nThis application is used to aggregate the job counters 
and some resource utilization in a certain period of time (daily, weekly or 
 monthly).\n\n\n\n\n\n\nJob Performance Monitor Web\n\n\nThis application only 
contains the frontend, and depends on Map Reduce History Job and Map Reduce 
Running Job.\n\n\n\n\n\n\nAlert Engine\n\n\nAlert Engine is a special 
application and used to process the output data from other 
applications.\n\n\n\n\n\n\n\n\n\n\n\n\nManaging Sites\n\n\nTo enable a real 
monitoring use case, you have to create a site first, and install a certain 
application for this site, and finally start the application. We use the site concept to group the running applications and avoid application conflicts.\n\n\nSites\n\n\n\n\n\n\nGo to \u201c\nIntegration\n\u201d -\n 
\u201c\nSites\n\u201d, there will be a table listing the managed 
sites.\n\n\n\n\n\n\n\n\nCreate Site\n\n\n\n\n\n\nClick \u201c\nNew Site\n\u201d 
on the bottom right of the Sites page. You can fill the information in site 
creation dialog.\n\n\n\n\n\n\n\n\nThe \u201c\nSite Id\n\u201d should not be 
duplicated. After the creation, you can find it in 
the Sites page.\n\n\n\n\n\n\n\n\nConfiguring a Site\n\n\n\n\n\n\nBy clicking 
the \u201c\nEdit\n\u201d button or the Site column in the Sites table, you can open the Site configuration page, where you can install monitoring 
applications.\n\n\n\n\n\n\n\n\nInstall and Run Applications in 
Site\n\n\n\n\n\n\nChoose the particular application which you want to install; you probably have something to fill in, e.g. the HDFS NameNode address, Zookeeper 
address and port. Please check each application documentation for how to 
configure each application. \n\n\n\n\n\n\nAfter doing the installation, you can 
start the application by clicking \n or stop the application by \n. You can 
check the \u201c\nStatus\n\u201d column about the running status. Usually, it 
should have \u201c\nINITIALIZED\n\u201d or \u201c\nRUNNING\n\u201d for a 
healthy application.\n\n\n\n\n\n\n\n\nDefine Policies\n\n\nAfter setting up the 
monitoring applications, you probably want to set up some alert policies against the monitored data, so you can get notified once there is any violation on the data. Eagle has a centralized 
place for policy definition.\n\n\nPolicies\n\n\n\n\n\n\nGo to 
\u201c\nAlert\n\u201d -\n \u201c\nPolicies\n\u201d, you can check the policies 
defined and take control on whether to enable the policy:\n\n\n\n\n\n\n\n\nYou 
can apply the below actions for a certain policy:\n\n\n\n\n\n\n: enable a 
policy\n\n\n\n\n\n\n: disable a policy\n\n\n\n\n\n\n: edit a 
policy\n\n\n\n\n\n\n: purge a policy\n\n\n\n\n\n\n\n\n\n\nDefine or Edit 
Policies\n\n\n\n\n\n\nIf you want to create a new policy, click 
\u201c\nAlert\n\u201d -\n \u201c\nDefine Policy\n\u201d, or you can enter into 
the policy definition page by editing an existing policy. After that, you can 
go to the policy list to enable the policy dynamically.\n\n\n\n\n\n\n\n\nSource 
Stream\n\n\n\n\nThe source stream gives users a full view of what data streams are available for the applications defined for a particular site, as well as the data structures in each data stream. Data stream names are suffixed with the site name.\n\n\n\n\nPolicy Name\n\n\n\n\nThe 
policy name should be globally unique.\n\n\n\n\nPublish Alerts\n\n\n\n\n\n\nIn 
this section, you can define the alert publishing method by clicking the \u201c\n+Add Publisher\n\u201d.\n\n\n\n\n\n\n\n\nYou can choose the publishing method from an existing policy or by creating a new publisher. 
\n\n\n\n\n\n\nThere are four built-in publisher 
types:\n\n\n\n\n\n\nEmailPublisher\n: 
org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher\n\n\n\n\n\n\nKafkaPublisher\n:
 
org.apache.eagle.alert.engine.publisher.impl.AlertKafkaPublisher\n\n\n\n\n\n\nSlackPublisher\n:
 
org.apache.eagle.alert.engine.publisher.impl.AlertSlackPublisher\n\n\n\n\n\n\nEagleStoragePlugin\n:
 
org.apache.eagle.alert.engine.publisher.impl.AlertEagleStoragePlugin\n\n\n\n\n\n\n\n\n\n\nPolicy
 Syntax\n\n\n\n\n\n\nCurrently, we support SiddhiQL (please view the Siddhi Query Language Specification \nhere\n).\n\n\n\n\n\n\nIn order to explain how stream data is processed, let us take the policy below as an example:\n\n\nfrom 
map_reduce_failed_job_stream[site==\"sandbox\" and 
currentState==\"FAILED\"]\nselect * group by jobId insert into 
map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\nThis policy contains below 
parts:\n\n\n\n\n\n\nSource\n: from 
map_reduce_failed_job_stream\n\n\n\n\n\n\nFilter\n: [site==\"sandbox\" and 
currentState==\"FAILED\"]\n\n\n\n\n\n\nProjection\n: select 
*\n\n\n\n\n\n\nGroupBy\n: group by jobId\n\n\n\n\n\n\nDestination\n: insert 
into map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\n\n\n\nSource 
Streams (schema) are defined by applications, and applications will write stream data to a data sink (currently, we support Kafka as the data sink).\n\n\nstreams\n\n   
 \nstream\n\n        \nstreamId\nmap_reduce_failed_job_stream\n/streamId\n\n    
    \ndescription\nMap Reduce Failed Job Stream\n/description\n\n        
\nvalidate\ntrue\n/validate\n\n        \ncolumns\n\n            \ncolumn\n\n    
            \nname\nsite\n/name
 \n\n                \ntype\nstring\n/type\n\n            \n/column\n\n         
   \u2026...\n            \ncolumn\n\n                \nname\njobId\n/name\n\n  
              \ntype\nstring\n/type\n\n            \ncolumn\n\n                
\nname\ncurrentState\n/name\n\n                \ntype\nstring\n/type\n\n        
    \n/column\n\n        \n/columns\n\n    
\n/stream\n\n\n/streams\n\n\n\n\n\n\n\n\nAfter a policy is defined, the Alert Engine will create a Siddhi execution runtime for the policy (and also load the stream data schema from the metadata store). Since the Siddhi execution runtime knows the stream data schema, it will process the stream data and do the 
calculation.\n\n\n\n\n\n\n\n\nMonitoring Dashboard\n\n\n\n\n\n\nAfter setting 
the sites and applications, you can find the site item from the home page or 
\u201cSites\u201d menu.\n\n\n\n\n\n\nHere is a site home example. After 
entering the site home, the left menu will be replaced by application dashboard 
links related only to that site, so you can switch between application dashboards quickly. The right panel contains the icons of the applications installed in this site, depending on whether the application has its dashboard defined. You can click the application icon or 
the application links to go to the application dashboard home. Please check the 
application documentation about how to use the application monitoring 
dashboard.\n\n\n\n\n\n\n\n\n\n\nCheck The Alerts\n\n\n\n\n\n\nEagle has all the 
alerts generated by all the applications stored in its database, so you can 
check your application alerts from Eagle WI. \n\n\n\n\n\n\nGo to 
\u201c\nAlert\n\u201d -\n \u201c\nAlerts\n\u201d, you can find the alerts 
table.\n\n\n\n\n\n\n\n\nAlso you can check more detailed information by 
clicking \u201c\nDetail\n\u201d link for each alert 
item.\n\n\n\n\n\n\n\n\n\n\nHow to stream audit log into 
Kafka\n\n\nLogstash\n\n\nThe sample configuration is tested with 
logstash-2.3.4. Logstash is required to be installed on the namenode host.\n\n
 \n\n\n\n\nStep 1\n: Create a Kafka topic as the streaming input.\n\n\nHere is 
a sample Kafka command to create the topic 'sandbox_hdfs_audit_log'\n\n\ncd 
\nkafka-home\n\nbin/kafka-topics.sh --create --zookeeper localhost:2181 
--replication-factor 1 --partitions 1 --topic 
sandbox_hdfs_audit_log\n\n\n\n\n\n\n\nStep 2\n: Create a Logstash configuration 
file under ${LOGSTASH_HOME}/conf. Here is a sample.\n\n\ninput {\n      file 
{\n          type =\n \"hdp-nn-audit\"\n          path =\n 
\"/tmp/test/hdfs-audit.log\"\n          start_position =\n end\n          
sincedb_path =\n \"/dev/null\"\n       }\n  }\n output {\n      if [type] == 
\"hdp-nn-audit\" {\n          kafka {\n            codec =\n plain {\n          
      format =\n \"%{message}\"\n            }\n            bootstrap_servers 
=\n \"host:9092\"\n            topic_id =\n \"hdfs_audit_log\"\n            
acks =\n \"0\"\n            timeout_ms =\n 10000\n\n            
send_buffer_bytes =\n 102400\n            client_id =\n \"hdp-nn-audit\"\n\n            workers =\n 10\n            compression_type =\n 
\"gzip\"\n         }\n          # stdout { codec =\n rubydebug }\n  
}\n}\n\n\n\n\n\n\n\nStep 4\n: Start Logstash\n\n\nbin/logstash -f 
conf/sample.conf\n\n\n\n\n\n\n\nStep 5\n: Check whether logs are flowing into 
the kafka topic specified by \ntopic_id\n\n\n\n\n\n\nFilebeat\n\n\nThe sample 
filebeat.yml is tested with filebeat-5.0.0-beta1-linux-x86_64. The throughput 
can be up to 20K messages per second. Filebeat is required to be installed on 
the namenode host.\n\n\n    filebeat.publish_async: false\n    
filebeat.spool_size: 8192\n    filebeat.idle_timeout: 5s\n    max_procs: 1\n    
queue_size: 1000\n\n    filebeat.prospectors:\n    - input_type: log\n      
paths:\n         - /tmp/test/hdfs-audit.log\n      #tail_files: true\n      
harvester_buffer_size: 8192\n\n    output.kafka:\n      enabled: true\n      
hosts: [\"host:9092\"]\n      topic: \"phx_hdfs_audit_log\"\n      client_id: 
\"client-host\"\n      work
 er: 10\n      max_retries: 3\n      bulk_max_size: 8192\n      
channel_buffer_size: 512\n      timeout: 10\n      broker_timeout: 3s\n      
keep_alive: 0\n      compression: none\n      max_message_bytes: 1000000\n      
required_acks: 0\n      flush_interval: 1\n\n    logging.metrics.period: 
10s\n\n    processors:\n      - include_fields:\n         fields: [\"message\", 
\"beat.hostname\"]\n\n\n\nLog4j Kafka Appender\n\n\nThis sample configuration 
is tested in HDP sandbox. \nRestarting namenode is required\n after updating 
the log4j configuration. \n\n\n\n\n\n\nStep 1\n: Create a Kafka topic. Here is 
an example Kafka command for creating topic \"sandbox_hdfs_audit_log\"\n\n\ncd 
\nkafka-home\n\nbin/kafka-topics.sh --create --zookeeper localhost:2181 
--replication-factor 1 --partitions 1 --topic 
sandbox_hdfs_audit_log\n\n\n\n\n\n\n\nStep 2\n: Configure 
$HADOOP_CONF_DIR/log4j.properties, and add a log4j appender 
\"KAFKA_HDFS_AUDIT\" to hdfs audit logging\n\n\nlog4j.appender.KAFKA_HDFS_A
 
UDIT=org.apache.eagle.log4j.kafka.KafkaLog4jAppender\nlog4j.appender.KAFKA_HDFS_AUDIT.Topic=sandbox_hdfs_audit_log\nlog4j.appender.KAFKA_HDFS_AUDIT.BrokerList=sandbox.hortonworks.com:6667\nlog4j.appender.KAFKA_HDFS_AUDIT.KeyClass=org.apache.eagle.log4j.kafka.hadoop.AuditLogKeyer\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%d{ISO8601}
 %p %c{2}: 
%m%n\nlog4j.appender.KAFKA_HDFS_AUDIT.ProducerType=async\n#log4j.appender.KAFKA_HDFS_AUDIT.BatchSize=1\n#log4j.appender.KAFKA_HDFS_AUDIT.QueueSize=1\n\n\n\n\n\n\n\nStep
 3\n: Edit $HADOOP_CONF_DIR/hadoop-env.sh, and add the reference to 
KAFKA_HDFS_AUDIT to 
HADOOP_NAMENODE_OPTS.\n\n\n-Dhdfs.audit.logger=INFO,DRFAAUDIT,KAFKA_HDFS_AUDIT\n\n\n\n\n\n\n\nStep
 4\n: Edit $HADOOP_CONF_DIR/hadoop-env.sh, and append the following command to 
it.\n\n\nexport 
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/path/to/eagle/lib/log4jkafka/lib/*\n\n\n\n\n\n\n\nStep
 5\n: Save the changes and restart the namenode.\n\n\n\n\n\n\nStep 6\n: Check whether logs are flowing 
into Topic sandbox_hdfs_audit_log\n\n\n$ 
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper 
localhost:2181 --topic sandbox_hdfs_audit_log", 
+            "title": "Using Eagle"
+        }, 
+        {
+            "location": "/using-eagle/#manage-eagle-and-services", 
+            "text": "After Apache Eagle has been deployed (please reference  
deployment ), you can enter the deployment directory and use the commands below to 
control Apache Eagle Server.  ./bin/eagle-server.sh start|stop|status    After 
starting the Eagle server, please visit http://host:port/ to open the web UI of 
Eagle.", 
+            "title": "Manage Eagle and Services"
+        }, 
+        {
+            "location": "/using-eagle/#use-eagle-web-interface", 
+            "text": "This is the typical Web Interface (short for WI) after 
setting up your Eagle monitoring environment. The WI mainly contains the main panel on the right and the function menu on the left.", 
+            "title": "Use Eagle Web Interface"
+        }, 
+        {
+            "location": "/using-eagle/#home", 
+            "text": "This is the aggregated UI for configured sites, and the 
applications. It will show those created sites created, how many application 
installed for each sites, and alerts generated from that cluster. You can click 
\u201cMore info\u201d link to view the details for particular site.    The 
\u201c Widgets \u201d section is customizable; if the application developer 
have its application registered to Home page, you can find that in \u201c 
Widgets \u201d section. Please check the application developer guide about how 
to register applications to home widgets. It give you a shortcut to go directly 
to the application home.", 
+            "title": "Home"
+        }, 
+        {
+            "location": "/using-eagle/#alert", 
+            "text": "In Alert menu, you can define the policies, list the 
policies and check your alerts there.", 
+            "title": "Alert"
+        }, 
+        {
+            "location": "/using-eagle/#integration", 
+            "text": "The integration page provides the management 
functionality for Eagle. You can list the built-in applications there, create 
sites, and manage the applications in your site.", 
+            "title": "Integration"
+        }, 
+        {
+            "location": "/using-eagle/#sites", 
+            "text": "It also gives you a shortcut to particular site.", 
+            "title": "Sites"
+        }, 
+        {
+            "location": "/using-eagle/#setup-the-monitoring-application", 
+            "text": "", 
+            "title": "Setup The Monitoring Application"
+        }, 
+        {
+            "location": "/using-eagle/#monitoring-applications", 
+            "text": "Eagle has an extensible framework to dynamically add new 
monitoring applications in the Eagle environment. It also ships some built-in big data monitoring applications.    Go to \u201c Integration \u201d -  \u201c Applications \u201d; it will list a set of available monitoring applications which you can choose to monitor your services.     The \u201c Application \u201d column is the display name for an application; \u201c Streams \u201d is a logical name for the data stream from the monitored source after pre-processing, which will be consumed by the Alert Engine.    At the moment, we have 
the below built-in applications shipped with Apache Eagle. You can refer to the 
application documentation to understand how to do the configuration for each 
monitoring application.     Application  Description      Topology Health Check 
 This application can be used to monitor the service healthiness for HDFS, 
 HBase and YARN. You can get alerted once the master role or the slave role crashes.    Hadoop JMX Metrics Monitoring  This application can be used to 
monitor the JMX metrics data from the master nodes of HDFS, HBase and YARN, 
e.g. NameNode, HBase Master and YARN Resource Manager.    HDFS Audit Log 
Monitor  This application can be used to monitor the data operations in HDFS, 
to detect sensitive data access and malicious operations; to protect from data 
leak or data loss.    HBase Audit Log Monitor  Same as HDFS Audit Log Monitor, 
this application is used to monitor the data operations in HBase.    Map Reduce 
History Job  This application is used to get the MapReduce history job counters 
from YARN history server and job running history from HDFS log directory.    
Map Reduce Running Job  This application is used to get the MapReduce running 
job counter information using YARN Rest API.    Hadoop Queue Monitor  This 
application is used to get the resource scheduling and utilization info from 
YARN.    MR Metrics Aggregation  This application is used to aggregate the job counters and some resource utilization in a certain period of time 
(daily, weekly or monthly).    Job Performance Monitor Web  This application 
only contains the frontend, and depends on Map Reduce History Job and Map 
Reduce Running Job.    Alert Engine  Alert Engine is a special application and 
used to process the output data from other applications.", 
+            "title": "Monitoring Applications"
+        }, 
+        {
+            "location": "/using-eagle/#managing-sites", 
+            "text": "To enable a real monitoring use case, you have to create 
a site first, and install a certain application for this site, and finally 
start the application. We use the site concept to group the running applications and avoid application conflicts.", 
+            "title": "Managing Sites"
+        }, 
+        {
+            "location": "/using-eagle/#sites_1", 
+            "text": "Go to \u201c Integration \u201d -  \u201c Sites \u201d, 
there will be a table listing the managed sites.", 
+            "title": "Sites"
+        }, 
+        {
+            "location": "/using-eagle/#create-site", 
+            "text": "Click \u201c New Site \u201d on the bottom right of the 
Sites page. You can fill the information in site creation dialog.     The 
\u201c Site Id \u201d should not be duplicated. After the creation, you can 
find it in the Sites page.", 
+            "title": "Create Site"
+        }, 
+        {
+            "location": "/using-eagle/#configuring-a-site", 
+            "text": "By clicking \u201c Edit \u201d button or the Site column 
in Sites table, you can have the Site configuration page, there you can install 
monitoring applications.", 
+            "title": "Configuring a Site"
+        }, 
+        {
+            "location": "/using-eagle/#install-and-run-applications-in-site", 
+            "text": "Choose the particular application which you want to 
install; you probably have something to fill in, e.g. the HDFS NameNode address, 
Zookeeper address and port. Please check each application documentation for how 
to configure each application.     After doing the installation, you can start 
the application by clicking   or stop the application by  . You can check the 
\u201c Status \u201d column about the running status. Usually, it should have 
\u201c INITIALIZED \u201d or \u201c RUNNING \u201d for a healthy application.", 
+            "title": "Install and Run Applications in Site"
+        }, 
+        {
+            "location": "/using-eagle/#define-policies", 
+            "text": "After setting up the monitoring applications, you 
probably want to set up some alert policies against the monitored data, so you can get notified once there is any violation on the data. Eagle has a centralized place 
for policy definition.", 
+            "title": "Define Policies"
+        }, 
+        {
+            "location": "/using-eagle/#policies", 
+            "text": "Go to \u201c Alert \u201d -  \u201c Policies \u201d, you 
can check the policies defined and take control on whether to enable the 
policy:     You can apply the below actions for a certain policy:    : enable a 
policy    : disable a policy    : edit a policy    : purge a policy", 
+            "title": "Policies"
+        }, 
+        {
+            "location": "/using-eagle/#define-or-edit-policies", 
+            "text": "If you want to create a new policy, click \u201c Alert 
\u201d -  \u201c Define Policy \u201d, or you can enter into the policy 
definition page by editing an existing policy. After that, you can go to the 
policy list to enable the policy dynamically.", 
+            "title": "Define or Edit Policies"
+        }, 
+        {
+            "location": "/using-eagle/#source-stream", 
+            "text": "The source stream gives user a full view about what data 
stream is available for application defined for particular site, as well as the 
data structures in each data stream. Data stream name is suffixed by the site 
name.", 
+            "title": "Source Stream"
+        }, 
+        {
+            "location": "/using-eagle/#policy-name", 
+            "text": "The policy name should be globally unique.", 
+            "title": "Policy Name"
+        }, 
+        {
+            "location": "/using-eagle/#publish-alerts", 
+            "text": "In this section, you can define the alert publishment 
method by clicking the \u201c +Add Publisher \u201d.     You can choose the 
publishment method from an existing policy or by creating new publisher.     
There are four built-in publisher types:    EmailPublisher : 
org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher    
KafkaPublisher : 
org.apache.eagle.alert.engine.publisher.impl.AlertKafkaPublisher    
SlackPublisher : 
org.apache.eagle.alert.engine.publisher.impl.AlertSlackPublisher    
EagleStoragePlugin : 
org.apache.eagle.alert.engine.publisher.impl.AlertEagleStoragePlugin", 
+            "title": "Publish Alerts"
+        }, 
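One common choice is the KafkaPublisher, which writes alerts to a Kafka topic for downstream consumption. Below is a minimal sketch (not part of the original guide) of reading such alerts with the kafka-python client; the topic name and broker address are assumptions and must match your publisher configuration.

```python
# Minimal sketch: consume alerts published by AlertKafkaPublisher.
# Assumptions: kafka-python is installed, and the publisher writes JSON alerts
# to the topic "eagle_alerts" on broker "localhost:9092" (both are examples).
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "eagle_alerts",                      # assumed topic configured in the publisher
    bootstrap_servers="localhost:9092",  # assumed broker address
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,           # stop iterating after 10s of inactivity
)

for record in consumer:
    try:
        alert = json.loads(record.value.decode("utf-8"))
    except ValueError:
        alert = record.value             # not JSON; keep the raw payload
    print(alert)
```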
+        {
+            "location": "/using-eagle/#policy-syntax", 
+            "text": "Currently, we support SiddhiQL(please view Siddhi Query 
Language Specification  here )    In order to explain how stream data is 
processed, let us take policy below as an example:  from 
map_reduce_failed_job_stream[site==\"sandbox\" and 
currentState==\"FAILED\"]\nselect * group by jobId insert into 
map_reduce_failed_job_stream_out    This policy contains below parts:    Source 
: from map_reduce_failed_job_stream    Filter : [site==\"sandbox\" and 
currentState==\"FAILED\"]    Projection : select *    GroupBy : group by jobId  
  Destination : insert into map_reduce_failed_job_stream_out      Source 
Streams(schema) are defined by applications, and applications will write stream 
data to data sink(currently, we support kafka as data sink).  streams \n     
stream \n         streamId map_reduce_failed_job_stream /streamId \n         
description Map Reduce Failed Job Stream /description \n         validate true 
/validate \n         columns \n             column \n      
            name site /name \n                 type string /type \n             
/column \n            \u2026...\n             column \n                 name 
jobId /name \n                 type string /type \n             column \n       
          name currentState /name \n                 type string /type \n       
      /column \n         /columns \n     /stream  /streams     After policy is 
defined, Alert engine will create siddhi execution runtime for the policy(also 
load stream data schema from metadata store). Since siddhi execution runtime 
knows the stream data schema, then it will process stream data and do the 
calculation.", 
+            "title": "Policy Syntax"
+        }, 
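To make the data flow concrete, here is an illustrative sketch (not from the original documentation) that hand-crafts one event shaped like the map_reduce_failed_job_stream schema above and writes it to a Kafka data-sink topic with kafka-python. The broker address, sink topic name, and the extra timestamp field are assumptions; in a real deployment the monitoring application itself produces these events.

```python
# Illustrative only: craft one event resembling map_reduce_failed_job_stream and
# send it to an assumed data-sink topic, so that a policy such as
#   from map_reduce_failed_job_stream[site=="sandbox" and currentState=="FAILED"] ...
# would have something to match. Topic and broker names are assumptions.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "site": "sandbox",
    "jobId": "job_1490000000000_0001",     # made-up job id
    "currentState": "FAILED",
    "timestamp": int(time.time() * 1000),  # assumed extra field, epoch millis
}

producer.send("sandbox_map_reduce_failed_job", event)  # assumed sink topic
producer.flush()
```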
+        {
+            "location": "/using-eagle/#monitoring-dashboard", 
+            "text": "After setting the sites and applications, you can find 
the site item from the home page or \u201cSites\u201d menu.    Here is a site 
home example. After entering the site home, the left menu will be replaced by 
application dashboard links only related to that site, so you can switch 
between the application dashboard quickly. In the right panel, it contains the 
application icons installed in this site, but depends on if the application has 
its dashboard defined. You can click the application icon or the application 
links to go to the application dashboard home. Please check the application 
documentation about how to use the application monitoring dashboard.", 
+            "title": "Monitoring Dashboard"
+        }, 
+        {
+            "location": "/using-eagle/#check-the-alerts", 
+            "text": "Eagle has all the alerts generated by all the 
applications stored in its database, so you can check your application alerts 
from Eagle WI.     Go to \u201c Alert \u201d -  \u201c Alerts \u201d, you can 
find the alerts table.     Also you can check more detailed information by 
clicking \u201c Detail \u201d link for each alert item.", 
+            "title": "Check The Alerts"
+        }, 
+        {
+            "location": "/using-eagle/#how-to-stream-audit-log-into-kafka", 
+            "text": "", 
+            "title": "How to stream audit log into Kafka"
+        }, 
+        {
+            "location": "/using-eagle/#logstash", 
+            "text": "The sample configuration is tested with logstash-2.3.4. 
Logstash is required to be installed on the namenode host.    Step 1 : Create a 
Kafka topic as the streaming input.  Here is an sample Kafka command to create 
topic 'sandbox_hdfs_audit_log'  cd  kafka-home \nbin/kafka-topics.sh --create 
--zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic 
sandbox_hdfs_audit_log    Step 2 : Create a Logstash configuration file under 
${LOGSTASH_HOME}/conf. Here is a sample.  input {\n      file {\n          type => \"hdp-nn-audit\"\n          path => \"/tmp/test/hdfs-audit.log\"\n          start_position => end\n          sincedb_path => \"/dev/null\"\n       }\n  }\n output {\n      if [type] == \"hdp-nn-audit\" {\n          kafka {\n            codec => plain {\n                format => \"%{message}\"\n            }\n            bootstrap_servers => \"host:9092\"\n            topic_id => \"hdfs_audit_log\"\n            acks => \"0\"\n            timeout_ms => 10000\n\n            send_buffer_bytes => 102400\n            client_id => \"hdp-nn-audit\"\n\n            workers => 10\n            compression_type => \"gzip\"\n         }\n          # stdout { codec => rubydebug }\n  }\n}    Step 4 : Start Logstash  bin/logstash -f conf/sample.conf    Step 5 : Check whether logs are flowing into the kafka topic specified by  topic_id", 
+            "title": "Logstash"
+        }, 
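As an alternative to the kafka-console-consumer tool for the final check, the sketch below (not part of the original guide) tails the topic with the kafka-python client. The broker and topic mirror the bootstrap_servers and topic_id values of the sample Logstash config above and are assumptions for your environment.

```python
# Quick check that audit log lines are arriving on the Kafka topic written by
# the Logstash output above. Broker and topic are assumptions and should match
# bootstrap_servers / topic_id in your own Logstash configuration.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "hdfs_audit_log",               # topic_id from the sample config
    bootstrap_servers="host:9092",  # bootstrap_servers from the sample config
    auto_offset_reset="latest",
    consumer_timeout_ms=30000,      # give up after 30s without messages
)

count = 0
for record in consumer:
    print(record.value.decode("utf-8", errors="replace"))
    count += 1
    if count >= 5:                  # a handful of lines is enough to confirm flow
        break

print("received %d audit log lines" % count)
```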
+        {
+            "location": "/using-eagle/#filebeat", 
+            "text": "The sample filebeat.yml is tested with 
filebeat-5.0.0-beta1-linux-x86_64. The throughput can be up to 20K messages per 
second. Filebeat is required to be installed on the namenode host.      
filebeat.publish_async: false\n    filebeat.spool_size: 8192\n    
filebeat.idle_timeout: 5s\n    max_procs: 1\n    queue_size: 1000\n\n    
filebeat.prospectors:\n    - input_type: log\n      paths:\n         - 
/tmp/test/hdfs-audit.log\n      #tail_files: true\n      harvester_buffer_size: 
8192\n\n    output.kafka:\n      enabled: true\n      hosts: [\"host:9092\"]\n  
    topic: \"phx_hdfs_audit_log\"\n      client_id: \"client-host\"\n      
worker: 10\n      max_retries: 3\n      bulk_max_size: 8192\n      
channel_buffer_size: 512\n      timeout: 10\n      broker_timeout: 3s\n      
keep_alive: 0\n      compression: none\n      max_message_bytes: 1000000\n      
required_acks: 0\n      flush_interval: 1\n\n    logging.metrics.period: 
10s\n\n    processors:\n      - include_fields:\n         fields: [\"message\", \"beat.hostname\"]", 
+            "title": "Filebeat"
+        }, 
+        {
+            "location": "/using-eagle/#log4j-kafka-appender", 
+            "text": "This sample configuration is tested in HDP sandbox.  
Restarting namenode is required  after updating the log4j configuration.     
Step 1 : Create a Kafka topic. Here is an example Kafka command for creating 
topic \"sandbox_hdfs_audit_log\"  cd  kafka-home \nbin/kafka-topics.sh --create 
--zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic 
sandbox_hdfs_audit_log    Step 2 : Configure $HADOOP_CONF_DIR/log4j.properties, 
and add a log4j appender \"KAFKA_HDFS_AUDIT\" to hdfs audit logging  
log4j.appender.KAFKA_HDFS_AUDIT=org.apache.eagle.log4j.kafka.KafkaLog4jAppender\nlog4j.appender.KAFKA_HDFS_AUDIT.Topic=sandbox_hdfs_audit_log\nlog4j.appender.KAFKA_HDFS_AUDIT.BrokerList=sandbox.hortonworks.com:6667\nlog4j.appender.KAFKA_HDFS_AUDIT.KeyClass=org.apache.eagle.log4j.kafka.hadoop.AuditLogKeyer\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%d{ISO8601}
 %p %c{2}: %m%n\nlog4j.appender.KAFKA_HDFS_AUDIT.ProducerType=async\n#log4j.appender.KAFKA_HDFS_AUDIT.BatchSize=1\n#log4j.appender.KAFKA_HDFS_AUDIT.QueueSize=1
    Step 3 : Edit $HADOOP_CONF_DIR/hadoop-env.sh, and add the reference to 
KAFKA_HDFS_AUDIT to HADOOP_NAMENODE_OPTS.  
-Dhdfs.audit.logger=INFO,DRFAAUDIT,KAFKA_HDFS_AUDIT    Step 4 : Edit 
$HADOOP_CONF_DIR/hadoop-env.sh, and append the following command to it.  export 
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/path/to/eagle/lib/log4jkafka/lib/*    
Step 5 : Save the changes and restart the namenode.    Step 6 : Check whether 
logs are flowing into Topic sandbox_hdfs_audit_log  $ 
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper 
localhost:2181 --topic sandbox_hdfs_audit_log", 
+            "title": "Log4j Kafka Appender"
+        }, 
+        {
+            "location": "/applications/", 
+            "text": "HDFS Data Activity Monitoring\n\n\nMonitor 
Requirements\n\n\nThis application aims to monitor user activities on HDFS via 
the hdfs audit log. Once any abnormal user activity is detected, an alert is 
sent in several seconds. The whole pipeline of this application 
is\n\n\n\n\n\n\nKafka ingest: this application consumes data from Kafka. In 
other words, users have to stream the log into Kafka first. \n\n\n\n\n\n\nData 
re-processing, which includes the raw log parser, IP zone joiner, and sensitivity information joiner. \n\n\n\n\n\n\nKafka sink: parsed data flows into Kafka again, where it is consumed by the alert engine. \n\n\n\n\n\n\nPolicy evaluation: the alert engine (hosted in the Alert Engine app) evaluates each data event to check whether it violates a user-defined policy. An alert is 
generated if the data matches the policy.\n\n\n\n\n\n\n\n\nSetup \n 
Installation\n\n\n\n\n\n\nChoose a site to install this application. For 
example 'sandbox'\n\n\n\n\n\n\nInstall the \"Hdfs Audit Log Monitor\" app step by step\n\n\n\n\n\n\n\n\n\n\n\n\nHow to collect the log\n\n\nTo collect the raw audit log on namenode servers, a log collector is needed. Users can choose any tool they like. Some common solutions are available: \nlogstash\n, \nfilebeat\n, the log4j appender, etc. \n\n\nFor detailed instructions, refer to: \nHow to stream audit log into 
Kafka\n\n\nSample policies\n\n\n1. monitor file/folder operations\n\n\nDelete a 
file/folder on HDFS. \n\n\nfrom 
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[str:contains(src,'/tmp/test/subtest') 
and ((cmd=='rename' and str:contains(dst, '.Trash')) or cmd=='delete')] select 
* group by user insert into 
hdfs_audit_log_enriched_stream_out\n\n\n\n\nHDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX
 is the input stream name, and hdfs_audit_log_enriched_stream_out is the output stream name; the content between [] defines the monitoring conditions. \ncmd\n, \nsrc\n and \ndst\n are fields of the HDFS audit log.\n\n\n\n\n2. classify the file/folder on HDFS\n\n\nUsers may want to mark some folders/files on HDFS as sensitive 
content. For example, by marking '/sys/soj' as \"SOJ\", users can monitor any 
operations they care about on 'sys/soj' and its subfolders/files.\n\n\nfrom 
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[sensitivityType=='SOJ' and 
cmd=='delete'] select * group by user insert into 
hdfs_audit_log_enriched_stream_out\n\n\n\n\nThe example policy monitors the 
'delete' operation on files/subfolders under /sys/soj. \n\n\n3. Classify the IP 
Zone\n\n\nIn some cases, IPs are classified into different zones, and some zones require higher secrecy. Eagle provides ways to monitor user activities at the IP level. \n\n\nfrom 
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[securityZone=='SECURITY' and 
cmd=='delete'] select * group by user insert into 
hdfs_audit_log_enriched_stream_out\n\n\n\n\nThe example policy monitors the 
'delete' operation on hosts in 'SECURITY' zone. \n\n\nQuestions on this 
application\n\n\n\n\nJMX Monitoring\n\n\
 n\n\n\n\nApplication \"\nHADOOP_JMX_METRIC_MONITOR\n\" provide embedded 
collector script to ingest hadoop/hbase jmx metric as eagle stream and provide 
ability to define alert policy and detect anomaly in real-time from 
metric.\n\n\n\n\n\n\n\n\nFields\n\n\n\n\n\n\n\n\n\n\n\n\nType\n\n\nHADOOP_JMX_METRIC_MONITOR\n\n\n\n\n\n\nVersion\n\n\n0.5.0-version\n\n\n\n\n\n\nDescription\n\n\nCollect
 JMX Metric and monitor in 
real-time\n\n\n\n\n\n\nStreams\n\n\nHADOOP_JMX_METRIC_STREAM\n\n\n\n\n\n\nConfiguration\n\n\nJMX
 Metric Kafka Topic (default: hadoop_jmx_metric_{SITE_ID})\nKafka Broker List 
(default: localhost:6667)\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n 
Installation\n\n\n\n\n\n\nMake sure a site is already set up (this guide uses a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall the \"Hadoop JMX Monitor\" app in the eagle server.\n\n\n\n\n\n\n\n\nConfigure Application settings.\n\n\n\n\n\n\n\n\nEnsure a kafka topic named hadoop_jmx_metric_{SITE_ID} exists (in the current guide, it should be hadoop_jmx_metric_sandbox).\n\n\n\n\n\n\nSet up the metric collector for the monitored Hadoop/HBase using hadoop_jmx_collector and modify the configuration.\n\n\n\n\n\n\nCollector scripts: \nhadoop_jmx_collector\n\n\n\n\n\n\nRename config-sample.json to 
config.json: \nconfig-sample.json\n\n\n{\n    env: {\n        site: 
\"sandbox\",\n        name_node: {\n            hosts: [\n                
\"sandbox.hortonworks.com\"\n            ],\n            port: 50070,\n         
   https: false\n        },\n        resource_manager: {\n            hosts: 
[\n                \"sandbox.hortonworks.com\"\n            ],\n            
port: 50030,\n            https: false\n        }\n    },\n    inputs: [{\n     
   component: \"namenode\",\n        host: \"server.eagle.apache.org\",\n       
 port: \"50070\",\n        https: false,\n        kafka_topic: 
\"nn_jmx_metric_sandbox\"\n    }, {\n        component: \"resourcemanager\",\n  
      host: \"server.eagle.apache.org\",\n        port: \"8088\",\n        
https: false,\n        kafka_t
 opic: \"rm_jmx_metric_sandbox\"\n    }, {\n        component: \"datanode\",\n  
      host: \"server.eagle.apache.org\",\n        port: \"50075\",\n        
https: false,\n        kafka_topic: \"dn_jmx_metric_sandbox\"\n    }],\n    
filter: {\n        monitoring.group.selected: [\n            \"hadoop\",\n      
      \"java.lang\"\n        ]\n    },\n    output: {\n        kafka: {\n       
     brokerList: [\n                \"localhost:9092\"\n            ]\n        
}\n    }\n}\n\n\n\n\n\n\n\n\n\n\n\nClick the \"Install\" button and you will see the HADOOP_JMX_METRIC_STREAM_{SITE_ID} in Streams.\n\n\n\n\n\n\n\n\nDefine JMX 
Alert Policy\n\n\n\n\n\n\nGo to \"Define Policy\".\n\n\n\n\n\n\nSelect 
HADOOP_JMX_METRIC_MONITOR related streams.\n\n\n\n\n\n\nDefine a SQL-like policy, for example\n\n\nfrom HADOOP_JMX_METRIC_STREAM_SANDBOX[metric==\"cpu.usage\" and value > 0.9]\nselect site,host,component,value\ninsert into HADOOP_CPU_USAGE_GT_90_ALERT;\n\n\n\nAs shown in the screenshot below:\n\n\n\n\n\n
 \n\n\nStream Schema\n\n\n\n\n\n\nSchema\n\n\n\n\n\n\n\n\nStream 
Name\n\n\nStream Schema\n\n\nTime 
Series\n\n\n\n\n\n\n\n\n\n\nHADOOP_JMX_METRIC_MONITOR\n\n\nhost\n: 
STRING\ntimestamp\n: LONG\nmetric\n: STRING\ncomponent\n: STRING\nsite\n: 
STRING\nvalue\n: DOUBLE\n\n\nTrue\n\n\n\n\n\n\n\n\n\n\n\n\nMetrics 
List\n\n\n\n\nPlease refer to the \nHadoop JMX Metrics List\n and see which 
metrics you're interested in.\n\n\n\n\n\n\nJob Performance 
Monitoring\n\n\nMonitor Requirements\n\n\n\n\nFinished/Running Job 
Details\n\n\nJob Metrics(Job Counter/Statistics) Aggregation\n\n\nAlerts(Job 
failure/Job slow)\n\n\n\n\nApplications\n\n\n\n\n\n\nApplication 
Table\n\n\n\n\n\n\n\n\napplication\n\n\nresponsibility\n\n\n\n\n\n\n\n\n\n\nMap 
Reduce History Job Monitoring\n\n\nparse mr history job logs from 
hdfs\n\n\n\n\n\n\nMap Reduce Running Job Monitoring\n\n\nget mr running job 
details from resource manager\n\n\n\n\n\n\nMap Reduce Metrics 
Aggregation\n\n\naggregate metrics generated by the applications above\n\n\n\n\n\n\n\n\n\n\n\n\nData Ingestion And Process\n\n\n\n\n\n\nWe build a storm topology to fulfill the requirements of each 
application.\n\n\n\n\n\n\n\n\nMap Reduce History Job Monitoring (Figure 
1)\n\n\n\n\nRead Spout\n\n\nread/parse history job logs from HDFS and flush to 
the eagle service (storage is HBase)\n\n\n\n\n\n\nSink Bolt\n\n\nconvert parsed jobs to streams and write to the data sink\n\n\n\n\n\n\n\n\n\n\nMap Reduce Running Job 
Monitoring (Figure 2)\n\n\nRead Spout\n\n\nfetch running job list from resource 
manager and emit to Parse Bolt\n\n\n\n\n\n\nParse Bolt\n\n\nfor each running 
job, fetch the job detail/job counters/job configuration/tasks from the resource 
manager\n\n\n\n\n\n\n\n\n\n\nMap Reduce Metrics Aggregation (Figure 
3)\n\n\nDivide Spout\n\n\ndivide the time period (to be aggregated) into small pieces and emit them to the Aggregate Bolt\n\n\n\n\n\n\nAggregate Bolt\n\n\naggregate metrics for the given time period received from the Divide 
Spout\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n
\nMake sure a site is already set up (this guide uses a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall the \"Map Reduce History Job\" app in the eagle server (take this application as an example).\n\n\n\n\n\n\nConfigure Application settings\n\n\n\n\n\n\n\n\nEnsure a kafka topic named {SITE_ID}_map_reduce_failed_job (in the current guide, it should be sandbox_map_reduce_failed_job) will be created.\n\n\n\n\n\n\nClick the \"Install\" button and you will see MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID} in Alert-\nStreams.\n    \n\n  This application will write stream data to the kafka topic (created in the last step).\n\n\n\n\n\n\nIntegration With Alert Engine\n\n\nIn order to integrate an application with the alert engine and send alerts, follow the steps below (take the Map Reduce History Job application as an 
example):\n\n\n\n\n\n\ndefine stream and configure data sink\n\n\n\n\ndefine 
stream in resource/META-INF/providers/xxxProviders.xml\nFor example, 
MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID}\n\n\nconfigure data sink\nFor example, create the kafka topic {SITE_ID}_map_reduce_failed_job\n\n\n\n\n\n\n\n\ndefine 
policy\n\n\n\n\n\n\nFor example, if you want to receive map reduce job failure 
alerts, you can define policies (SiddhiQL) as follows:\n\n\nfrom map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"]\nselect site, queue, user, jobType, jobId, 
submissionTime, trackingUrl, startTime, endTime\ngroup by jobId insert into 
map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\n\nview alerts\n\n\n\n\nYou can 
view alerts on the Alert-\nAlerts page.\n\n\nStream Schema\n\n\nAll columns above are predefined in the stream map_reduce_failed_job_stream defined 
in\n\n\neagle-jpm/eagle-jpm-mr-history/src/main/resources/META-INF/providers/org.apache.eagle.jpm.mr.history.MRHistoryJobApplicationProvider.xml\n\n\n\nThen,
 enable the policy in the web UI after it's created. Eagle will schedule it 
automatically.\n\n\n\n\nTopology Health Check\n\n\n\n\n\n\nApplication 
\"TOPOLOGY HEALTH CHECK\" aims to monior those servies with
  a master-slave structured topology and provide metrics at host 
level.\n\n\n\n\n\n\n\n\nFields\n\n\n\n\n\n\n\n\n\n\n\n\nType\n\n\nTOPOLOGY_HEALTH_CHECK\n\n\n\n\n\n\nVersion\n\n\n0.5.0-version\n\n\n\n\n\n\nDescription\n\n\nCollect
 MR,HBASE,HDFS node status and cluster 
ratio\n\n\n\n\n\n\nStreams\n\n\nTOPOLOGY_HEALTH_CHECK_STREAM\n\n\n\n\n\n\nConfiguration\n\n\nTopology
 Health Check Topic (default: topology_health_check)\nKafka Broker List 
(default: sandbox.hortonworks.com:6667)\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n 
Installation\n\n\n\n\n\n\nMake sure a site is already set up (this guide uses a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall the \"Topology Health Check\" app in the eagle server.\n\n\n\n\n\n\n\n\nConfigure Application settings.\n\n\n\n\n\n\n\n\nEnsure the existence of a kafka topic named topology_health_check (in the current guide, it should be topology_health_check).\n\n\n\n\n\n\nClick the \"Install\" button and you will see TOPOLOGY_HEALTH_CHECK_STREAM_{SITE_ID} on the \"Streams\" page (Streams can be navigated from the left nav).\n\n\n\n\n\n\n\n\nDefine Health Check Alert 
Policy\n\n\n\n\n\n\nGo to \"Define Policy\".\n\n\n\n\n\n\nSelect 
TOPOLOGY_HEALTH_CHECK related streams.\n\n\n\n\n\n\nDefine a SQL-like policy, for 
example\n\n\nfrom TOPOLOGY_HEALTH_CHECK_STREAM_SANDBOX[status=='dead'] select * 
insert into topology_health_check_stream_out;\n\n\n\n\n\n\n\n\n\n\n\nHadoop 
Queue Monitoring\n\n\n\n\n\n\nThis application collects metrics of Resource 
Manager in the following aspects:\n\n\n\n\n\n\nScheduler Info of the cluster: 
http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/scheduler\n\n\n\n\n\n\nApplications
 of the cluster: 
http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/apps\n\n\n\n\n\n\nOverall metrics 
of the cluster: http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/metrics\n\n\nAs of version 0.5-incubating, it mainly focuses on the metrics\n - `appsPending`\n - 
`allocatedMB`\n - `totalMB`\n - `availableMB`\n - `reservedMB`\n - 
`allocatedVirtualCores`.\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n\nMake sure a site is already set up (this guide uses a demo site named \"sandbox\").\n\n\n\n\n\n\nFrom the left-nav list, navigate to the application management page via \"\nIntegration\n\" \n \"\nSites\n\", and click the \"\nsandbox\n\" link on the right.\n\n\n\n\n\n\n\n\nInstall \"Hadoop Queue Monitor\" by clicking the \"install\" button of the application.\n\n\n\n\n\n\n\n\nIn the pop-up layout, 
select running mode as \nLocal\n or \nCluster\n.\n\n\n\n\n\n\n\n\nSet the 
target jar to eagle's topology assembly jar that exists on the eagle server, specifying its absolute path, as in the following screenshot:\n\n\n\n\n\n\n\n\nSet the Resource Manager endpoint urls field, separating values with a comma if there is more than one url (e.g. a secondary node for 
HA).\n\n\n\n\n\n\n\n\nSet fields \"\nStorm Worker Number\n\", \"\nParallel 
Tasks Per Bolt\n\", and \"\nFetching Metric Interval in Seconds\n\", or leave 
them at their defaults if they fit your needs.\n\n\n\n\n\n\n\n\nFinally, hit the \"\nInstall\n\" button to complete the installation.\n\n\n\n\n\n\nUse of the application\n\n\n\n\n\n\nThere is no need to define policies for this application to work; it can be integrated with the \"\nJob Performance Monitoring Web\n\" application and consequently shown on the cluster dashboard, as long as the latter application is installed too. See an example in the following screenshot:", 
+            "title": "Applications"
+        }, 
+        {
+            "location": "/applications/#hdfs-data-activity-monitoring", 
+            "text": "", 
+            "title": "HDFS Data Activity Monitoring"
+        }, 
+        {
+            "location": "/applications/#monitor-requirements", 
+            "text": "This application aims to monitor user activities on HDFS 
via the hdfs audit log. Once any abnormal user activity is detected, an alert 
is sent in several seconds. The whole pipeline of this application is    Kafka 
ingest: this application consumes data from Kafka. In other words, users have 
to stream the log into Kafka first.     Data re-processing, which includes the raw log parser, IP zone joiner, and sensitivity information joiner.     Kafka sink: parsed data flows into Kafka again, where it is consumed by the alert engine.     Policy evaluation: the alert engine (hosted in the Alert Engine app) evaluates each data event to check whether it violates a user-defined policy. 
An alert is generated if the data matches the policy.", 
+            "title": "Monitor Requirements"
+        }, 
+        {
+            "location": "/applications/#setup-installation", 
+            "text": "Choose a site to install this application. For example 
'sandbox'    Install the \"Hdfs Audit Log Monitor\" app step by step", 
+            "title": "Setup & Installation"
+        }, 
+        {
+            "location": "/applications/#how-to-collect-the-log", 
+            "text": "To collect the raw audit log on namenode servers, a log 
collector is needed. Users can choose any tool they like. Some common solutions are available:  logstash ,  filebeat , the log4j appender, etc.   For detailed instructions, refer to:  How to stream audit log into Kafka", 
+            "title": "How to collect the log"
+        }, 
+        {
+            "location": "/applications/#sample-policies", 
+            "text": "", 
+            "title": "Sample policies"
+        }, 
+        {
+            "location": "/applications/#1-monitor-filefolder-operations", 
+            "text": "Delete a file/folder on HDFS.   from 
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[str:contains(src,'/tmp/test/subtest') 
and ((cmd=='rename' and str:contains(dst, '.Trash')) or cmd=='delete')] select 
* group by user insert into hdfs_audit_log_enriched_stream_out  
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX is the input stream name, and 
hdfs_audit_log_enriched_stream_out is the output stream name; the content between [] defines the monitoring conditions.  cmd ,  src  and  dst  are fields of the HDFS audit log.", 
+            "title": "1. monitor file/folder operations"
+        }, 
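To see this policy fire end to end, you can feed a synthetic audit-log line into the ingest topic that the application consumes. The sketch below is illustrative only: the broker address, the 'sandbox_hdfs_audit_log' topic name, and the exact audit-line layout are assumptions; check your namenode's hdfs-audit.log for the authoritative format.

```python
# Illustrative producer: push one synthetic HDFS audit log line into the ingest
# topic so the "delete under /tmp/test/subtest" policy has something to match
# once the application has parsed and enriched it. Topic, broker, and the
# audit-log field layout are assumptions for a sandbox-style setup.
import time

from kafka import KafkaProducer

audit_line = (
    "%s INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) "
    "ip=/10.0.0.1 cmd=delete src=/tmp/test/subtest/demo.txt dst=null perm=null"
) % time.strftime("%Y-%m-%d %H:%M:%S,000")

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed broker
producer.send("sandbox_hdfs_audit_log", audit_line.encode("utf-8"))
producer.flush()
```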
+        {
+            "location": "/applications/#2-classify-the-filefolder-on-hdfs", 
+            "text": "Users may want to mark some folders/files on HDFS as 
sensitive content. For example, by marking '/sys/soj' as \"SOJ\", users can 
monitor any operations they care about on 'sys/soj' and its subfolders/files.  
from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[sensitivityType=='SOJ' and 
cmd=='delete'] select * group by user insert into 
hdfs_audit_log_enriched_stream_out  The example policy monitors the 'delete' 
operation on files/subfolders under /sys/soj.", 
+            "title": "2. classify the file/folder on HDFS"
+        }, 
+        {
+            "location": "/applications/#3-classify-the-ip-zone", 
+            "text": "In some cases, the ips are classified into different 
zones. For some zone, it may have higher secrecy. Eagle providers ways to 
monitor user activities on IP level.   from 
HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[securityZone=='SECURITY' and 
cmd=='delete'] select * group by user insert into 
hdfs_audit_log_enriched_stream_out  The example policy monitors the 'delete' 
operation on hosts in 'SECURITY' zone.", 
+            "title": "3. Classify the IP Zone"
+        }, 
+        {
+            "location": "/applications/#questions-on-this-application", 
+            "text": "", 
+            "title": "Questions on this application"
+        }, 
+        {
+            "location": "/applications/#jmx-monitoring", 
+            "text": "Application \" HADOOP_JMX_METRIC_MONITOR \" provide 
embedded collector script to ingest hadoop/hbase jmx metric as eagle stream and 
provide ability to define alert policy and detect anomaly in real-time from 
metric.     Fields       Type  HADOOP_JMX_METRIC_MONITOR    Version  
0.5.0-version    Description  Collect JMX Metric and monitor in real-time    
Streams  HADOOP_JMX_METRIC_STREAM    Configuration  JMX Metric Kafka Topic 
(default: hadoop_jmx_metric_{SITE_ID}) Kafka Broker List (default: 
localhost:6667)", 
+            "title": "JMX Monitoring"
+        }, 
+        {
+            "location": "/applications/#setup-installation_1", 
+            "text": "Make sure already setup a site (here use a demo site 
named \"sandbox\").    Install \"Hadoop JMX Monitor\" app in eagle server.     
Configure Application settings.     Ensure a kafka topic named 
hadoop_jmx_metric_{SITE_ID} (In current guide, it should be 
hadoop_jmx_metric_sandbox)    Setup metric collector for monitored Hadoop/HBase 
using hadoop_jmx_collector and modify the configuration.    Collector scripts:  
hadoop_jmx_collector    Rename config-sample.json to config.json:  
config-sample.json  {\n    env: {\n        site: \"sandbox\",\n        
name_node: {\n            hosts: [\n                
\"sandbox.hortonworks.com\"\n            ],\n            port: 50070,\n         
   https: false\n        },\n        resource_manager: {\n            hosts: 
[\n                \"sandbox.hortonworks.com\"\n            ],\n            
port: 50030,\n            https: false\n        }\n    },\n    inputs: [{\n     
   component: \"namenode\",\n        host: \"server.eagle.
 apache.org\",\n        port: \"50070\",\n        https: false,\n        
kafka_topic: \"nn_jmx_metric_sandbox\"\n    }, {\n        component: 
\"resourcemanager\",\n        host: \"server.eagle.apache.org\",\n        port: 
\"8088\",\n        https: false,\n        kafka_topic: 
\"rm_jmx_metric_sandbox\"\n    }, {\n        component: \"datanode\",\n        
host: \"server.eagle.apache.org\",\n        port: \"50075\",\n        https: 
false,\n        kafka_topic: \"dn_jmx_metric_sandbox\"\n    }],\n    filter: 
{\n        monitoring.group.selected: [\n            \"hadoop\",\n            
\"java.lang\"\n        ]\n    },\n    output: {\n        kafka: {\n            
brokerList: [\n                \"localhost:9092\"\n            ]\n        }\n   
 }\n}      Click the \"Install\" button and you will see the 
HADOOP_JMX_METRIC_STREAM_{SITE_ID} in Streams.", 
+            "title": "Setup & Installation"
+        }, 
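Before wiring up hadoop_jmx_collector, it can help to confirm that the component's JMX HTTP servlet is reachable, since that is what the collector scrapes. The sketch below is a rough check under assumptions: the NameNode host/port come from the sample config above, the requests library is available, and the queried bean and printed attributes are only examples.

```python
# Sanity check: fetch NameNode JMX JSON directly from the /jmx servlet that the
# collector scrapes, to confirm the endpoint is reachable before installing the
# app. Host, port, and the example bean/attributes are assumptions.
import requests

resp = requests.get(
    "http://sandbox.hortonworks.com:50070/jmx",
    params={"qry": "Hadoop:service=NameNode,name=FSNamesystemState"},  # example bean
    timeout=10,
)
resp.raise_for_status()

for bean in resp.json().get("beans", []):
    # Print a few representative attributes if they are present.
    for key in ("CapacityTotal", "CapacityUsed", "NumLiveDataNodes"):
        if key in bean:
            print(key, "=", bean[key])
```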
+        {
+            "location": "/applications/#define-jmx-alert-policy", 
+            "text": "Go to \"Define Policy\".    Select 
HADOOP_JMX_METRIC_MONITOR related streams.    Define a SQL-like policy, for example  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric==\"cpu.usage\" and value > 0.9]\nselect site,host,component,value\ninsert into HADOOP_CPU_USAGE_GT_90_ALERT;  As shown in the screenshot below:", 
+            "title": "Define JMX Alert Policy"
+        }, 
+        {
+            "location": "/applications/#stream-schema", 
+            "text": "Schema     Stream Name  Stream Schema  Time Series      
HADOOP_JMX_METRIC_MONITOR  host : STRING timestamp : LONG metric : STRING 
component : STRING site : STRING value : DOUBLE  True", 
+            "title": "Stream Schema"
+        }, 
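For reference, a stream row matching the schema above can be mocked up and pushed to the metric topic to exercise a policy such as the cpu.usage example. This is an illustrative sketch only; the broker address and the hadoop_jmx_metric_sandbox topic name are assumptions based on the defaults mentioned earlier.

```python
# Illustrative only: one event shaped like the JMX metric stream schema above
# (host: STRING, timestamp: LONG, metric: STRING, component: STRING,
#  site: STRING, value: DOUBLE). Broker and topic names are assumptions
# following the hadoop_jmx_metric_{SITE_ID} convention for site "sandbox".
import json
import time

from kafka import KafkaProducer

metric_event = {
    "host": "sandbox.hortonworks.com",
    "timestamp": int(time.time() * 1000),
    "metric": "cpu.usage",
    "component": "namenode",
    "site": "sandbox",
    "value": 0.95,  # above 0.9, so the sample CPU policy above would match
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("hadoop_jmx_metric_sandbox", metric_event)
producer.flush()
```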
+        {
+            "location": "/applications/#metrics-list", 
+            "text": "Please refer to the  Hadoop JMX Metrics List  and see 
which metrics you're interested in.", 
+            "title": "Metrics List"
+        }, 
+        {
+            "location": "/applications/#job-performance-monitoring", 
+            "text": "", 
+            "title": "Job Performance Monitoring"
+        }, 
+        {
+            "location": "/applications/#monitor-requirements_1", 
+            "text": "Finished/Running Job Details  Job Metrics(Job 
Counter/Statistics) Aggregation  Alerts(Job failure/Job slow)", 
+            "title": "Monitor Requirements"
+        }, 
+        {
+            "location": "/applications/#applications", 
+            "text": "Application Table     application  responsibility      
Map Reduce History Job Monitoring  parse mr history job logs from hdfs    Map 
Reduce Running Job Monitoring  get mr running job details from resource manager 
   Map Reduce Metrics Aggregation  aggregate metrics generated by applications 
above", 
+            "title": "Applications"
+        }, 
+        {
+            "location": "/applications/#data-ingestion-and-process", 
+            "text": "We build storm topology to fulfill requirements for each 
application.     Map Reduce History Job Monitoring (Figure 1)   Read Spout  
read/parse history job logs from HDFS and flush to eagle service(storage is 
Hbase)    Sink Bolt  convert parsed jobs to streams and write to data sink      
Map Reduce Running Job Monitoring (Figure 2)  Read Spout  fetch running job 
list from the resource manager and emit it to the Parse Bolt    Parse Bolt  for each running job, fetch the job detail/job counters/job configuration/tasks from the resource manager      Map Reduce Metrics Aggregation (Figure 3)  Divide Spout  divide the time period (to be aggregated) into small pieces and emit them to the Aggregate Bolt    Aggregate Bolt  aggregate metrics for the given time period received from the Divide 
Spout", 
+            "title": "Data Ingestion And Process"
+        }, 
+        {
+            "location": "/applications/#setup-installation_2", 
+            "text": "Make sure already setup a site (here use a demo site 
named \"sandbox\").    Install \"Map Reduce History Job\" app in eagle 
server(Take this application as an example).    Configure Application settings  
   Ensure a kafka topic named {SITE_ID}_map_reduce_failed_job (In current 
guide, it should be sandbox_map_reduce_failed_job) will be created.    Click 
\"Install\" button then you will see the MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID} 
in Alert- Streams.\n     \n  This application will write stream data to kafka 
topic(created by last step)", 
+            "title": "Setup & Installation"
+        }, 
+        {
+            "location": "/applications/#integration-with-alert-engine", 
+            "text": "In order to integrate applications with alert engine and 
send alerts, follow below steps(Take Map Reduce History Job application as an 
example):    define stream and configure data sink   define stream in 
resource/META-INF/providers/xxxProviders.xml\nFor example, 
MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID}  configure data sink\nFor example, 
create the kafka topic {SITE_ID}_map_reduce_failed_job     define policy    For 
example, if you want to receive map reduce job failure alerts, you can define 
policies (SiddhiQL) as follows:  from map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"]\nselect site, queue, user, jobType, jobId, 
submissionTime, trackingUrl, startTime, endTime\ngroup by jobId insert into 
map_reduce_failed_job_stream_out    view alerts   You can view alerts on the Alert- Alerts page.", 
+            "title": "Integration With Alert Engine"
+        }, 
+        {
+            "location": "/applications/#stream-schema_1", 

[... 247 lines stripped ...]
Propchange: eagle/site/docs/v0.5.0/mkdocs/search_index.json
------------------------------------------------------------------------------
    svn:eol-style = native

