APEXCORE-293 Adding Apex Core documentation

Project: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-apex-core/commit/cfdacdf2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/tree/cfdacdf2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/diff/cfdacdf2

Branch: refs/heads/master
Commit: cfdacdf295ea1ce198dc17a07582b9441785eb9d
Parents: 7af835b
Author: Sasha Parfenov <[email protected]>
Authored: Fri Feb 26 17:12:29 2016 -0800
Committer: Thomas Weise <[email protected]>
Committed: Sun Feb 28 22:49:38 2016 -0800

----------------------------------------------------------------------
 README.md                              |   2 +
 docs/README.md                         |  34 +++++++
 docs/apex.md                           |  14 ---
 docs/apex_development_setup.md         | 117 +++++++++-------------
 docs/apex_malhar.md                    |  64 ++++++------
 docs/application_development.md        | 140 ++++++--------------------
 docs/application_packages.md           |   8 +-
 docs/autometrics.md                    | 150 ++--------------------------
 docs/dtcli.md                          |   9 +-
 docs/favicon.ico                       | Bin 0 -> 25597 bytes
 docs/images/MalharOperatorOverview.png | Bin 297948 -> 0 bytes
 docs/index.md                          |  20 ++++
 docs/operator_development.md           |   8 +-
 mkdocs.yml                             |  15 +++
 14 files changed, 197 insertions(+), 384 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 4b22f71..97c2b03 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,8 @@ Please visit the [documentation 
section](http://apex.incubator.apache.org/docs.h
 
 [Malhar](https://github.com/apache/incubator-apex-malhar) is a library of 
application building blocks and examples that will help you build out your 
first Apex application quickly.
 
+The documentation build and hosting process is explained in [docs/README.md](docs/README.md).
+
 ##Contributing
 
 This project welcomes new contributors.  If you would like to help by adding 
new features, enhancements or fixing bugs, check out the [contributing 
guidelines](http://apex.incubator.apache.org/contributing.html).

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/README.md
----------------------------------------------------------------------
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..c9fe8da
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,34 @@
+# Apex Documentation
+
+Apex documentation repository for content available on 
http://apex.incubator.apache.org/docs/
+
+Documentation is written in 
[Markdown](https://guides.github.com/features/mastering-markdown/) format and 
statically generated into HTML using [MkDocs](http://www.mkdocs.org/).  All 
documentation is located in the [docs](docs) directory, and 
[mkdocs.yml](mkdocs.yml) file describes the navigation structure of the 
published documentation.
+
+## Authoring
+
+New pages can be added under [docs](docs) or a related sub-category, and a reference to the new page must be added to the [mkdocs.yml](mkdocs.yml) file to make it available in the navigation.  Embedded images are typically added to an images folder at the same level as the new page.
+
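+For example, a minimal sketch of a navigation entry in [mkdocs.yml](mkdocs.yml) (the page titles and file names below are illustrative):
+
+```yaml
+pages:
+- Home: index.md
+- My New Page: my_new_page.md
+```
+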
+When creating or editing pages, it can be useful to see the live results, and how the documents will appear when published.  A live preview is available by running the following command at the root of the repository:
+
+```bash
+mkdocs serve
+```
+
+For additional details see [writing your 
docs](http://www.mkdocs.org/user-guide/writing-your-docs/) guide.
+
+## Site Configuration
+
+Guides on applying site-wide 
[configuration](http://www.mkdocs.org/user-guide/configuration/) and 
[themeing](http://www.mkdocs.org/user-guide/styling-your-docs/) are available 
on the MkDocs site.
+
+## Deployment
+
+**Under Review**: The current deployment process is under review and may change from the one outlined below.
+
+
+Deployment is done from the master branch of the repository by executing the following command:
+
+```bash
+mkdocs gh-deploy --clean
+```
+
+This results in all the documentation under [docs](docs) being statically generated into HTML files and deployed at the top level of the [gh-pages](https://github.com/apache/incubating-apex-core/tree/gh-pages) branch.  For more details on how this is done see [MkDocs - Deploying Github Pages](http://www.mkdocs.org/user-guide/deploying-your-docs/#github-pages).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/apex.md
----------------------------------------------------------------------
diff --git a/docs/apex.md b/docs/apex.md
deleted file mode 100644
index 215a957..0000000
--- a/docs/apex.md
+++ /dev/null
@@ -1,14 +0,0 @@
-Apache Apex
-================================================================================
-
-Apache Apex (incubating) is the industry’s only open source, 
enterprise-grade unified stream and batch processing engine.  Apache Apex 
includes key features requested by open source developer community that are not 
available in current open source technologies.
-
-* Event processing guarantees
-* In-memory performance & scalability
-* Fault tolerance and state management
-* Native rolling and tumbling window support
-* Hadoop-native YARN & HDFS implementation
-
-For additional information visit [Apache 
Apex](http://apex.incubator.apache.org/).
-
-[![](images/apex_logo.png)](http://apex.incubator.apache.org/)

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/apex_development_setup.md
----------------------------------------------------------------------
diff --git a/docs/apex_development_setup.md b/docs/apex_development_setup.md
index 777f2f9..0bbabc5 100644
--- a/docs/apex_development_setup.md
+++ b/docs/apex_development_setup.md
@@ -1,36 +1,29 @@
 Apache Apex Development Environment Setup
 =========================================
 
-This document discusses the steps needed for setting up a development 
environment for creating applications that run on the Apache Apex or the 
DataTorrent RTS streaming platform.
+This document discusses the steps needed for setting up a development 
environment for creating applications that run on the Apache Apex platform.
 
 
-Microsoft Windows
-------------------------------
+Development Tools
+-------------------------------------------------------------------------------
 
-There are a few tools that will be helpful when developing Apache Apex 
applications, some required and some optional:
+There are a few tools that will be helpful when developing Apache Apex 
applications, including:
 
-1.  *git* -- A revision control system (version 1.7.1 or later). There are 
multiple git clients available for Windows (<http://git-scm.com/download/win> 
for example), so download and install a client of your choice.
+1.  **git** - A revision control system (version 1.7.1 or later). There are 
multiple git clients available for Windows (<http://git-scm.com/download/win> 
for example), so download and install a client of your choice.
 
-2.  *java JDK* (not JRE). Includes the Java Runtime Environment as well as the 
Java compiler and a variety of tools (version 1.7.0\_79 or later). Can be 
downloaded from the Oracle website.
+2.  **java JDK** (not JRE) - Includes the Java Runtime Environment as well as 
the Java compiler and a variety of tools (version 1.7.0\_79 or later). Can be 
downloaded from the Oracle website.
 
-3.  *maven* -- Apache Maven is a build system for Java projects (version 3.0.5 
or later). It can be downloaded from <https://maven.apache.org/download.cgi>.
+3.  **maven** - Apache Maven is a build system for Java projects (version 
3.0.5 or later). It can be downloaded from 
<https://maven.apache.org/download.cgi>.
 
-4.  *VirtualBox* -- Oracle VirtualBox is a virtual machine manager (version 
4.3 or later) and can be downloaded from 
<https://www.virtualbox.org/wiki/Downloads>. It is needed to run the Data 
Torrent Sandbox.
+4.  **IDE** (Optional) - If you prefer to use an IDE (Integrated Development 
Environment) such as *NetBeans*, *Eclipse* or *IntelliJ*, install that as well.
 
-5.  *DataTorrent Sandbox* -- The sandbox can be downloaded from 
<https://www.datatorrent.com/download>. It is useful for testing simple 
applications since it contains Apache Hadoop and Data Torrent RTS 3.1.1 
pre-installed with a time-limited Enterprise License. If you already installed 
the RTS Enterprise Edition (evaluation or production license) on a cluster, you 
can use that setup for deployment and testing instead of the sandbox.
+After installing these tools, make sure that the directories containing the 
executable files are in your PATH environment variable.
 
-6.  (Optional) If you prefer to use an IDE (Integrated Development 
Environment) such as *NetBeans*, *Eclipse* or *IntelliJ*, install that as well.
+* **Windows** - Open a console window and enter the command `echo %PATH%` to see the value of the `PATH` variable and verify that the directories for the Java, git, and maven executables are present.  For JDK executables like _java_ and _javac_, the directory might be something like `C:\\Program Files\\Java\\jdk1.7.0\_80\\bin`; for _git_ it might be `C:\\Program Files\\Git\\bin`; and for maven it might be `C:\\Users\\user\\Software\\apache-maven-3.3.3\\bin`.  If not, you can change its value at _Control Panel_ &#x21e8; _Advanced System Settings_ &#x21e8; _Advanced tab_ &#x21e8; _Environment Variables_.
+* **Linux and Mac** - Open a console/terminal window and enter the command `echo $PATH` to see the value of the `PATH` variable and verify that the directories for the Java, git, and maven executables are present.  If not, make sure the software is downloaded and installed, and optionally add and export the PATH entry in `~/.profile` or `~/.bash_profile` (see the snippet after this list).  For example, to add maven located in `/sfw/maven/apache-maven-3.3.3` to PATH, add the line: `export PATH=$PATH:/sfw/maven/apache-maven-3.3.3/bin`
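+
+A minimal sketch of persisting such a maven PATH entry on Linux or Mac (the install path is illustrative):
+
+    echo 'export PATH=$PATH:/sfw/maven/apache-maven-3.3.3/bin' >> ~/.profile
+    source ~/.profile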
 
 
-After installing these tools, make sure that the directories containing the 
executable files are in your PATH environment; for example, for the JDK 
executables like _java_ and _javac_, the directory might be something like 
`C:\\Program Files\\Java\\jdk1.7.0\_80\\bin`; for _git_ it might be 
`C:\\Program Files\\Git\\bin`; and for maven it might be 
`C:\\Users\\user\\Software\\apache-maven-3.3.3\\bin`. Open a console window and 
enter the command:
-
-    echo %PATH%
-
-to see the value of the `PATH` variable and verify that the above directories 
are present. If not, you can change its value clicking on the button at 
_Control Panel_ &#x21e8; _Advanced System Settings_ &#x21e8; _Advanced tab_ 
&#x21e8; _Environment Variables_.
-
-
-Now run the following commands and ensure that the output is something similar 
to that shown in the table below:
-
+Confirm by running the following commands and comparing the output with that shown in the table below:
 
 <table>
 <colgroup>
@@ -59,65 +52,52 @@ Now run the following commands and ensure that the output 
is something similar t
 <tr class="odd">
 <td align="left"><p><tt>mvn --version</tt></p></td>
 <td align="left"><p>Apache Maven 3.3.3 
(7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T06:57:37-05:00)</p>
-<p>Maven home: C:\Users\ram\Software\apache-maven-3.3.3\bin\..</p>
-<p>Java version: 1.7.0_80, vendor: Oracle Corporation</p>
-<p>Java home: C:\Program Files\Java\jdk1.7.0_80\jre</p>
-<p>Default locale: en_US, platform encoding: Cp1252</p>
-<p>OS name: &quot;windows 8&quot;, version: &quot;6.2&quot;, arch: 
&quot;amd64&quot;, family: &quot;windows&quot;</p></td>
+<p>...</p>
+</td>
 </tr>
 </tbody>
 </table>
 
 
-To install the sandbox, first download it from 
<https://www.datatorrent.com/download> and import the downloaded file into 
VirtualBox. Once the import completes, you can select it and click the  Start 
button to start the sandbox.
-
-
-The sandbox is configured with 6GB RAM; if your development machine has 16GB 
or more, you can increase the sandbox RAM to 8GB or more using the VirtualBox 
console. This will yield better performance and support larger applications. 
Additionally, you can change the network adapter from **NAT** to **Bridged 
Adapter**; this will allow you to login to the sandbox from your host machine 
using an _ssh_ tool like **PuTTY** and also to transfer files to and from the 
host using `pscp` on Windows. Of course all such configuration must be done 
when when the sandbox is not running.
-
-
-You can choose to develop either directly on the sandbox or on your 
development machine. The advantage of the former is that most of the tools 
(e.g. _jdk_, _git_, _maven_) are pre-installed and also the package files 
created by your project are directly available to the Data Torrent tools such 
as  **dtManage** and **dtcli**. The disadvantage is that the sandbox is a 
memory-limited environment so running a memory-hungry tool like a Java IDE on 
it may starve other applications of memory.
+Creating a New Apex Project
+-------------------------------------------------------------------------------
 
+After the development tools are configured, you can use the maven archetype to create a basic Apache Apex project.  **Note:** When executing the commands below, replace `3.3.0-incubating` with the [latest available version](http://apex.apache.org/downloads.html) of Apache Apex.
 
-You can now use the maven archetype to create a basic Apache Apex project as 
follows: Put these lines in a Windows command file called, for example, 
`newapp.cmd` and run it:
 
-    @echo off
-    @rem Script for creating a new application
-    setlocal
-    mvn archetype:generate ^
-    
-DarchetypeRepository=https://www.datatorrent.com/maven/content/repositories/releases
 ^
-      -DarchetypeGroupId=com.datatorrent ^
-      -DarchetypeArtifactId=apex-app-archetype ^
-      -DarchetypeVersion=3.1.1 ^
-      -DgroupId=com.example ^
-      -Dpackage=com.example.myapexapp ^
-      -DartifactId=myapexapp ^
-      -Dversion=1.0-SNAPSHOT
-    endlocal
+* **Windows** - Copy the lines below into a new Windows command file called `newapp.cmd` and execute it.  When you run this file, the properties will be displayed and you will be prompted with `` Y: :``; just press **Enter** to complete the project generation.  The caret (^) at the end of some lines indicates that a continuation line follows.
 
+        @echo off
+        @rem Script for creating a new application
+        setlocal
+        mvn archetype:generate ^
+         -DarchetypeGroupId=org.apache.apex ^
+         -DarchetypeArtifactId=apex-app-archetype 
-DarchetypeVersion=3.3.0-incubating ^
+         -DgroupId=com.example -Dpackage=com.example.myapexapp 
-DartifactId=myapexapp ^
+         -Dversion=1.0-SNAPSHOT
+        endlocal
 
 
-The caret (^) at the end of some lines indicates that a continuation line 
follows. When you run this file, the properties will be displayed and you will 
be prompted with `` Y: :``; just press **Enter** to complete the project 
generation.
+* **Linux** - Execute the lines below in a terminal window.  A new project will be created in the current working directory.  The backslash (\\) at the end of the lines indicates continuation.
 
+        mvn archetype:generate \
+         -DarchetypeGroupId=org.apache.apex \
+         -DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=3.3.0-incubating \
+         -DgroupId=com.example -Dpackage=com.example.myapexapp 
-DartifactId=myapexapp \
+         -Dversion=1.0-SNAPSHOT
 
-This command file also exists in the Data Torrent _examples_ repository which 
you can check out with:
 
-    git clone https://github.com/DataTorrent/examples
-
-You will find the script under 
`examples\tutorials\topnwords\scripts\newapp.cmd`.
-
-You can also, if you prefer, use an IDE to generate the project as described 
in Section 3 of [Application Packages](application_packages.md) but use the 
archetype version 3.1.1 instead of 3.0.0.
-
-
-When the run completes successfully, you should see a new directory named 
`myapexapp` containing a maven project for building a basic Apache Apex 
application. It includes 3 source files:**Application.java**,  
**RandomNumberGenerator.java** and **ApplicationTest.java**. You can now build 
the application by stepping into the new directory and running the appropriate 
maven command:
+When the run completes successfully, you should see a new directory named `myapexapp` containing a maven project for building a basic Apache Apex application. It includes 3 source files: **Application.java**, **RandomNumberGenerator.java** and **ApplicationTest.java**. You can now build the application by stepping into the new directory and running the maven package command:
 
     cd myapexapp
     mvn clean package -DskipTests
 
-The build should create the application package file 
`myapexapp\target\myapexapp-1.0-SNAPSHOT.apa`. This file can then be uploaded 
to the Data Torrent GUI tool on the sandbox (called **dtManage**) and launched  
from there. It generates a stream of random numbers and prints them out, each 
prefixed by the string  `hello world: `.  If you built this package on the 
host, you can transfer it to the sandbox using the `pscp` tool bundled with 
**PuTTY** mentioned earlier.
+The build should create the application package file `myapexapp/target/myapexapp-1.0-SNAPSHOT.apa`. This application package can then be used to launch the example application via **dtCli** or other visual management tools.  When running, this application will generate a stream of random numbers and print them out, each prefixed by the string `hello world:`.
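+
+For example, a minimal sketch of launching the package with **dtCli** (assumes a working Hadoop environment; the prompt shown is illustrative):
+
+    dtcli
+    dt> launch myapexapp/target/myapexapp-1.0-SNAPSHOT.apa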
 
+Building Apex Demos
+-------------------------------------------------------------------------------
 
-If you want to checkout the Apache Apex source repositories and build them, 
you can do so by running the script `build-apex.cmd` located in the same place 
in the examples repository described above. The source repositories contain 
more substantial demo applications and the associated source code. 
Alternatively, if you do not want to use the script, you can follow these 
simple manual steps:
-
+If you want to see more substantial Apex demo applications and the associated 
source code, you can follow these simple steps to check out and build them.
 
 1.  Check out the source code repositories:
 
@@ -126,26 +106,23 @@ If you want to checkout the Apache Apex source 
repositories and build them, you
 
 2.  Switch to the appropriate release branch and build each repository:
 
-        pushd incubator-apex-core
-        git checkout release-3.1
+        cd incubator-apex-core
         mvn clean install -DskipTests
-        popd
-        pushd incubator-apex-malhar
-        git checkout release-3.1
+
+        cd incubator-apex-malhar
         mvn clean install -DskipTests
-        popd
+
 
 The `install` argument to the `mvn` command installs resources from each 
project to your local maven repository (typically `.m2/repository` under your 
home directory), and **not** to the system directories, so Administrator 
privileges are not required. The  `-DskipTests` argument skips running unit 
tests since they take a long time. If this is a first-time installation, it 
might take several minutes to complete because maven will download a number of 
associated plugins.
 
-After the build completes, you should see the demo application package files 
in the target directory under each demo subdirectory in 
`incubator-apex-malhar\demos\`.
+After the build completes, you should see the demo application package files 
in the target directory under each demo subdirectory in 
`incubator-apex-malhar/demos`.
+
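+For example, to list the generated demo packages (a sketch assuming the default build layout):
+
+    ls incubator-apex-malhar/demos/*/target/*.apa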
 
-Linux
-------------------
 
-Most of the instructions for Linux (and other Unix-like systems) are similar 
to those for Windows described above, so we will just note the differences.
+Sandbox
+-------------------------------------------------------------------------------
 
+To jump start development with an Apache Hadoop single node cluster, 
[DataTorrent Sandbox](https://www.datatorrent.com/download) powered by 
VirtualBox is available on Windows, Linux, or Mac platforms.  The sandbox is 
configured by default to run with 6GB RAM; if your development machine has 16GB 
or more, you can increase the sandbox RAM to 8GB or more using the VirtualBox 
console.  This will yield better performance and support larger applications.  
The advantage of developing in the sandbox is that most of the tools (e.g. 
_jdk_, _git_, _maven_), Hadoop YARN and HDFS, and a distribution of Apache Apex 
and DataTorrent RTS are pre-installed.  The disadvantage is that the sandbox is 
a memory-limited environment, and requires settings changes and restarts to 
adjust memory available for development and testing.
 
-The pre-requisites (such as _git_, _maven_, etc.) are the same as for Windows 
described above; please run the commands in the table and ensure that 
appropriate versions are present in your PATH environment variable (the command 
to display that variable is: `echo $PATH`).
 
 
-The maven archetype command is the same except that continuation lines use a 
backslash (``\``) instead of caret (``^``); the script for it is available in 
the same location and is named `newapp` (without the `.cmd` extension). The 
script to checkout and build the Apache Apex repositories is named `build-apex`.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/apex_malhar.md
----------------------------------------------------------------------
diff --git a/docs/apex_malhar.md b/docs/apex_malhar.md
index ef2e371..45dee76 100644
--- a/docs/apex_malhar.md
+++ b/docs/apex_malhar.md
@@ -1,19 +1,19 @@
 Apache Apex Malhar
 
================================================================================
 
-Apache Apex Malhar is an open source operator and codec library that can be 
used with the Apache Apex platform to build real-time streaming applications.  
As part of enabling enterprises extract value quickly, Malhar operators help 
get data in, analyze it in real-time and get data out of Hadoop in real-time 
with no paradigm limitations.  In addition to the operators, the library 
contains a number of demos applications, demonstrating operator features and 
capabilities.
+Apache Apex Malhar is an open source operator and codec library that can be used with the [Apache Apex](http://apex.apache.org/) platform to build real-time streaming applications.  Enabling users to extract value quickly, Malhar operators help get data in, analyze it in real-time, and get data out of Hadoop.  In addition to the operators, the library contains a number of demo applications, demonstrating operator features and capabilities.  To see the full list of available operators and related documentation, visit [Apex Malhar on Github](https://github.com/apache/incubator-apex-malhar).
 
-![MalharDiagram](images/MalharOperatorOverview.png)
+![MalharDiagram](images/malhar-operators.png)
 
 # Capabilities common across Malhar operators
 
-For most streaming platforms, connectors are afterthoughts and often end up 
being simple ‘bolt-ons’ to the platform. As a result they often cause 
performance issues or data loss when put through failure scenarios and 
scalability requirements. Malhar operators do not face these issues as they 
were designed to be integral parts of apex*.md RTS. Hence, they have following 
core streaming runtime capabilities
+For most streaming platforms, connectors are afterthoughts and often end up being simple ‘bolt-ons’ to the platform. As a result they often cause performance issues or data loss when put through failure scenarios and scalability requirements. Malhar operators do not face these issues as they were designed to be integral parts of Apex. Hence, they have the following core streaming runtime capabilities:
 
-1.  **Fault tolerance** – Apache Apex Malhar operators where applicable have 
fault tolerance built in. They use the checkpoint capability provided by the 
framework to ensure that there is no data loss under ANY failure scenario.
-2.  **Processing guarantees** – Malhar operators where applicable provide 
out of the box support for ALL three processing guarantees – exactly once, 
at-least once & at-most once WITHOUT requiring the user to write any additional 
code.  Some operators like MQTT operator deal with source systems that cant 
track processed data and hence need the operators to keep track of the data. 
Malhar has support for a generic operator that uses alternate storage like HDFS 
to facilitate this. Finally for databases that support transactions or support 
any sort of atomic batch operations Malhar operators can do exactly once down 
to the tuple level.
+1.  **Fault tolerance** – Malhar operators, where applicable, have fault tolerance built in. They use the checkpoint capability provided by the framework to ensure that there is no data loss under ANY failure scenario.
+2.  **Processing guarantees** – Malhar operators, where applicable, provide out of the box support for ALL three processing guarantees – exactly once, at-least once, and at-most once – WITHOUT requiring the user to write any additional code.  Some operators, like the MQTT operator, deal with source systems that cannot track processed data and hence need the operators to keep track of the data.  Malhar has support for a generic operator that uses alternate storage like HDFS to facilitate this.  Finally, for databases that support transactions or any sort of atomic batch operations, Malhar operators can do exactly once down to the tuple level.
 3.  **Dynamic updates** – Based on changing business conditions you often 
have to tweak several parameters used by the operators in your streaming 
application without incurring any application downtime. You can also change 
properties of a Malhar operator at runtime without having to bring down the 
application.
 4.  **Ease of extensibility** – Malhar operators are based on templates that 
are easy to extend.
-5.  **Partitioning support** – In streaming applications the input data 
stream often needs to be partitioned based on the contents of the stream. Also 
for operators that ingest data from external systems partitioning needs to be 
done based on the capabilities of the external system. E.g. With the Kafka or 
Flume operator, the operator can automatically scale up or down based on the 
changes in the number of Kafka partitions or Flume channels
+5.  **Partitioning support** – In streaming applications the input data stream often needs to be partitioned based on the contents of the stream. Also, for operators that ingest data from external systems, partitioning needs to be done based on the capabilities of the external system.  For example, with Kafka, the operator can automatically scale up or down based on changes in the number of Kafka partitions.
 
 # Operator Library Overview
 
@@ -21,45 +21,39 @@ For most streaming platforms, connectors are afterthoughts 
and often end up bein
 
 Below is a summary of the various sub categories of input and output 
operators. Input operators also have a corresponding output operator
 
-*   **File Systems** – Most streaming analytics use cases we have seen 
require the data to be stored in HDFS or perhaps S3 if the application is 
running in AWS. Also, customers often need to re-run their streaming analytical 
applications against historical data or consume data from upstream processes 
that are perhaps writing to some NFS share. Hence, it’s not just enough to be 
able to save data to various file systems. You also have to be able to read 
data from them. RTS supports input & output operators for HDFS, S3, NFS & Local 
Files
-*   **Flume** – NOTE: Flume operator is not yet part of Malhar
+*   **File Systems** – Most streaming analytics use cases require the data to be stored in HDFS or perhaps S3 if the application is running in AWS.  Users often need to re-run their streaming analytical applications against historical data or consume data from upstream processes that are perhaps writing to some NFS share.  Apex supports input & output operators for HDFS, S3, NFS & Local Files.  There are also File Splitter and Block Reader operators, which can accelerate processing of large files by splitting and parallelizing the work across non-overlapping sets of file blocks.
+*   **Relational Databases** – Most stream processing use cases require some reference data lookups to enrich, tag or filter streaming data. There is also a need to save results of the streaming analytical computation to a database so an operational dashboard can see them. Apex supports a JDBC operator so you can read/write data from any JDBC compliant RDBMS like Oracle, MySQL, SQLite, etc.
+*   **NoSQL Databases** – NoSQL key-value pair databases like Cassandra & 
HBase are a common part of streaming analytics application architectures to 
lookup reference data or store results.  Malhar has operators for HBase, 
Cassandra, Accumulo, Aerospike, MongoDB, and CouchDB.
+*   **Messaging Systems** – Kafka, JMS, and similar systems are the 
workhorses of messaging infrastructure in most enterprises.  Malhar has a 
robust, industry-tested set of operators to read and write Kafka, JMS, ZeroMQ, 
and RabbitMQ messages.
+*   **Notification Systems** – Malhar includes an operator for sending 
notifications via SMTP.
+*   **In-memory Databases & Caching platforms** - Some streaming use cases 
need instantaneous access to shared state across the application. Caching 
platforms and in-memory databases serve this purpose really well. To support 
these use cases, Malhar has operators for memcached and Redis.
+*   **Social Media** - Malhar includes an operator to connect to the popular 
Twitter stream fire hose.
+*   **Protocols** - Malhar provides connectors that can communicate over HTTP, RSS, Socket, WebSocket, FTP, and MQTT.
 
-Many customers have existing Flume deployments that are being used to 
aggregate log data from variety of sources. However Flume does not allow 
analytics on the log data on the fly. The Flume input/output operator enables 
RTS to consume data from flume and analyze it in real-time before being 
persisted.
+## Parsers
 
-*   **Relational databases** – Most stream processing use cases require some 
reference data lookups to enrich, tag or filter streaming data. There is also a 
need to save results of the streaming analytical computation to a database so 
an operational dashboard can see them. RTS supports a JDBC operator so you can 
read/write data from any JDBC compliant RDBMS like Oracle, MySQL etc.
-*   **NoSQL databases** –NoSQL key-value pair databases like Cassandra & 
HBase are becoming a common part of streaming analytics application 
architectures to lookup reference data or store results. Malhar has operators 
for HBase, Cassandra, Accumulo (common with govt. & healthcare companies) 
MongoDB & CouchDB.
-*   **Messaging systems** – JMS brokers have been the workhorses of 
messaging infrastructure in most enterprises. Also Kafka is fast coming up in 
almost every customer we talk to. Malhar has operators to read/write to Kafka, 
any JMS implementation, ZeroMQ & RabbitMQ.
-*   **Notification systems** – Almost every streaming analytics application 
has some notification requirements that are tied to a business condition being 
triggered. Malhar supports sending notifications via SMTP & SNMP. It also has 
an alert escalation mechanism built in so users don’t get spammed by 
notifications (a common drawback in most streaming platforms)
-*   **In-memory Databases & Caching platforms** - Some streaming use cases 
need instantaneous access to shared state across the application. Caching 
platforms and in-memory databases serve this purpose really well. To support 
these use cases, Malhar has operators for memcached & Redis
-*   **Protocols** - Streaming use cases driven by machine-to-machine 
communication have one thing in common – there is no standard dominant 
protocol being used for communication. Malhar currently has support for MQTT. 
It is one of the more commonly, adopted protocols we see in the IoT space. 
Malhar also provides connectors that can directly talk to HTTP, RSS, Socket, 
WebSocket & FTP sources
+There are many industry vertical specific data formats that a streaming application developer might need to parse. Often there are existing parsers available for these that can be directly plugged into an Apache Apex application. For example in the Telco space, a Java based CDR parser can be directly plugged into an Apache Apex operator. To further simplify the development experience, Malhar also provides some operators for parsing common formats like XML (DOM & SAX), JSON (flat map converter), Apache log files, syslog, etc.
 
+## Stream manipulation
 
+Streaming data inevitably needs processing to clean, filter, tag, summarize, etc. The goal of Malhar is to enable the application developer to focus on WHAT needs to be done to the stream to get it in the right format and not worry about the HOW.  Malhar has several operators to perform common stream manipulation actions like GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample, Inner join, Outer join, Select, Update, etc.
 
 ## Compute
 
-One of the most important promises of a streaming analytics platform like 
Apache Apex is the ability to do analytics in real-time. However delivering on 
the promise becomes really difficult when the platform does not provide out of 
the box operators to support variety of common compute functions as the user 
then has to worry about making these scalable, fault tolerant etc. Malhar takes 
this responsibility away from the application developer by providing a huge 
variety of out of the box computational operators. The application developer 
can thus focus on the analysis.
+One of the most important promises of a streaming analytics platform like Apache Apex is the ability to do analytics in real-time. However, delivering on the promise becomes really difficult when the platform does not provide out of the box operators to support a variety of common compute functions, as the user then has to worry about making these scalable, fault tolerant, stateful, etc.  Malhar takes this responsibility away from the application developer by providing a variety of out of the box computational operators.
 
 Below is just a snapshot of the compute operators available in Malhar
 
-*   Statistics & Math - Provide various mathematical and statistical 
computations over application defined time windows.
-*   Filtering & pattern matching
-*   Machine learning & Algorithms
-*   Real-time model scoring is a very common use case for stream processing 
platforms. &nbsp;Malhar allows users to invoke their R models from streaming 
applications
-*   Sorting, Maps, Frequency, TopN, BottomN, Random Generator etc.
-
-
-## Query & Script invocation
-
-Many streaming use cases are legacy implementations that need to be ported 
over. This often requires re-use some of the existing investments and code that 
perhaps would be really hard to re-write. With this in mind, Malhar supports 
invoking external scripts and queries as part of the streaming application 
using operators for invoking SQL query, Shell script, Ruby, Jython, and 
JavaScript etc.
-
-## Parsers
-
-There are many industry vertical specific data formats that a streaming 
application developer might need to parse. Often there are existing parsers 
available for these that can be directly plugged into an Apache Apex 
application. For example in the Telco space, a Java based CDR parser can be 
directly plugged into Apache Apex operator. To further simplify development 
experience, Malhar also provides some operators for parsing common formats like 
XML (DOM & SAX), JSON (flat map converter), Apache log files, syslog, etc.
-
-## Stream manipulation
+*   Statistics and math - Various mathematical and statistical computations 
over application defined time windows.
+*   Filtering and pattern matching
+*   Sorting, maps, frequency, TopN, BottomN
+*   Random data generators
 
-Streaming data aka ‘stream’ is raw data that inevitably needs processing 
to clean, filter, tag, summarize etc. The goal of Malhar is to enable the 
application developer to focus on ‘WHAT’ needs to be done to the stream to 
get it in the right format and not worry about the ‘HOW’. Hence, Malhar has 
several operators to perform the common stream manipulation actions like – 
DeDupe, GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample, Inner 
join, Outer join, Select, Update etc.
+## Language Support
 
-## Social Media
+Migrating to a new platform often requires re-use of existing code that would be difficult or time-consuming to re-write.  With this in mind, Malhar supports invocation of code written in other languages by wrapping it in one of the library operators, and allows execution of code written in:
 
-Malhar includes an operator to connect to the popular Twitter stream fire hose.
+* JavaScript
+* Python
+* R
+* Ruby
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/application_development.md
----------------------------------------------------------------------
diff --git a/docs/application_development.md b/docs/application_development.md
index a76ad8a..c5c9075 100644
--- a/docs/application_development.md
+++ b/docs/application_development.md
@@ -1,50 +1,16 @@
 Application Developer Guide
 ===========================
 
-Real-time big data processing is not only important but has become
-critical for businesses which depend on accurate and timely analysis of
-their business data. A few businesses have yielded to very expensive
-solutions like building an in-house, real-time analytics infrastructure
-supported by an internal development team, or buying expensive
-proprietary software. A large number of businesses are dealing with the
-requirement just by trying to make Hadoop do their batch jobs in smaller
-iterations. Over the last few years, Hadoop has become ubiquitous in the
-big data processing space, replacing expensive proprietary hardware and
-software solutions for massive data processing with very cost-effective,
-fault-tolerant, open-sourced, and commodity-hardware-based solutions.
-While Hadoop has been a game changer for companies, it is primarily a
-batch-oriented system, and does not yet have a viable option for
-real-time data processing.  Most companies with real-time data
-processing end up having to build customized solutions in addition to
-their Hadoop infrastructure.
-
- 
-
-The DataTorrent platform is designed to process massive amounts of
-real-time events natively in Hadoop. This can be event ingestion,
-processing, and aggregation for real-time data analytics, or can be
-real-time business logic decisioning such as cell tower load balancing,
-real-time ads bidding, or fraud detection.  The platform has the ability
-to repair itself in real-time (without data loss) if hardware fails, and
-adapt to changes in load by adding and removing computing resources
-automatically.
-
-
-
-DataTorrent is a native Hadoop application. It runs as a YARN
-(Hadoop 2.x) application and leverages Hadoop as a distributed operating
-system. All the basic distributed operating system capabilities of
-Hadoop like resource allocation (Resource Manager, distributed file system 
(HDFS),
-multi-tenancy, security, fault-tolerance, scalability, etc.
-are supported natively in all streaming applications.  Just as Hadoop
-for map-reduce handles all the details of the application allowing you
-to only focus on writing the application (the mapper and reducer
-functions), the platform handles all the details of streaming execution,
-allowing you to only focus on your business logic. Using the platform
-removes the need to maintain separate clusters for real-time
-applications.
-
-
+The Apex platform is designed to process massive amounts of
+real-time events natively in Hadoop.  It runs as a YARN (Hadoop 2.x)
+application and leverages Hadoop as a distributed operating
+system.  All the basic distributed operating system capabilities of
+Hadoop like resource management (YARN), distributed file system (HDFS),
+multi-tenancy, security, fault-tolerance, and scalability are supported
+natively in all Apex applications.  The platform handles all the details
+of application execution, including dynamic scaling, state checkpointing
+and recovery, and event processing guarantees, allowing you to focus on
+writing your application logic without mixing operational and
+functional concerns.
 
 In the platform, building a streaming application can be extremely
 easy and intuitive.  The application is represented as a Directed
@@ -56,25 +22,24 @@ processing is not available in the Operator Library, one 
can easily
 write a custom operator. We refer those interested in creating their own
 operators to the [Operator Development Guide](operator_development.md).
 
+
 Running A Test Application
 =======================================
 
-This chapter will help you with a quick start on running an
-application. If you are starting with the platform for the first time,
-it would be informative to open an existing application and see it run.
-Do the following steps to run the PI demo, which computes the value of
-PI  in a simple
-manner:
+If you are starting with the Apex platform for the first time,
+it can be informative to launch an existing application and see it run.
+One of the simplest examples provided in [Apex Malhar](apex_malhar.md) is the Pi demo application,
+which computes the value of PI using random numbers.  After [setting up the development environment](apex_development_setup.md),
+the Pi demo can be launched as follows:
 
-1.  Open up platform files in your IDE (for example NetBeans, or Eclipse)
-2.  Open Demos project
-3.  Open Test Packages and run ApplicationTest.java in pi package
-4.  See the results in your system console
+1.  Open up the Apex Malhar files in your IDE (for example Eclipse, IntelliJ, NetBeans, etc.)
+2.  Navigate to `demos/pi/src/test/java/com/datatorrent/demos/ApplicationTest.java`
+3.  Run the test in ApplicationTest.java
+4.  View the output in the system console
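+
+Alternatively, a sketch of running the same test from the command line
+(assuming the repositories have been checked out and built as described
+in the [development setup](apex_development_setup.md)):
+
+    cd incubator-apex-malhar/demos/pi
+    mvn test -Dtest=ApplicationTest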
 
 
-
-Congratulations, you just ran your first real-time streaming demo
-:) This demo is very simple and has four operators. The first operator
+Congratulations, you just ran your first real-time streaming demo :) 
+This demo is very simple and has four operators. The first operator
 emits random integers between 0 and 30,000. The second operator receives
 these coefficients and emits a hashmap with x and y values each time it
 receives two values. The third operator takes these values and computes
@@ -119,6 +84,7 @@ platform. In the remaining part of this document we will go 
through
 details needed for you to develop and run streaming applications in
 Malhar.
 
+
 Test Application: Yahoo! Finance Quotes
 ----------------------------------------------------
 
@@ -622,7 +588,7 @@ attribute APPLICATION_WINDOW_COUNT.
 In the rest of this chapter we will run through the process of
 running this application. We assume that  you are familiar with details
 of your Hadoop infrastructure. For installation
-details please refer to the [Installation Guide](installation.md).
+details please refer to the [Installation 
Guide](http://docs.datatorrent.com/installation/).
 
 
 Running a Test Application
@@ -1449,7 +1415,7 @@ not impact functionality of the operator. Users can 
change certain
 attributes in runtime. Users cannot add attributes to operators; they
 are pre-defined by the platform. They are interpreted by the platform
 and thus cannot be defined in user created code (like properties).
-Details of attributes are covered in  [Configuration Guide](configuration.md).
+Details of attributes are covered in  [Configuration 
Guide](http://docs.datatorrent.com/configuration/).
 
 ### Operator State
 
@@ -1857,10 +1823,7 @@ Hadoop is a multi-tenant distributed operating system. 
Security is
 an intrinsic element of multi-tenancy as without it a cluster cannot be
 reasonably be shared among enterprise applications. Streaming
 applications follow all multi-tenancy security models used in Hadoop as
-they are native Hadoop applications. For details refer to the
-[Operation and Installation
-Guide](https://www.datatorrent.com/docs/guides/OperationandInstallationGuide.html)
-.
+they are native Hadoop applications.
 
 Security
 ---------------------
@@ -2824,7 +2787,7 @@ is not yet available.
 
 
 
-9: Dynamic Application Modifications
+Dynamic Application Modifications
 =================================================
 
 Dynamic application modifications are being worked on and most of
@@ -2872,7 +2835,7 @@ Dynamic modifications to applications are foundational 
part of the
 platform. They enable users to build layers over the applications. Users
 can also save all the changes done since the application launch, and
 therefore predictably get the application to its current state. For
-details refer to  [Configuration Guide](configuration.md)
+details refer to  [Configuration 
Guide](http://docs.datatorrent.com/configuration/)
 .
 
 
@@ -2881,54 +2844,11 @@ details refer to  [Configuration 
Guide](configuration.md)
 
 
 
-
-User Interface
-===========================
-
-The platform provides a rich user interface. This includes tools
-to monitor the application system metrics (throughput, latency, resource
-utilization, etc.); dashboards for application data, replay, errors; and
-a Developer studio for application creation, launch etc. For details
-refer to  [UI Console Guide](dtmanage.md).
-
-
-
 Demos
 ==================
 
-In this section we list some of the demos that come packaged with
-installer. The source code for the demos is available in the open-source
+The source code for the demos is available in the open-source
 [Apache Apex-Malhar 
repository](https://github.com/apache/incubator-apex-malhar).
 All of these do computations in real-time. Developers are encouraged to
 review them as they use various features of the platform and provide an
-opportunity for quick learning.
-
-1.  Computation of PI:
-    Computes PI by generating a random location on X-Y plane and
-    measuring how often it lies within the unit circle centered
-    at (0,0).
-2.  Yahoo! Finance quote computation:
-    Computes ticker quote, 1-day chart (per min), and simple moving
-    averages (per 5 min).
-3.  Echoserver Reads messages from a
-    network connection and echoes them back out.
-4.  Twitter top N tweeted urls: Computes
-    top N tweeted urls over last 5 minutes
-5.  Twitter trending hashtags: Computes
-    the top Twitter Hashtags over the last 5 minutes
-6.  Twitter top N frequent words:
-    Computes top N frequent words in a sliding window
-7.  Word count: Computes word count for
-    all words within a large file
-8.  Mobile location tracker: Tracks
-    100,000 cell phones within an area code moving at car speed (jumping
-    cell phone towers every 1-5 seconds).
-9.  Frauddetect: Analyzes a stream of
-    credit card merchant transactions.
-10. Mroperator:Contains several
-    map-reduce applications.
-11. R: Analyzes a synthetic stream of
-    eruption event data for the Old Faithful
-    geyser (https://en.wikipedia.org/wiki/Old_Faithful).
-12. Machinedata: Analyzes a synthetic
-    stream of events to determine health of a machine.  
+opportunity for quick learning.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/application_packages.md
----------------------------------------------------------------------
diff --git a/docs/application_packages.md b/docs/application_packages.md
index 521779a..06c2980 100644
--- a/docs/application_packages.md
+++ b/docs/application_packages.md
@@ -107,7 +107,7 @@ other IDEs, like Eclipse or IntelliJ, is similar.
 # Writing Your Own App Package
 
 
-Please refer to the [Creating Apps](create.md) on the basics on how to write 
an Apache Apex application.  In your AppPackage project, you can add custom 
operators (refer to [Operator Development 
Guide](https://www.datatorrent.com/docs/guides/OperatorDeveloperGuide.html)), 
project dependencies, default and required configuration properties, pre-set 
configurations and other metadata.
+Please refer to [Creating Apps](http://docs.datatorrent.com/create/) for the basics of how to write an Apache Apex application.  In your AppPackage project, you can add custom operators (refer to the [Operator Development Guide](operator_development.md)), project dependencies, default and required configuration properties, pre-set configurations and other metadata.
 
 ## Adding (and removing) project dependencies
 
@@ -398,8 +398,6 @@ property:
 
         dt.attr.APPLICATION_NAME
 
-There are also other properties that can be set.  For details on
-properties, refer to the [Operation and Installation 
Guide](https://www.datatorrent.com/docs/guides/OperationandInstallationGuide.html).
 
 In this example, property some_name_1 is a required property which
 must be set at launch time, or it must be set by a pre-set configuration
@@ -623,12 +621,12 @@ Here is an example of launching an application through 
curl:
  lications/MyFirstApplication/launch
 ```
 
-Please refer to the [Gateway API 
reference](https://www.google.com/url?q=https://www.datatorrent.com/docs/guides/DTGatewayAPISpecification.html&sa=D&usg=AFQjCNEWfN7-e7fd6MoWZjmJUE3GW7UwdQ)
 for the complete specification of the REST API.
+Please refer to the [Gateway API](http://docs.datatorrent.com/dtgateway_api/) 
for the complete specification of the REST API.
 
 # Examining and Launching Application Packages Through Apex CLI
 
 If you are working with Application Packages in the local filesystem and
-do not want to deal with dtGateway, you can use the Apex Command Line 
Interface (dtcli).  Please refer to the [Gateway API](dtgateway_api.md)
+do not want to deal with dtGateway, you can use the Apex Command Line 
Interface (dtcli).  Please refer to the [Gateway 
API](http://docs.datatorrent.com/dtgateway_api/)
 to see samples for these commands.
 
 ## Getting Application Package Meta Information

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/autometrics.md
----------------------------------------------------------------------
diff --git a/docs/autometrics.md b/docs/autometrics.md
index f6000e8..c534fb2 100644
--- a/docs/autometrics.md
+++ b/docs/autometrics.md
@@ -123,6 +123,13 @@ An instance of above aggregator can be specified as the 
`METRIC_AGGREGATOR` for
 ```
 
 # Retrieving AutoMetrics
+
+There are two options for retrieving the AutoMetrics:
+
+* Through the DataTorrent Gateway REST API
+* Through the REST service on the port of the running STRAM
+
+
 The Gateway REST API provides a way to retrieve the latest AutoMetrics for 
each logical operator.  For example:
 
 ```
@@ -167,145 +174,4 @@ GET 
/ws/v2/applications/{appid}/logicalPlan/operators/{opName}
 }
 ```
 
-However, just like AutoMetrics, the Gateway only provides the latest metrics.  
For historical metrics, we will need the help of App Data Tracker.
-
-# App Data Tracker
-As discussed above, STRAM aggregates the AutoMetrics from physical operators 
(partitions) to something that makes sense in one logical operator.  It pushes 
the aggregated AutoMetrics values using Websocket to the Gateway at every 
second along with system metrics for each operator.  Gateway relays the 
information to an application called App Data Tracker.  It is another Apex 
application that runs in the background and further aggregates the incoming 
values by time bucket and stores the values in HDHT.  It also allows the 
outside to retrieve the aggregated AutoMetrics and system metrics through 
websocket interface.
-
-![AppDataTracker](images/autometrics/adt.png)
-
-App Data Tracker is enabled by having these properties in dt-site.xml:
-
-```xml
-<property>
-  <name>dt.appDataTracker.enable</name>
-  <value>true</value>
-</property>
-<property>
-  <name>dt.appDataTracker.transport</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-<property>
-  <name>dt.attr.METRICS_TRANSPORT</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-```
-
-All the applications launched after the App Data Tracker is enabled will have 
metrics sent to it.
-
-**Note**: The App Data Tracker will be shown running in dtManage as a 
“system app”.  It will show up if the “show system apps” button is 
pressed.
-
-By default, the time buckets App Data Tracker aggregates upon are one minute, 
one hour and one day.  It can be overridden by changing the operator attribute 
`METRICS_DIMENSIONS_SCHEME`.
-
-Also by default, the app data tracker performs all these aggregations: SUM, 
MIN, MAX, AVG, COUNT, FIRST, LAST on all number metrics.  You can also override 
by changing the same operator attribute `METRICS_DIMENSIONS_SCHEME`, provided 
the custom aggregator is known to the App Data Tracker.  (See next section)
-
-# Custom Aggregator in App Data Tracker
-Custom aggregators allow you to do your own custom computation on statistics 
generated by any of your applications. In order to implement a Custom 
aggregator you have to do two things:
-
-1. Combining new inputs with the current aggregation
-2. Combining two aggregations together into one aggregation
-
-Let’s consider the case where we want to perform the following rolling 
average:
-
-Y_n = ½ * X_n + ½ * X_n-1 + ¼ * X_n-2 + ⅛ * X_n-3 +...
-
-This aggregation could be performed by the following Custom Aggregator:
-
-```java
-@Name("IIRAVG")
-public class AggregatorIIRAVG extends AbstractIncrementalAggregator
-{
-  ...
-
-  private void aggregateHelper(DimensionsEvent dest, DimensionsEvent src)
-  {
-    double[] destVals = dest.getAggregates().getFieldsDouble();
-    double[] srcVals = src.getAggregates().getFieldsDouble();
-
-    for (int index = 0; index < destLongs.length; index++) {
-      destVals[index] = .5 * destVals[index] + .5 * srcVals[index];
-    }
-  }
-
-  @Override
-  public void aggregate(Aggregate dest, InputEvent src)
-  {
-    //Aggregate a current aggregation with a new input
-    aggregateHelper(dest, src);
-  }
-
-  @Override
-  public void aggregate(Aggregate destAgg, Aggregate srcAgg)
-  {
-    //Combine two existing aggregations together
-    aggregateHelper(destAgg, srcAgg);
-  }
-}
-```
-
-## Discovery of Custom Aggregators
-AppDataTracker searches for custom aggregator jars under the following 
directories statically before launching:
-
-1. {dt\_installation\_dir}/plugin/aggregators
-2. {user\_home\_dir}/.dt/plugin/aggregators
-
-It uses reflection to find all the classes that extend from 
`IncrementalAggregator` and `OTFAggregator` in these jars and registers them 
with the name provided by `@Name` annotation (or class name when `@Name` is 
absent).
-
-# Using `METRICS_DIMENSIONS_SCHEME`
-
-Here is a sample code snippet on how you can make use of 
`METRICS_DIMENSIONS_SCHEME` to set your own time buckets and your own set of 
aggregators for certain `AutoMetric`s performed by the App Data Tracker in your 
application.
-
-```java
-  @Override
-  public void populateDAG(DAG dag, Configuration configuration)
-  {
-    ...
-    LineReceiver lineReceiver = dag.addOperator("LineReceiver", new LineReceiver());
-    ...
-    AutoMetric.DimensionsScheme dimensionsScheme = new AutoMetric.DimensionsScheme()
-    {
-      String[] timeBuckets = new String[] { "1s", "1m", "1h" };
-      String[] lengthAggregators = new String[] { "IIRAVG", "SUM" };
-      String[] countAggregators = new String[] { "SUM" };
-
-      /* Set the aggregation time buckets to one second, one minute, and one hour */
-      @Override
-      public String[] getTimeBuckets()
-      {
-        return timeBuckets;
-      }
-
-      @Override
-      public String[] getDimensionAggregationsFor(String logicalMetricName)
-      {
-        if ("length".equals(logicalMetricName)) {
-          return lengthAggregators;
-        } else if ("count".equals(logicalMetricName)) {
-          return countAggregators;
-        } else {
-          return null; // use the default aggregators
-        }
-      }
-    };
-
-    dag.setAttribute(lineReceiver, OperatorContext.METRICS_DIMENSIONS_SCHEME, dimensionsScheme);
-    ...
-  }
-```
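-
-Metrics not matched in `getDimensionAggregationsFor` (the `null` case above) fall back to the default aggregators listed earlier.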
-
-
-# Dashboards
-With the App Data Tracker enabled, you can visualize the AutoMetrics and system metrics in the Dashboards within dtManage.  As shown in the diagram in the App Data Tracker section, dtGateway relays queries and query results to and from the App Data Tracker.  In this way, dtManage sends queries to, and receives results from, the App Data Tracker via dtGateway, and uses the results to let the user visualize the data.
-
-Click the visualize button on dtManage's application page.
-
-![AppDataTracker](images/autometrics/visualize.png)
-
-You will see the dashboard for the AutoMetrics and the system metrics.
-
-![AppDataTracker](images/autometrics/dashboard.png)
-
-The left widget shows the `length` and `count` AutoMetrics for the LineReceiver operator.  The right widget shows the system metrics.
-
-The Dashboards include some simple built-in widgets for visualizing the data, such as line charts and bar charts.  Users will also be able to implement their own widgets to visualize their data.
+However, just as with AutoMetrics, the Gateway only provides the latest metrics.  For historical metrics, we will need the help of the [App Data Tracker](http://docs.datatorrent.com/autometrics/#app-data-tracker).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/dtcli.md
----------------------------------------------------------------------
diff --git a/docs/dtcli.md b/docs/dtcli.md
index 813a27f..8cf11e6 100644
--- a/docs/dtcli.md
+++ b/docs/dtcli.md
@@ -1,12 +1,7 @@
 Apache Apex Command Line Interface
 
================================================================================
 
-dtCli, the Apache Apex command line interface, can be used to launch, monitor, 
and manage
-Apache Apex applications.  dtCli is a wrapper around the [REST 
API](dtgateway_api.md) provided by dtGatway, and
-provides a developer friendly way of interacting with Apache Apex platform. 
The CLI enables a much higher level of feature set by
-hiding deep details of REST API.  Another advantage of dtCli is to provide 
scope, by connecting and executing commands in a context
-of specific application.  dtCli enables easy integration with existing 
enterprise toolset for automated application monitoring
-and management.  Currently the following high level tasks are supported.
+dtCli, the Apache Apex command line interface, can be used to launch, monitor, and manage Apache Apex applications.  It provides a developer-friendly way of interacting with the Apache Apex platform.  Another advantage of dtCli is that it provides scope, by connecting to and executing commands in the context of a specific application.  dtCli enables easy integration with existing enterprise toolsets for automated application monitoring and management.  Currently the following high-level tasks are supported.
 
 -   Launch or kill applications
 -   View system metrics including load, throughput, latency, etc.
@@ -19,7 +14,7 @@ and management.  Currently the following high level tasks are 
supported.
 
 ## dtcli Commands
 
-dtCli can be launched by running following command on the same machine where 
dtGatway was installed
+dtCli can be launched by running the following command
 
     dtcli
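+
+Once started, commands are entered at the dtCli prompt.  For example, launching an application package might look like this (the prompt and package file name shown are illustrative):
+
+    dtcli
+    dt> launch target/myapp-1.0.apa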
 

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/favicon.ico
----------------------------------------------------------------------
diff --git a/docs/favicon.ico b/docs/favicon.ico
new file mode 100644
index 0000000..c0b3dae
Binary files /dev/null and b/docs/favicon.ico differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/images/MalharOperatorOverview.png
----------------------------------------------------------------------
diff --git a/docs/images/MalharOperatorOverview.png 
b/docs/images/MalharOperatorOverview.png
deleted file mode 100644
index 40bee4a..0000000
Binary files a/docs/images/MalharOperatorOverview.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..6a78abf
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,20 @@
+Apache Apex (Incubating)
+================================================================================
+
+Apex is a Hadoop YARN-native big data processing platform, enabling real-time stream processing as well as batch processing for your big data.  Apex provides the following benefits:
+
+* High scalability and performance
+* Fault tolerance and state management
+* Hadoop-native YARN & HDFS implementation
+* Event processing guarantees
+* Separation of functional and operational concerns
+* Simple API supports generic Java code
+
+The platform has been demonstrated to scale linearly across Hadoop clusters under extreme loads of billions of events per second.  Hardware and process failures are recovered from quickly with HDFS-backed checkpointing and automatic operator recovery, preserving application state and resuming execution in seconds.  Functional and operational specifications are separated.  Apex provides a simple API, which enables users to write generic, reusable code.  The code is dropped in as-is and the platform automatically handles the various operational concerns, such as state management, fault tolerance, scalability, security, metrics, etc.  This frees users to focus on functional development, and lets the platform provide operability support.
+
+The core Apex platform is supplemented by Malhar, a library of connector and 
logic functions, enabling rapid application development.  These operators and 
modules provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, 
ActiveMQ, RabbitMQ, JMS, and other message systems; MySQL, Cassandra, MongoDB,
Redis, HBase, CouchDB, generic JDBC, and other database connectors. The Malhar 
library also includes a host of other common business logic patterns that help 
users to significantly reduce the time it takes to go into production.  Ease of 
integration with all other big data technologies is one of the primary missions 
of Malhar.
+
+
+For additional information visit [Apache Apex 
(incubating)](http://apex.incubator.apache.org/).
+
+[![](favicon.ico)](http://apex.incubator.apache.org/)

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/docs/operator_development.md
----------------------------------------------------------------------
diff --git a/docs/operator_development.md b/docs/operator_development.md
index f502725..85ebab5 100644
--- a/docs/operator_development.md
+++ b/docs/operator_development.md
@@ -287,7 +287,7 @@ Code
 
 The source code for the tutorial can be found here:
 
-[https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial](https://www.google.com/url?q=https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial&sa=D&usg=AFQjCNHAAgSpNprHJVvy9GSjdlD1uwU7jw)
+[https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial](https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial)
 
 
 Operator Reference <a name="operator_reference"></a>
@@ -447,3 +447,9 @@ ports.
 1. Invoke constructor; non-transients initialized.
 2. Copy state from checkpoint -- initialized values from step 1 are
 replaced.
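+
+As a minimal sketch of what this means for operator fields (a hypothetical operator, not part of the tutorial; it assumes `BaseOperator` from the Apex API):
+
+```java
+import com.datatorrent.common.util.BaseOperator;
+
+// Hypothetical example: non-transient state is checkpointed and restored
+// in step 2, while transient fields keep the values set in step 1.
+public class LineCounter extends BaseOperator
+{
+  private long count;                      // initialized in step 1, then replaced from the checkpoint in step 2
+  private transient StringBuilder buffer;  // transient: never checkpointed, keeps its constructor value
+
+  public LineCounter()
+  {
+    buffer = new StringBuilder();          // step 1: constructor runs on every (re)start
+  }
+}
+```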
+
+
+Malhar Operator Library
+==========================
+
+To see the full list of Apex Malhar operators along with related documentation, visit [Apex Malhar on GitHub](https://github.com/apache/incubator-apex-malhar).

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/cfdacdf2/mkdocs.yml
----------------------------------------------------------------------
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..c6a26d7
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,15 @@
+site_name: Apache Apex Documentation
+site_favicon: favicon.ico
+theme: readthedocs
+pages:
+- Apache Apex: index.md
+- Apache Apex-Malhar: apex_malhar.md
+- Development:
+    - Development Setup: apex_development_setup.md
+    - Applications: application_development.md
+    - Application Packages: application_packages.md
+    - Configuration Packages: configuration_packages.md
+    - Operators: operator_development.md
+    - AutoMetric API: autometrics.md
+- Operations:
+    - dtCli: dtcli.md
