Migrating docs

Project: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/commit/44f220fd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/tree/44f220fd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/diff/44f220fd

Branch: refs/heads/APEXCORE-293
Commit: 44f220fd221bbd4ace04943e1d2e9bc275391923
Parents: c07663b
Author: sashadt <[email protected]>
Authored: Fri Jan 29 18:39:20 2016 -0800
Committer: Thomas Weise <[email protected]>
Committed: Sun Feb 28 22:46:41 2016 -0800

----------------------------------------------------------------------
 apex.md                                         |   14 -
 apex_development_setup.md                       |  151 -
 apex_malhar.md                                  |   65 -
 application_development.md                      | 2934 ------------------
 application_packages.md                         |  669 ----
 autometrics.md                                  |  311 --
 configuration_packages.md                       |  242 --
 docs/apex.md                                    |   14 +
 docs/apex_development_setup.md                  |  151 +
 docs/apex_malhar.md                             |   65 +
 docs/application_development.md                 | 2934 ++++++++++++++++++
 docs/application_packages.md                    |  669 ++++
 docs/autometrics.md                             |  311 ++
 docs/configuration_packages.md                  |  242 ++
 docs/dtcli.md                                   |  273 ++
 ...cationConfigurationPackages.html-image00.png |  Bin 0 -> 50038 bytes
 ...cationConfigurationPackages.html-image01.png |  Bin 0 -> 43756 bytes
 ...cationConfigurationPackages.html-image02.png |  Bin 0 -> 49752 bytes
 docs/images/MalharOperatorOverview.png          |  Bin 0 -> 297948 bytes
 docs/images/apex_logo.png                       |  Bin 0 -> 35621 bytes
 .../ApplicationDeveloperGuide.html-image00.png  |  Bin 0 -> 30204 bytes
 .../ApplicationDeveloperGuide.html-image01.png  |  Bin 0 -> 44041 bytes
 .../ApplicationDeveloperGuide.html-image02.png  |  Bin 0 -> 21927 bytes
 .../ApplicationDeveloperGuide.html-image03.png  |  Bin 0 -> 66578 bytes
 .../ApplicationDeveloperGuide.html-image04.png  |  Bin 0 -> 47909 bytes
 .../ApplicationDeveloperGuide.html-image05.png  |  Bin 0 -> 40228 bytes
 .../ApplicationDeveloperGuide.html-image06.png  |  Bin 0 -> 37807 bytes
 .../ApplicationDeveloperGuide.html-image07.png  |  Bin 0 -> 38504 bytes
 .../ApplicationDeveloperGuide.html-image08.png  |  Bin 0 -> 29070 bytes
 .../ApplicationDeveloperGuide.html-image09.png  |  Bin 0 -> 47030 bytes
 docs/images/autometrics/adt.png                 |  Bin 0 -> 25372 bytes
 docs/images/autometrics/dashboard.png           |  Bin 0 -> 79952 bytes
 docs/images/autometrics/visualize.png           |  Bin 0 -> 35073 bytes
 docs/images/operator/image00.png                |  Bin 0 -> 19541 bytes
 docs/images/operator/image01.png                |  Bin 0 -> 25962 bytes
 docs/images/operator/image02.png                |  Bin 0 -> 26407 bytes
 docs/images/operator/image03.png                |  Bin 0 -> 9465 bytes
 docs/images/operator/image04.png                |  Bin 0 -> 14620 bytes
 docs/images/operator/image05.png                |  Bin 0 -> 6227 bytes
 docs/operator_development.md                    |  449 +++
 dtcli.md                                        |  273 --
 ...cationConfigurationPackages.html-image00.png |  Bin 50038 -> 0 bytes
 ...cationConfigurationPackages.html-image01.png |  Bin 43756 -> 0 bytes
 ...cationConfigurationPackages.html-image02.png |  Bin 49752 -> 0 bytes
 .../ApplicationPackages.html-image00.png        |  Bin 43756 -> 0 bytes
 .../ApplicationPackages.html-image01.png        |  Bin 29535 -> 0 bytes
 .../ApplicationPackages.html-image02.png        |  Bin 49468 -> 0 bytes
 images/MalharOperatorOverview.png               |  Bin 297948 -> 0 bytes
 images/apex_logo.png                            |  Bin 35621 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image00.png  |  Bin 30204 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image01.png  |  Bin 44041 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image02.png  |  Bin 21927 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image03.png  |  Bin 66578 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image04.png  |  Bin 47909 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image05.png  |  Bin 40228 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image06.png  |  Bin 37807 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image07.png  |  Bin 38504 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image08.png  |  Bin 29070 -> 0 bytes
 .../ApplicationDeveloperGuide.html-image09.png  |  Bin 47030 -> 0 bytes
 images/autometrics/adt.png                      |  Bin 25372 -> 0 bytes
 images/autometrics/dashboard.png                |  Bin 79952 -> 0 bytes
 images/autometrics/visualize.png                |  Bin 35073 -> 0 bytes
 images/operator/image00.png                     |  Bin 19541 -> 0 bytes
 images/operator/image01.png                     |  Bin 25962 -> 0 bytes
 images/operator/image02.png                     |  Bin 26407 -> 0 bytes
 images/operator/image03.png                     |  Bin 9465 -> 0 bytes
 images/operator/image04.png                     |  Bin 14620 -> 0 bytes
 images/operator/image05.png                     |  Bin 6227 -> 0 bytes
 operator_development.md                         |  449 ---
 69 files changed, 5108 insertions(+), 5108 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/apex.md
----------------------------------------------------------------------
diff --git a/apex.md b/apex.md
deleted file mode 100644
index 215a957..0000000
--- a/apex.md
+++ /dev/null
@@ -1,14 +0,0 @@
-Apache Apex
-================================================================================
-
-Apache Apex (incubating) is the industry’s only open source, enterprise-grade unified stream and batch processing engine. Apache Apex includes key features requested by the open source developer community that are not available in other current open source technologies.
-
-* Event processing guarantees
-* In-memory performance & scalability
-* Fault tolerance and state management
-* Native rolling and tumbling window support
-* Hadoop-native YARN & HDFS implementation
-
-For additional information visit [Apache 
Apex](http://apex.incubator.apache.org/).
-
-[![](images/apex_logo.png)](http://apex.incubator.apache.org/)

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/apex_development_setup.md
----------------------------------------------------------------------
diff --git a/apex_development_setup.md b/apex_development_setup.md
deleted file mode 100644
index 777f2f9..0000000
--- a/apex_development_setup.md
+++ /dev/null
@@ -1,151 +0,0 @@
-Apache Apex Development Environment Setup
-=========================================
-
-This document describes the steps needed to set up a development environment for creating applications that run on the Apache Apex or DataTorrent RTS streaming platform.
-
-
-Microsoft Windows
-------------------------------
-
-There are a few tools that will be helpful when developing Apache Apex 
applications, some required and some optional:
-
-1.  *git* -- A revision control system (version 1.7.1 or later). There are 
multiple git clients available for Windows (<http://git-scm.com/download/win> 
for example), so download and install a client of your choice.
-
-2.  *java JDK* (not JRE). Includes the Java Runtime Environment as well as the 
Java compiler and a variety of tools (version 1.7.0\_79 or later). Can be 
downloaded from the Oracle website.
-
-3.  *maven* -- Apache Maven is a build system for Java projects (version 3.0.5 
or later). It can be downloaded from <https://maven.apache.org/download.cgi>.
-
-4.  *VirtualBox* -- Oracle VirtualBox is a virtual machine manager (version 4.3 or later) and can be downloaded from <https://www.virtualbox.org/wiki/Downloads>. It is needed to run the DataTorrent Sandbox.
-
-5.  *DataTorrent Sandbox* -- The sandbox can be downloaded from <https://www.datatorrent.com/download>. It is useful for testing simple applications since it contains Apache Hadoop and DataTorrent RTS 3.1.1 pre-installed with a time-limited Enterprise License. If you have already installed the RTS Enterprise Edition (evaluation or production license) on a cluster, you can use that setup for deployment and testing instead of the sandbox.
-
-6.  (Optional) If you prefer to use an IDE (Integrated Development 
Environment) such as *NetBeans*, *Eclipse* or *IntelliJ*, install that as well.
-
-
-After installing these tools, make sure that the directories containing the executable files are in your PATH environment variable; for example, for the JDK executables like _java_ and _javac_, the directory might be something like `C:\Program Files\Java\jdk1.7.0_80\bin`; for _git_ it might be `C:\Program Files\Git\bin`; and for maven it might be `C:\Users\user\Software\apache-maven-3.3.3\bin`. Open a console window and enter the command:
-
-    echo %PATH%
-
-to see the value of the `PATH` variable and verify that the above directories are present. If not, you can change its value by clicking the appropriate button at _Control Panel_ &#x21e8; _Advanced System Settings_ &#x21e8; _Advanced tab_ &#x21e8; _Environment Variables_.
-
-
-Now run the following commands and ensure that the output is something similar 
to that shown in the table below:
-
-
-<table>
-<colgroup>
-<col width="30%" />
-<col width="70%" />
-</colgroup>
-<tbody>
-<tr class="odd">
-<td align="left"><p>Command</p></td>
-<td align="left"><p>Output</p></td>
-</tr>
-<tr class="even">
-<td align="left"><p><tt>javac -version</tt></p></td>
-<td align="left"><p>javac 1.7.0_80</p></td>
-</tr>
-<tr class="odd">
-<td align="left"><p><tt>java -version</tt></p></td>
-<td align="left"><p>java version &quot;1.7.0_80&quot;</p>
-<p>Java(TM) SE Runtime Environment (build 1.7.0_80-b15)</p>
-<p>Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)</p></td>
-</tr>
-<tr class="even">
-<td align="left"><p><tt>git --version</tt></p></td>
-<td align="left"><p>git version 2.6.1.windows.1</p></td>
-</tr>
-<tr class="odd">
-<td align="left"><p><tt>mvn --version</tt></p></td>
-<td align="left"><p>Apache Maven 3.3.3 
(7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T06:57:37-05:00)</p>
-<p>Maven home: C:\Users\ram\Software\apache-maven-3.3.3\bin\..</p>
-<p>Java version: 1.7.0_80, vendor: Oracle Corporation</p>
-<p>Java home: C:\Program Files\Java\jdk1.7.0_80\jre</p>
-<p>Default locale: en_US, platform encoding: Cp1252</p>
-<p>OS name: &quot;windows 8&quot;, version: &quot;6.2&quot;, arch: 
&quot;amd64&quot;, family: &quot;windows&quot;</p></td>
-</tr>
-</tbody>
-</table>
-
-
-To install the sandbox, first download it from <https://www.datatorrent.com/download> and import the downloaded file into VirtualBox. Once the import completes, you can select it and click the Start button to start the sandbox.
-
-
-The sandbox is configured with 6GB RAM; if your development machine has 16GB or more, you can increase the sandbox RAM to 8GB or more using the VirtualBox console. This will yield better performance and support larger applications. Additionally, you can change the network adapter from **NAT** to **Bridged Adapter**; this will allow you to log in to the sandbox from your host machine using an _ssh_ tool like **PuTTY** and also to transfer files to and from the host using `pscp` on Windows. Of course, all such configuration must be done when the sandbox is not running.
-
-
-You can choose to develop either directly on the sandbox or on your development machine. The advantage of the former is that most of the tools (e.g. _jdk_, _git_, _maven_) are pre-installed and the package files created by your project are directly available to the DataTorrent tools such as **dtManage** and **dtcli**. The disadvantage is that the sandbox is a memory-limited environment, so running a memory-hungry tool like a Java IDE on it may starve other applications of memory.
-
-
-You can now use the maven archetype to create a basic Apache Apex project as 
follows: Put these lines in a Windows command file called, for example, 
`newapp.cmd` and run it:
-
-    @echo off
-    @rem Script for creating a new application
-    setlocal
-    mvn archetype:generate ^
-      -DarchetypeRepository=https://www.datatorrent.com/maven/content/repositories/releases ^
-      -DarchetypeGroupId=com.datatorrent ^
-      -DarchetypeArtifactId=apex-app-archetype ^
-      -DarchetypeVersion=3.1.1 ^
-      -DgroupId=com.example ^
-      -Dpackage=com.example.myapexapp ^
-      -DartifactId=myapexapp ^
-      -Dversion=1.0-SNAPSHOT
-    endlocal
-
-
-
-The caret (^) at the end of some lines indicates that a continuation line 
follows. When you run this file, the properties will be displayed and you will 
be prompted with `` Y: :``; just press **Enter** to complete the project 
generation.
-
-
-This command file also exists in the DataTorrent _examples_ repository, which you can check out with:
-
-    git clone https://github.com/DataTorrent/examples
-
-You will find the script under 
`examples\tutorials\topnwords\scripts\newapp.cmd`.
-
-You can also, if you prefer, use an IDE to generate the project as described 
in Section 3 of [Application Packages](application_packages.md) but use the 
archetype version 3.1.1 instead of 3.0.0.
-
-
-When the run completes successfully, you should see a new directory named `myapexapp` containing a maven project for building a basic Apache Apex application. It includes 3 source files: **Application.java**, **RandomNumberGenerator.java** and **ApplicationTest.java**. You can now build the application by stepping into the new directory and running the appropriate maven command:
-
-    cd myapexapp
-    mvn clean package -DskipTests
-
-The build should create the application package file `myapexapp\target\myapexapp-1.0-SNAPSHOT.apa`. This file can then be uploaded to the DataTorrent GUI tool on the sandbox (called **dtManage**) and launched from there. It generates a stream of random numbers and prints them out, each prefixed by the string `hello world: `. If you built this package on the host, you can transfer it to the sandbox using the `pscp` tool bundled with **PuTTY** mentioned earlier.
-
-
-If you want to check out the Apache Apex source repositories and build them, you can do so by running the script `build-apex.cmd` located in the same place in the examples repository described above. The source repositories contain more substantial demo applications and the associated source code. Alternatively, if you do not want to use the script, you can follow these simple manual steps:
-
-
-1.  Check out the source code repositories:
-
-        git clone https://github.com/apache/incubator-apex-core
-        git clone https://github.com/apache/incubator-apex-malhar
-
-2.  Switch to the appropriate release branch and build each repository:
-
-        pushd incubator-apex-core
-        git checkout release-3.1
-        mvn clean install -DskipTests
-        popd
-        pushd incubator-apex-malhar
-        git checkout release-3.1
-        mvn clean install -DskipTests
-        popd
-
-The `install` argument to the `mvn` command installs resources from each project to your local maven repository (typically `.m2/repository` under your home directory), and **not** to the system directories, so Administrator privileges are not required. The `-DskipTests` argument skips running unit tests since they take a long time. If this is a first-time installation, it might take several minutes to complete because maven will download a number of associated plugins.
-
-After the build completes, you should see the demo application package files 
in the target directory under each demo subdirectory in 
`incubator-apex-malhar\demos\`.
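
From a Git Bash or other Unix-like shell, the resulting packages can be located with a quick search like the following (an illustrative sketch, not from the original guide; it assumes the repositories were cloned into the current directory):

```shell
#!/bin/sh
# Sketch: list the demo application package (.apa) files produced by the
# Malhar build. Prints nothing if the build has not been run yet.
find incubator-apex-malhar/demos -name '*.apa' 2>/dev/null | sort
```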
-
-Linux
-------------------
-
-Most of the instructions for Linux (and other Unix-like systems) are similar 
to those for Windows described above, so we will just note the differences.
-
-
-The pre-requisites (such as _git_, _maven_, etc.) are the same as for Windows 
described above; please run the commands in the table and ensure that 
appropriate versions are present in your PATH environment variable (the command 
to display that variable is: `echo $PATH`).
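
A small script along the following lines (an illustrative sketch, not part of the original guide) can quickly confirm that all the required tools are on the PATH:

```shell
#!/bin/sh
# Sketch: report whether each required development tool is on the PATH.
for tool in git java javac mvn; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found ($(command -v "$tool"))"
    else
        echo "$tool: MISSING -- install it and/or update PATH"
    fi
done
```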
-
-
-The maven archetype command is the same except that continuation lines use a backslash (``\``) instead of a caret (``^``); the script for it is available in the same location and is named `newapp` (without the `.cmd` extension). The script to check out and build the Apache Apex repositories is named `build-apex`.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/apex_malhar.md
----------------------------------------------------------------------
diff --git a/apex_malhar.md b/apex_malhar.md
deleted file mode 100644
index ef2e371..0000000
--- a/apex_malhar.md
+++ /dev/null
@@ -1,65 +0,0 @@
-Apache Apex Malhar
-================================================================================
-
-Apache Apex Malhar is an open source operator and codec library that can be used with the Apache Apex platform to build real-time streaming applications. To help enterprises extract value quickly, Malhar operators get data in, analyze it in real-time, and get data out of Hadoop in real-time with no paradigm limitations. In addition to the operators, the library contains a number of demo applications demonstrating operator features and capabilities.
-
-![MalharDiagram](images/MalharOperatorOverview.png)
-
-# Capabilities common across Malhar operators
-
-For most streaming platforms, connectors are afterthoughts and often end up being simple ‘bolt-ons’ to the platform. As a result, they often cause performance issues or data loss when put through failure scenarios and scalability requirements. Malhar operators do not face these issues, as they were designed to be integral parts of the [Apache Apex](apex.md) platform. Hence, they have the following core streaming runtime capabilities:
-
-1.  **Fault tolerance** – Apache Apex Malhar operators where applicable have 
fault tolerance built in. They use the checkpoint capability provided by the 
framework to ensure that there is no data loss under ANY failure scenario.
-2.  **Processing guarantees** – Malhar operators where applicable provide out of the box support for ALL three processing guarantees – exactly-once, at-least-once & at-most-once – WITHOUT requiring the user to write any additional code. Some operators, like the MQTT operator, deal with source systems that can’t track processed data and hence need the operators to keep track of the data. Malhar has support for a generic operator that uses alternate storage like HDFS to facilitate this. Finally, for databases that support transactions or any sort of atomic batch operations, Malhar operators can do exactly-once down to the tuple level.
-3.  **Dynamic updates** – Based on changing business conditions you often 
have to tweak several parameters used by the operators in your streaming 
application without incurring any application downtime. You can also change 
properties of a Malhar operator at runtime without having to bring down the 
application.
-4.  **Ease of extensibility** – Malhar operators are based on templates that 
are easy to extend.
-5.  **Partitioning support** – In streaming applications the input data stream often needs to be partitioned based on the contents of the stream. Also, for operators that ingest data from external systems, partitioning needs to be done based on the capabilities of the external system. For example, with the Kafka or Flume operator, the operator can automatically scale up or down based on changes in the number of Kafka partitions or Flume channels.
-
-# Operator Library Overview
-
-## Input/output connectors
-
-Below is a summary of the various sub-categories of input and output operators. Input operators also have a corresponding output operator.
-
-*   **File Systems** – Most streaming analytics use cases we have seen require the data to be stored in HDFS, or perhaps S3 if the application is running in AWS. Also, customers often need to re-run their streaming analytical applications against historical data, or consume data from upstream processes that are perhaps writing to some NFS share. Hence, it’s not enough just to be able to save data to various file systems; you also have to be able to read data from them. RTS supports input & output operators for HDFS, S3, NFS & local files.
-*   **Flume** – NOTE: Flume operator is not yet part of Malhar
-
-Many customers have existing Flume deployments that are used to aggregate log data from a variety of sources. However, Flume does not allow analytics on the log data on the fly. The Flume input/output operator enables RTS to consume data from Flume and analyze it in real-time before it is persisted.
-
-*   **Relational databases** – Most stream processing use cases require some reference data lookups to enrich, tag or filter streaming data. There is also a need to save results of the streaming analytical computation to a database so an operational dashboard can display them. RTS supports a JDBC operator so you can read/write data from any JDBC-compliant RDBMS like Oracle, MySQL, etc.
-*   **NoSQL databases** – NoSQL key-value pair databases like Cassandra & HBase are becoming a common part of streaming analytics application architectures to look up reference data or store results. Malhar has operators for HBase, Cassandra, Accumulo (common with govt. & healthcare companies), MongoDB & CouchDB.
-*   **Messaging systems** – JMS brokers have been the workhorses of messaging infrastructure in most enterprises, and Kafka is fast gaining adoption with almost every customer we talk to. Malhar has operators to read/write to Kafka, any JMS implementation, ZeroMQ & RabbitMQ.
-*   **Notification systems** – Almost every streaming analytics application has some notification requirements that are tied to a business condition being triggered. Malhar supports sending notifications via SMTP & SNMP. It also has an alert escalation mechanism built in so users don’t get spammed by notifications (a common drawback in most streaming platforms).
-*   **In-memory Databases & Caching platforms** – Some streaming use cases need instantaneous access to shared state across the application. Caching platforms and in-memory databases serve this purpose really well. To support these use cases, Malhar has operators for memcached & Redis.
-*   **Protocols** – Streaming use cases driven by machine-to-machine communication have one thing in common – there is no single dominant protocol being used for communication. Malhar currently has support for MQTT, one of the more commonly adopted protocols we see in the IoT space. Malhar also provides connectors that can directly talk to HTTP, RSS, Socket, WebSocket & FTP sources.
-
-
-
-## Compute
-
-One of the most important promises of a streaming analytics platform like Apache Apex is the ability to do analytics in real-time. However, delivering on that promise becomes really difficult when the platform does not provide out-of-the-box operators to support a variety of common compute functions, as the user then has to worry about making these scalable, fault tolerant, etc. Malhar takes this responsibility away from the application developer by providing a huge variety of out-of-the-box computational operators, so the application developer can focus on the analysis.
-
-Below is just a snapshot of the compute operators available in Malhar:
-
-*   Statistics & math – provide various mathematical and statistical computations over application-defined time windows.
-*   Filtering & pattern matching
-*   Machine learning & algorithms
-*   Real-time model scoring – a very common use case for stream processing platforms; Malhar allows users to invoke their R models from streaming applications.
-*   Sorting, maps, frequency, TopN, BottomN, random generator, etc.
-
-
-## Query & Script invocation
-
-Many streaming use cases are legacy implementations that need to be ported over. This often requires re-using some of the existing investments and code that would perhaps be really hard to re-write. With this in mind, Malhar supports invoking external scripts and queries as part of the streaming application, using operators for invoking SQL queries, shell scripts, Ruby, Jython, JavaScript, etc.
-
-## Parsers
-
-There are many industry-vertical-specific data formats that a streaming application developer might need to parse. Often there are existing parsers available for these that can be directly plugged into an Apache Apex application. For example, in the telco space, a Java-based CDR parser can be directly plugged into an Apache Apex operator. To further simplify the development experience, Malhar also provides operators for parsing common formats like XML (DOM & SAX), JSON (flat map converter), Apache log files, syslog, etc.
-
-## Stream manipulation
-
-Streaming data, a.k.a. a ‘stream’, is raw data that inevitably needs processing to clean, filter, tag, summarize, etc. The goal of Malhar is to enable the application developer to focus on ‘WHAT’ needs to be done to the stream to get it into the right format, and not worry about the ‘HOW’. Hence, Malhar has several operators to perform common stream manipulation actions like DeDupe, GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample, Inner join, Outer join, Select, Update, etc.
-
-## Social Media
-
-Malhar includes an operator to connect to the popular Twitter stream firehose.
