http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/application_packages.md
----------------------------------------------------------------------
diff --git a/application_packages.md b/application_packages.md
deleted file mode 100644
index 521779a..0000000
--- a/application_packages.md
+++ /dev/null
@@ -1,669 +0,0 @@
-Apache Apex Application Packages
-================================
-
-An Apache Apex Application Package is a zip file that contains all the
-necessary files to launch an application in Apache Apex. It is the
-standard way for assembling and sharing an Apache Apex application.
-
-# Requirements
-
-You will need to have the following installed:
-
-1. Apache Maven 3.0 or later (for assembling the App Package)
-2. Apache Apex 3.0.0 or later (for launching the App Package in your cluster)
-
-# Creating Your First Apex App Package
-
-You can create an Apex Application Package using your Linux command
-line, or using your favorite IDE.
-
-## Using Command Line
-
-First, change to the directory where you put your projects, and create
-an Apex application project using Maven by running the following
-command.  Replace "com.example", "mydtapp" and "1.0-SNAPSHOT" with the
-appropriate values (make sure this is all on one line):
-
-    $ mvn archetype:generate \
-     -DarchetypeGroupId=org.apache.apex \
-     -DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=3.2.0-incubating \
-     -DgroupId=com.example -Dpackage=com.example.mydtapp -DartifactId=mydtapp \
-     -Dversion=1.0-SNAPSHOT
-
-This creates a Maven project named "mydtapp". Open it with your favorite
-IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA). In the project, there is a
-sample DAG that generates a number of tuples with a random number and
-prints out "hello world" and the random number in the tuples.  The code
-that builds the DAG is in
-src/main/java/com/example/mydtapp/Application.java, and the code that
-runs the unit test for the DAG is in
-src/test/java/com/example/mydtapp/ApplicationTest.java. Try it out by
-running the following command:
-
-    $ cd mydtapp; mvn package
-
-This builds the App Package and runs the unit test of the DAG.  You
-should see test output similar to this:
-
-```
- -------------------------------------------------------
-  TESTS
- -------------------------------------------------------
-
- Running com.example.mydtapp.ApplicationTest
- hello world: 0.8015370953286478
- hello world: 0.9785359225545481
- hello world: 0.6322611586644047
- hello world: 0.8460953663451775
- hello world: 0.5719372906929072
- hello world: 0.6361174312337172
- hello world: 0.14873007534816318
- hello world: 0.8866986277418261
- hello world: 0.6346526809866057
- hello world: 0.48587295703904465
- hello world: 0.6436832429676687
-
- ...
-
- Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.863
- sec
-
- Results :
-
- Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
-```
-
-The "mvn package" command creates the App Package file in target
-directory as target/mydtapp-1.0-SNAPSHOT.apa. You will be able to use
-that App Package file to launch this sample application in your actual
-Apex installation.
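-
-Since the App Package is a zip file, you can also do a quick sanity check of
-its contents from the command line before deploying it (the directory layout
-is described in the "Zip Structure of Application Package" section below):
-
-```
-$ unzip -t target/mydtapp-1.0-SNAPSHOT.apa
-```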
-
-## Using IDE
-
-Alternatively, you can do the above steps all within your IDE.  For
-example, in NetBeans, select File -\> New Project.  Then choose “Maven”
-and “Project from Archetype” in the dialog box, as shown.
-
-![](images/AppPackage/ApplicationPackages.html-image00.png)
-
-Then fill in the Group ID, Artifact ID, Version and Repository entries as shown below.
-
-![](images/AppPackage/ApplicationPackages.html-image02.png)
-
-Group ID: org.apache.apex
-Artifact ID: apex-app-archetype
-Version: 3.2.0-incubating (or any later version)
-
-Press Next and fill out the rest of the required information. For
-example:
-
-![](images/AppPackage/ApplicationPackages.html-image01.png)
-
-Click Finish, and now you have created your own Apache Apex App Package
-project, with a default unit test.  You can run the unit test, make code
-changes or make dependency changes within your IDE.  The procedure for
-other IDEs, like Eclipse or IntelliJ, is similar.
-
-# Writing Your Own App Package
-
-
-Please refer to [Creating Apps](create.md) for the basics of how to write
-an Apache Apex application.  In your App Package project, you can add custom
-operators (refer to the [Operator Development Guide](https://www.datatorrent.com/docs/guides/OperatorDeveloperGuide.html)),
-project dependencies, default and required configuration properties, pre-set
-configurations and other metadata.
-
-## Adding (and removing) project dependencies
-
-Under the project, you can add project dependencies in pom.xml, or do it
-through your IDE.  Here’s the section that describes the dependencies in
-the default pom.xml:
-```
-  <dependencies>
-    <!-- add your dependencies here -->
-    <dependency>
-      <groupId>org.apache.apex</groupId>
-      <artifactId>malhar-library</artifactId>
-      <version>${apex.version}</version>
-      <!--
-           If you know your application does not need the transitive
-           dependencies that are pulled in by malhar-library, uncomment
-           the following to reduce the size of your app package.
-      -->
-      <!--
-      <exclusions>
-        <exclusion>
-          <groupId>*</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
-      -->
-    </dependency>
-    <dependency>
-      <groupId>org.apache.apex</groupId>
-      <artifactId>apex-engine</artifactId>
-      <version>${apex.version}</version>
-      <scope>provided</scope>
-    </dependency>
-    <dependency>
-      <groupId>junit</groupId>
-      <artifactId>junit</artifactId>
-      <version>4.10</version>
-      <scope>test</scope>
-    </dependency>
-  </dependencies>
-```
-
-As shown above, the default dependencies include malhar-library in
-compile scope, apex-engine in provided scope, and junit in test
-scope.  Do not remove these three dependencies since they are
-necessary for any Apex application.  You can, however, exclude
-transitive dependencies from malhar-library to reduce the size of your
-App Package, provided that none of the operators in malhar-library that
-need the transitive dependencies will be used in your application.
-
-In the sample application, it is safe to remove the transitive
-dependencies from malhar-library by uncommenting the "exclusions"
-section.  Doing so reduces the size of the sample App Package from 8MB to
-700KB.
-
-Note that if you exclude \*, with some versions of Maven, you may get
-warnings similar to the following:
-
-```
-
- [WARNING] 'dependencies.dependency.exclusions.exclusion.groupId' for
- org.apache.apex:malhar-library:jar with value '*' does not match a
- valid id pattern.
-
- [WARNING]
- [WARNING] It is highly recommended to fix these problems because they
- threaten the stability of your build.
- [WARNING]
- [WARNING] For this reason, future Maven versions might no longer support
- building such malformed projects.
- [WARNING]
-
-```
-This is a bug in early versions of Maven 3.  The dependency exclusion is
-still valid and it is safe to ignore these warnings.
-
-## Application Configuration
-
-A configuration file can be used to configure an application.  Several
-kinds of configuration parameters can be specified: application
-attributes, operator attributes and properties, port attributes, stream
-properties and application-specific properties. They are all specified
-as name-value pairs, in XML format, like the following.
-
-```
-<?xml version="1.0"?>
-<configuration>
-  <property>
-    <name>some_name_1</name>
-    <value>some_default_value</value>
-  </property>
-  <property>
-    <name>some_name_2</name>
-    <value>some_default_value</value>
-  </property>
-</configuration>
-```
-
-## Application attributes
-
-Application attributes are used to specify the platform behavior for the
-application. They can be specified using the parameter
-```dt.attr.<attribute>```. The prefix “dt” is a constant, “attr” is a
-constant denoting an attribute is being specified and ```<attribute>```
-specifies the name of the attribute. Below is an example snippet setting
-the streaming window size of the application to be 1000 milliseconds.
-
-```
-  <property>
-     <name>dt.attr.STREAMING_WINDOW_SIZE_MILLIS</name>
-     <value>1000</value>
-  </property>
-```
-
-The name tag specifies the attribute and the value tag specifies the
-attribute value. The name of the attribute is a Java constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.DAGContext and the different attributes can
-be specified in the format described above.
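-
-The same attribute can also be set programmatically while building the DAG.
-Below is a minimal sketch; only the setAttribute call matters here, and the
-rest of the application is elided:
-
-```
-import com.datatorrent.api.Context.DAGContext;
-import com.datatorrent.api.DAG;
-import com.datatorrent.api.StreamingApplication;
-import org.apache.hadoop.conf.Configuration;
-
-public class MyApplication implements StreamingApplication
-{
-  @Override
-  public void populateDAG(DAG dag, Configuration conf)
-  {
-    // Equivalent to setting dt.attr.STREAMING_WINDOW_SIZE_MILLIS to 1000.
-    dag.setAttribute(DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 1000);
-    // ... add operators and streams here ...
-  }
-}
-```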
-
-## Operator attributes
-
-Operator attributes are used to specify the platform behavior for the
-operator. They can be specified using the parameter
-```dt.operator.<operator-name>.attr.<attribute>```. The prefix “dt” is a
-constant, “operator” is a constant denoting that an operator is being
-specified, ```<operator-name>``` denotes the name of the operator, “attr” 
is
-the constant denoting that an attribute is being specified and
-```<attribute>``` is the name of the attribute. The operator name is the
-same name that is specified when the operator is added to the DAG using
-the addOperator method. An example illustrating the specification is
-shown below. It specifies the number of streaming windows for one
-application window of an operator named “input” to be 10.
-
-```
-<property>
-  <name>dt.operator.input.attr.APPLICATION_WINDOW_COUNT</name>
-  <value>10</value>
-</property>
-```
-
-The name tag specifies the attribute and the value tag specifies the
-attribute value. The name of the attribute is a Java constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.OperatorContext and the different attributes
-can be specified in the format described above.
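-
-Operator attributes can likewise be set in code while building the DAG; a
-minimal sketch (assuming inputOp is the instance returned by
-dag.addOperator("input", ...)):
-
-```
-// Equivalent to dt.operator.input.attr.APPLICATION_WINDOW_COUNT above.
-dag.setAttribute(inputOp, Context.OperatorContext.APPLICATION_WINDOW_COUNT, 10);
-```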
-
-## Operator properties
-
-Operators can be configured using operator specific properties. The
-properties can be specified using the parameter
-```dt.operator.<operator-name>.prop.<property-name>```. The difference
-between this and the operator attribute specification described above is
-that the keyword “prop” is used to denote that it is a property and
-```<property-name>``` specifies the property name.  An example illustrating
-this is specified below. It specifies the property “host” (the hostname
-of the redis server) for a “redis” output operator.
-
-```
-  <property>
-    <name>dt.operator.redis.prop.host</name>
-    <value>127.0.0.1</value>
-  </property>
-```
-
-The name tag specifies the property and the value tag specifies the
-property value. The property name is converted to a setter method which
-is called on the actual operator. The method name is composed by
-appending the word “set” and the property name with the first character
-of the name capitalized. In the above example the setter method becomes
-setHost. The method is called using Java reflection and the property
-value is passed as an argument. In the above example the method setHost
-will be called on the “redis” operator with “127.0.0.1” as the argument.
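-
-For this mechanism to work, the operator class must expose a matching public
-setter.  A minimal sketch of what such a “redis” operator might look like
-(the class name here is hypothetical):
-
-```
-public class RedisOutputOperator extends BaseOperator
-{
-  private String host;
-
-  // Invoked via reflection for dt.operator.redis.prop.host
-  public void setHost(String host)
-  {
-    this.host = host;
-  }
-}
-```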
-
-## Port attributes
-Port attributes are used to specify the platform behavior for input and
-output ports. They can be specified using the parameter
-```dt.operator.<operator-name>.inputport.<port-name>.attr.<attribute>```
-for an input port and
-```dt.operator.<operator-name>.outputport.<port-name>.attr.<attribute>```
-for an output port. The keyword “inputport” is used to denote an input port
-and “outputport” to denote an output port. The rest of the specification
-follows the conventions described in other specifications above. An
-example illustrating this is specified below. It specifies the queue
-capacity for an input port named “input” of an operator named “range” to
-be 4000 (4k).
-
-```
-<property>
-  <name>dt.operator.range.inputport.input.attr.QUEUE_CAPACITY</name>
-  <value>4000</value>
-</property>
-```
-
-The name tag specifies the attribute and the value tag specifies the
-attribute value. The name of the attribute is a Java constant name
-identifying the attribute. The constants are defined in
-com.datatorrent.api.Context.PortContext and the different attributes can
-be specified in the format described above.
-
-The attributes for an output port can also be specified in a similar way
-as described above, with the change that the keyword “outputport” is used
-instead of “inputport”. A generic keyword “port” can be used to specify
-either an input or an output port. It is useful in the wildcard
-specification described below.
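-
-Port attributes can also be set in code while building the DAG; a minimal
-sketch (assuming range is the operator instance and input is its input port
-field):
-
-```
-// Equivalent to dt.operator.range.inputport.input.attr.QUEUE_CAPACITY above.
-dag.setInputPortAttribute(range.input, Context.PortContext.QUEUE_CAPACITY, 4000);
-```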
-
-## Stream properties
-
-Streams can be configured using stream properties. The properties can be
-specified using the parameter
-```dt.stream.<stream-name>.prop.<property-name>```.  The constant “stream”
-specifies that it is a stream, ```<stream-name>``` specifies the name of the
-stream and ```<property-name>``` the name of the property. The name of the
-stream is the same name that is passed when the stream is added to the
-DAG using the addStream method. An example illustrating the
-specification is shown below. It sets the locality of the stream named
-“stream1” to CONTAINER_LOCAL, indicating that the operators the stream
-connects should run in the same container.
-
-```
-  <property>
-    <name>dt.stream.stream1.prop.locality</name>
-    <value>CONTAINER_LOCAL</value>
-  </property>
-```
-
-The property name is converted into a setter method on the stream in the
-same way as described in the operator properties section above. In this
-case the method would be setLocality and it will be called on the stream
-“stream1” with the value as the argument.
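-
-The locality can equivalently be set in code when the stream is created; a
-minimal sketch (opA and opB are hypothetical operators already added to the
-DAG):
-
-```
-// Equivalent to dt.stream.stream1.prop.locality above.
-dag.addStream("stream1", opA.output, opB.input).setLocality(DAG.Locality.CONTAINER_LOCAL);
-```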
-
-Along with the above system-defined parameters, applications can define
-their own specific parameters, which can be specified in the
-configuration file. The only condition is that the names of these
-parameters don’t conflict with the system-defined parameters or similar
-application parameters defined by other applications. To this end, it is
-recommended that the application parameters have the format
-```<full-application-class-name>.<param-name>```. The
-full-application-class-name is the full Java class name of the
-application including the package path and param-name is the name of the
-parameter within the application. The application still has to read the
-parameter in using the configuration API of the configuration object
-that is passed in populateDAG.
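-
-A minimal sketch of reading such a parameter inside populateDAG (the
-parameter name and default value here are hypothetical):
-
-```
-@Override
-public void populateDAG(DAG dag, Configuration conf)
-{
-  // Read an application-specific parameter, with a default fallback.
-  String dbHost = conf.get("com.example.mydtapp.Application.dbHost", "localhost");
-  // ... use dbHost when configuring operators ...
-}
-```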
-
-##  Wildcards
-
-Wildcards and regular expressions can be used in place of names to
-specify a group for applications, operators, ports or streams. For
-example, to specify an attribute for all ports of an operator, it can be
-done as follows:
-```
-<property>
-  <name>dt.operator.range.port.*.attr.QUEUE_CAPACITY</name>
-  <value>4000</value>
-</property>
-```
-
-The wildcard “\*” is used instead of the name of the port. Wildcards can
-also be used for the operator name, stream name or application name. Regular
-expressions can also be used in names to specify attributes or
-properties for a specific set.
-
-## Adding configuration properties
-
-It is common for applications to require configuration parameters to
-run.  For example, the address and port of the database, the location of
-a file for ingestion, etc.  You can specify them in
-src/main/resources/META-INF/properties.xml under the App Package
-project. The properties.xml may look like:
-
-```
-<?xml version="1.0"?>
-<configuration>
-  <property>
-    <name>some_name_1</name>
-  </property>
-  <property>
-    <name>some_name_2</name>
-    <value>some_default_value</value>
-  </property>
-</configuration>
-```
-
-The name of an application-specific property takes the form of:
-
-```dt.operator.{opName}.prop.{propName}```
-
-This represents the property with name propName of the operator opName.
-You can also set the application name at run time by setting this
-property:
-
-        dt.attr.APPLICATION_NAME
-
-There are also other properties that can be set.  For details on
-properties, refer to the [Operation and Installation Guide](https://www.datatorrent.com/docs/guides/OperationandInstallationGuide.html).
-
-In this example, property some\_name\_1 is a required property which
-must be set at launch time, or by a pre-set configuration
-(see next section).  Property some\_name\_2 is a property that is
-assigned the value some\_default\_value unless it is overridden at
-launch time.
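-
-For example, a required property such as some\_name\_1 can be supplied at
-launch time with a -D option (using the launch command described later in
-this document):
-
-```
- dt> launch -D some_name_1=some_value target/mydtapp-1.0-SNAPSHOT.apa
-```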
-
-## Adding pre-set configurations
-
-
-At build time, you can add pre-set configurations to the App Package by
-adding configuration XML files under ```src/site/conf/<conf>.xml``` in your
-project.  You can then specify which configuration to use at launch
-time.  The configuration XML is in the same format as the properties.xml
-file.
-
-## Application-specific properties file
-
-You can also specify properties.xml per application in the application
-package.  Just create a file with the name properties-{appName}.xml and
-it will be picked up when you launch the application with the specified
-name within the application package.  In short:
-
-  properties.xml: Properties that are global to the Application
-Package
-
-  properties-{appName}.xml: Properties that are specific when launching
-an application with the specified appName.
-
-## Properties source precedence
-
-If properties with the same key appear in multiple sources (e.g. from
-app package default configuration as META-INF/properties.xml, from app
-package configuration in the conf directory, from launch time defines,
-etc), the precedence of sources, from highest to lowest, is as follows:
-
-1. Launch time defines (using -D option in CLI, or the POST payload
-    with the Gateway REST API’s launch call)
-2. Launch time specified configuration file in file system (using -conf
-    option in CLI)
-3. Launch time specified package configuration (using -apconf option in
-    CLI or the conf={confname} with Gateway REST API’s launch call)
-4. Configuration from \$HOME/.dt/dt-site.xml
-5. Application defaults within the package as
-    META-INF/properties-{appname}.xml
-6. Package defaults as META-INF/properties.xml
-7. dt-site.xml in local DT installation
-8. dt-site.xml stored in HDFS
-
-## Other meta-data
-
-In an Apex App Package project, the pom.xml file contains a
-section that looks like:
-
-```
-<properties>
-  <apex.version>3.2.0-incubating</apex.version>
-  <apex.apppackage.classpath>lib/*.jar</apex.apppackage.classpath>
-</properties>
-```
-apex.version is the Apache Apex version that is to be used
-with this Application Package.
-
-apex.apppackage.classpath is the classpath that is used when
-launching the application in the Application Package.  The default is
-lib/\*.jar, where lib is where all the dependency jars are kept within
-the Application Package.  One reason to change this field is when your
-Application Package needs the classpath in a specific order.
-
-## Logging configuration
-
-Just like other Java projects, you can change the logging configuration
-by having your log4j.properties under src/main/resources.  For example,
-if you have the following in src/main/resources/log4j.properties:
-```
- log4j.rootLogger=WARN,CONSOLE
- log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
- log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
- log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{2} %M - %m%n
-```
-
-The root logger’s level is set to WARN and the output is set to the console (stdout).
-
-Note that, in a project created from the maven archetype, there is
-already a log4j.properties file under src/test/resources, and that file
-is only used for the unit test.
-
-# Zip Structure of Application Package
-
-
-Apache Apex Application Package files are zip files.  You can examine the
-content of any Application Package by using unzip -t on your Linux command line.
-
-There are five top-level directories in an Application Package:
-
-1. "app" contains the jar files of the DAG code and any custom operators.
-2. "lib" contains all dependency jars.
-3. "conf" contains all the pre-set configuration XML files.
-4. "META-INF" contains the MANIFEST.MF file and the properties.xml file.
-5. "resources" contains other files that are to be served by the Gateway
-on behalf of the app package.
-
-
-# Managing Application Packages Through DT Gateway
-
-The DT Gateway provides a way to store and retrieve Application Packages
-in your distributed file system, e.g. HDFS.
-
-## Storing an Application Package
-
-You can store your Application Packages through DT Gateway using this
-REST call:
-
-```
- POST /ws/v2/appPackages
-```
-
-The payload is the raw content of your Application Package.  For
-example, you can issue this request using curl on your Linux command
-line like this, assuming your DT Gateway is accepting requests at
-localhost:9090:
-
-```
-$ curl -XPOST -T <app-package-file> http://localhost:9090/ws/v2/appPackages
-```
-
-## Getting Meta Information on Application Packages
-
-
-You can get the meta information on Application Packages stored through
-DT Gateway using this call.  The information includes the logical plan
-of each application within the Application Package.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}
-```
-
-## Getting Available Operators In Application Package
-
-You can get the list of available operators in the Application Package
-using this call.
-
-```
-GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators?parent={parent}
-```
-
-The parent parameter is optional.  If given, parent should be the fully
-qualified class name.  It will only return operators that derive from
-that class or interface. For example, if parent is
-com.datatorrent.api.InputOperator, this call will only return input
-operators provided by the Application Package.
-
-## Getting Properties of Operators in Application Package
-
-You can get the list of properties of any operator in the Application
-Package using this call.
-
-```
-GET  /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators/{className}
-```
-
-## Getting List of Pre-Set Configurations in Application Package
-
-You can get a list of pre-set configurations within the Application
-Package using this call.
-
-```
-GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs
-```
-
-You can also get the content of a specific pre-set configuration within
-the Application Package.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-
-## Changing Pre-Set Configurations in Application Package
-
-You can create or replace pre-set configurations within the Application
-Package:
-```
- PUT   /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-The payload of this PUT call is the XML file that represents the pre-set
-configuration, and the Content-Type of the payload is "application/xml".
-You can also delete a pre-set configuration within the Application Package:
-```
- DELETE /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
-```
-
-## Retrieving an Application Package
-
-You can download the Application Package file.  This Application Package
-is not necessarily the same file as the one that was originally uploaded
-since the pre-set configurations may have been modified.
-
-```
- GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/download
-```
-
-## Launching an Application Package
-
-You can launch an application within an Application Package.
-```
-POST /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/applications/{appName}/launch?config={configName}
-```
-
-The config parameter is optional.  If given, it must be one of the
-pre-set configurations within the given Application Package.  The
-Content-Type of the payload of the POST request is "application/json"
-and should contain the properties to be launched with the application.
-It is of the form:
-
-```
- {"property-name":"property-value", ... }
-```
-
-Here is an example of launching an application through curl:
-
-```
- $ curl -XPOST -d'{"dt.operator.console.prop.stringFormat":"xyz %s"}' http://localhost:9090/ws/v2/appPackages/dtadmin/mydtapp/1.0-SNAPSHOT/applications/MyFirstApplication/launch
-```
-
-Please refer to the [Gateway API reference](https://www.datatorrent.com/docs/guides/DTGatewayAPISpecification.html) for the complete specification of the REST API.
-
-# Examining and Launching Application Packages Through Apex CLI
-
-If you are working with Application Packages in the local filesystem and
-do not want to deal with dtGateway, you can use the Apex Command Line
-Interface (dtcli).  Please refer to the [Gateway API](dtgateway_api.md)
-to see samples for these commands.
-
-## Getting Application Package Meta Information
-
-You can get the meta information about the Application Package using
-this Apex CLI command.
-
-```
- dt> get-app-package-info <app-package-file>
-```
-
-## Getting Available Operators In Application Package
-
-You can get the list of available operators in the Application Package
-using this command.
-
-```
- dt> get-app-package-operators <app-package-file> <package-prefix>
- [parent-class]
-```
-
-## Getting Properties of Operators in Application Package
-
-You can get the list of properties of any operator in the Application
-Package using this command.
-
-```
- dt> get-app-package-operator-properties <app-package-file> <operator-class>
-```
-
-
-## Launching an Application Package
-
-You can launch an application within an Application Package.
-```
-dt> launch [-D property-name=property-value, ...] [-conf config-name]
- [-apconf config-file-within-app-package] <app-package-file>
- [matching-app-name]
-```
-Note that -conf expects a configuration file in the file system, while -apconf
-expects a configuration file within the app package.
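-
-For example, to launch the sample application built earlier with a property
-override (the property name is the one used in the Gateway curl example
-above):
-
-```
- dt> launch -D dt.operator.console.prop.stringFormat="xyz %s" target/mydtapp-1.0-SNAPSHOT.apa
-```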

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/autometrics.md
----------------------------------------------------------------------
diff --git a/autometrics.md b/autometrics.md
deleted file mode 100644
index f6000e8..0000000
--- a/autometrics.md
+++ /dev/null
@@ -1,311 +0,0 @@
-Apache Apex AutoMetrics
-=======================
-
-# Introduction
-Metrics collect various statistical information about a process which can be
-very useful for diagnosis. AutoMetrics in Apex can help monitor operators in
-a running application.  The goal of the *AutoMetric* API is to enable
-operator developers to define relevant metrics for an operator in a simple
-way, which the platform collects and reports automatically.
-
-# Specifying AutoMetrics in an Operator
-An *AutoMetric* can be any object. It can be of a primitive type - int, long,
-etc. - or a complex one. A field or a `get` method in an operator can be
-annotated with `@AutoMetric` to specify that its value is a metric. After
-every application end window, the platform collects the values of these
-fields/methods in a map and sends it to the application master.
-
-```java
-public class LineReceiver extends BaseOperator
-{
- @AutoMetric
- long length;
-
- @AutoMetric
- long count;
-
- public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
- {
-   @Override
-   public void process(String s)
-   {
-     length += s.length();
-     count++;
-   }
- };
-
- @Override
- public void beginWindow(long windowId)
- {
-   length = 0;
-   count = 0;
- }
-}
-```
-
-There are 2 auto-metrics declared in the `LineReceiver`. At the end of each
-application window, the platform will send a map with 2 entries -
-`[(length, 100), (count, 10)]` - to the application master.
-
-# Aggregating AutoMetrics across Partitions
-When an operator is partitioned, it is useful to aggregate the values of
-auto-metrics across all its partitions every window to get a logical view of
-these metrics. The application master performs these aggregations using
-metrics aggregators.
-
-The AutoMetric API helps to achieve this by providing an interface for
-writing aggregators - `AutoMetric.Aggregator`. Any implementation of
-`AutoMetric.Aggregator` can be set as the operator attribute
-`METRICS_AGGREGATOR` for a particular operator, which in turn is used for
-aggregating the physical metrics of that operator.
-
-## Default aggregators
-[`MetricsAggregator`](https://github.com/apache/incubator-apex-core/blob/devel-3/common/src/main/java/com/datatorrent/common/metric/MetricsAggregator.java)
-is a simple implementation of `AutoMetric.Aggregator` that the platform uses
-as a default for summing up primitive types - int, long, float and double.
-
-`MetricsAggregator` is just a collection of `SingleMetricAggregator`s. There
-are multiple implementations of `SingleMetricAggregator` that perform sum,
-min, max and avg, which are present in Apex Core and Apex Malhar.
-
-For the `LineReceiver` operator, the application developer need not specify
-any aggregator. The platform will automatically inject an instance of
-`MetricsAggregator` that contains two `LongSumAggregator`s - one for `length`
-and one for `count`. This aggregator will report the sum of length and the
-sum of count across all the partitions of `LineReceiver`.
-
-
-## Building custom aggregators
-The platform cannot perform any meaningful aggregations for non-numeric
-metrics. In such cases, the operator or application developer can write
-custom aggregators. Let’s say the `LineReceiver` is modified to have a
-complex metric, as shown below.
-
-```java
-public class AnotherLineReceiver extends BaseOperator
-{
-  @AutoMetric
-  final LineMetrics lineMetrics = new LineMetrics();
-
-  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
-  {
-    @Override
-    public void process(String s)
-    {
-      lineMetrics.length += s.length();
-      lineMetrics.count++;
-    }
-  };
-
-  @Override
-  public void beginWindow(long windowId)
-  {
-    lineMetrics.length = 0;
-    lineMetrics.count = 0;
-  }
-
-  public static class LineMetrics implements Serializable
-  {
-    long length;
-    long count;
-
-    private static final long serialVersionUID = 201511041908L;
-  }
-}
-```
-
-Below is a custom aggregator that can calculate the average line length
-across all partitions of `AnotherLineReceiver`.
-
-```java
-public class AvgLineLengthAggregator implements AutoMetric.Aggregator
-{
-
-  Map<String, Object> result = Maps.newHashMap();
-
-  @Override
-  public Map<String, Object> aggregate(long l, Collection<AutoMetric.PhysicalMetricsContext> collection)
-  {
-    long totalLength = 0;
-    long totalCount = 0;
-    for (AutoMetric.PhysicalMetricsContext pmc : collection) {
-      AnotherLineReceiver.LineMetrics lm = (AnotherLineReceiver.LineMetrics)pmc.getMetrics().get("lineMetrics");
-      totalLength += lm.length;
-      totalCount += lm.count;
-    }
-    result.put("avgLineLength", totalLength/totalCount);
-    return result;
-  }
-}
-```
-An instance of the above aggregator can be specified as the
-`METRICS_AGGREGATOR` for `AnotherLineReceiver` while creating the DAG, as
-shown below.
-
-```java
-  @Override
-  public void populateDAG(DAG dag, Configuration configuration)
-  {
-    ...
-    AnotherLineReceiver lineReceiver = dag.addOperator("LineReceiver", new AnotherLineReceiver());
-    dag.setAttribute(lineReceiver, Context.OperatorContext.METRICS_AGGREGATOR, new AvgLineLengthAggregator());
-    ...
-  }
-```
-
-# Retrieving AutoMetrics
-The Gateway REST API provides a way to retrieve the latest AutoMetrics for
-each logical operator.  For example:
-
-```
-GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
-{
-    ...
-    "autoMetrics": {
-       "count": "71314",
-       "length": "27780706"
-    },
-    "className": "com.datatorrent.autometric.LineReceiver",
-    ...
-}
-```
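-
-For example, assuming the Gateway is accepting requests at localhost:9090,
-the latest AutoMetrics of the LineReceiver operator can be fetched with:
-
-```
-$ curl http://localhost:9090/ws/v2/applications/{appid}/logicalPlan/operators/LineReceiver
-```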
-
-# System Metrics
-System metrics are standard operator metrics provided by the system.
-Examples include:
-
-- processed tuples per second
-- emitted tuples per second
-- total tuples processed
-- total tuples emitted
-- latency
-- CPU percentage
-- failure count
-- checkpoint elapsed time
-
-The Gateway REST API provides a way to retrieve the latest values for all of
-the above for each of the logical operators in the application.
-
-```
-GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
-{
-    ...
-    "cpuPercentageMA": "{cpuPercentageMA}",
-    "failureCount": "{failureCount}",
-    "latencyMA": "{latencyMA}",  
-    "totalTuplesEmitted": "{totalTuplesEmitted}",
-    "totalTuplesProcessed": "{totalTuplesProcessed}",
-    "tuplesEmittedPSMA": "{tuplesEmittedPSMA}",
-    "tuplesProcessedPSMA": "{tuplesProcessedPSMA}",
-    ...
-}
-```
-
-However, just like AutoMetrics, the Gateway only provides the latest metrics.
-For historical metrics, we will need the help of the App Data Tracker.
-
-# App Data Tracker
-As discussed above, STRAM aggregates the AutoMetrics from physical operators
-(partitions) into something that makes sense for one logical operator.  Every
-second, it pushes the aggregated AutoMetrics values, along with system
-metrics for each operator, to the Gateway using WebSocket.  The Gateway
-relays the information to an application called the App Data Tracker.  This
-is another Apex application that runs in the background, further aggregates
-the incoming values by time bucket, and stores them in HDHT.  It also allows
-outside clients to retrieve the aggregated AutoMetrics and system metrics
-through a WebSocket interface.
-
-![AppDataTracker](images/autometrics/adt.png)
-
-The App Data Tracker is enabled by having these properties in dt-site.xml:
-
-```xml
-<property>
-  <name>dt.appDataTracker.enable</name>
-  <value>true</value>
-</property>
-<property>
-  <name>dt.appDataTracker.transport</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-<property>
-  <name>dt.attr.METRICS_TRANSPORT</name>
-  <value>builtin:AppDataTrackerFeed</value>
-</property>
-```
-
-All the applications launched after the App Data Tracker is enabled will have
-metrics sent to it.
-
-**Note**: The App Data Tracker will be shown running in dtManage as a
-“system app”.  It will show up if the “show system apps” button is pressed.
-
-By default, the time buckets the App Data Tracker aggregates upon are one
-minute, one hour and one day.  This can be overridden by changing the
-operator attribute `METRICS_DIMENSIONS_SCHEME`.
-
-Also by default, the App Data Tracker performs all these aggregations: SUM,
-MIN, MAX, AVG, COUNT, FIRST and LAST on all numeric metrics.  You can
-override this by changing the same operator attribute
-`METRICS_DIMENSIONS_SCHEME`, provided the custom aggregator is known to the
-App Data Tracker (see next section).
-
-# Custom Aggregator in App Data Tracker
-Custom aggregators allow you to do your own custom computation on statistics
-generated by any of your applications. In order to implement a custom
-aggregator you have to do two things:
-
-1. Combine new inputs with the current aggregation
-2. Combine two aggregations together into one aggregation
-
-Let’s consider the case where we want to perform the following rolling
-(IIR) average:
-
-Y_n = ½ * X_n + ¼ * X_(n-1) + ⅛ * X_(n-2) + ...
-
-This aggregation could be performed by the following Custom Aggregator:
-
-```java
-@Name("IIRAVG")
-public class AggregatorIIRAVG extends AbstractIncrementalAggregator
-{
-  ...
-
-  private void aggregateHelper(DimensionsEvent dest, DimensionsEvent src)
-  {
-    double[] destVals = dest.getAggregates().getFieldsDouble();
-    double[] srcVals = src.getAggregates().getFieldsDouble();
-
-    for (int index = 0; index < destVals.length; index++) {
-      destVals[index] = .5 * destVals[index] + .5 * srcVals[index];
-    }
-  }
-
-  @Override
-  public void aggregate(Aggregate dest, InputEvent src)
-  {
-    //Aggregate a current aggregation with a new input
-    aggregateHelper(dest, src);
-  }
-
-  @Override
-  public void aggregate(Aggregate destAgg, Aggregate srcAgg)
-  {
-    //Combine two existing aggregations together
-    aggregateHelper(destAgg, srcAgg);
-  }
-}
-```
-
-## Discovery of Custom Aggregators
-The App Data Tracker searches the following directories for custom
-aggregator jars before launching:
-
-1. {dt\_installation\_dir}/plugin/aggregators
-2. {user\_home\_dir}/.dt/plugin/aggregators
-
-It uses reflection to find all the classes in these jars that extend
-`IncrementalAggregator` or `OTFAggregator`, and registers them under the
-name provided by the `@Name` annotation (or the class name when `@Name` is
-absent).
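-
-For example, to make a custom aggregator such as the IIRAVG one above
-discoverable, package it into a jar (the jar name here is hypothetical) and
-copy it into one of those directories:
-
-```
-$ cp my-aggregators.jar ~/.dt/plugin/aggregators/
-```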
-
-# Using `METRICS_DIMENSIONS_SCHEME`
-
-Here is a sample code snippet on how you can make use of
-`METRICS_DIMENSIONS_SCHEME` to set your own time buckets and your own set of
-aggregators for certain `AutoMetric`s performed by the App Data Tracker in
-your application.
-
-```java
-  @Override
-  public void populateDAG(DAG dag, Configuration configuration)
-  {
-    ...
-    LineReceiver lineReceiver = dag.addOperator("LineReceiver", new LineReceiver());
-    ...
-    AutoMetric.DimensionsScheme dimensionsScheme = new AutoMetric.DimensionsScheme()
-    {
-      String[] timeBuckets = new String[] { "1s", "1m", "1h" };
-      String[] lengthAggregators = new String[] { "IIRAVG", "SUM" };
-      String[] countAggregators = new String[] { "SUM" };
-
-      /* Setting the aggregation time buckets to be one second, one minute and one hour */
-      @Override
-      public String[] getTimeBuckets()
-      {
-        return timeBuckets;
-      }
-
-      @Override
-      public String[] getDimensionAggregationsFor(String logicalMetricName)
-      {
-        if ("length".equals(logicalMetricName)) {
-          return lengthAggregators;
-        } else if ("count".equals(logicalMetricName)) {
-          return countAggregators;
-        } else {
-          return null; // use default
-        }
-      }
-    };
-
-    dag.setAttribute(lineReceiver, OperatorContext.METRICS_DIMENSIONS_SCHEME, dimensionsScheme);
-    ...
-  }
-```
-
-
-# Dashboards
-With the App Data Tracker enabled, you can visualize the AutoMetrics and
-system metrics in the Dashboards within dtManage.  Referring back to the
-diagram in the App Data Tracker section, dtGateway relays queries and query
-results to and from the App Data Tracker.  In this way, dtManage sends
-queries to and receives results from the App Data Tracker via dtGateway, and
-uses the results to let the user visualize the data.
-
-Click on the visualize button in dtManage's application page.
-
-![AppDataTracker](images/autometrics/visualize.png)
-
-You will see the dashboard for the AutoMetrics and the system metrics.
-
-![AppDataTracker](images/autometrics/dashboard.png)
-
-The left widget shows the AutoMetrics `length` and `count` for the
-LineReceiver operator.  The right widget shows the system metrics.
-
-The Dashboards have some simple built-in widgets to visualize the data; line
-charts and bar charts are some examples.  Users will also be able to
-implement their own widgets to visualize their data.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/configuration_packages.md
----------------------------------------------------------------------
diff --git a/configuration_packages.md b/configuration_packages.md
deleted file mode 100644
index 30f1717..0000000
--- a/configuration_packages.md
+++ /dev/null
@@ -1,242 +0,0 @@
-Apache Apex Configuration Packages
-==================================
-
-An Apache Apex Application Configuration Package is a zip file that contains
-configuration files and additional files to be launched with an
-[Application Package](application_packages.md) using
-DTCLI or the REST API.  This guide assumes the reader’s familiarity with
-Application Packages.  Please read the Application Package document to
-get yourself familiar with the concept first if you have not done so.
-
-# Requirements
-
-You will need to have the following installed:
-
-1. Apache Maven 3.0 or later (for assembling the Config Package)
-2. Apex 3.0.0 or later (for launching the App Package with the Config
-    Package in your cluster)
-
-# Creating Your First Configuration Package
-
-You can create a Configuration Package using your Linux command line, or
-using your favorite IDE.  
-
-## Using Command Line
-
-First, change to the directory where you put your projects, and create a
-DT configuration project using Maven by running the following command.
- Replace "com.example", "mydtconfig" and "1.0-SNAPSHOT" with the
-appropriate values:
-
-    $ mvn archetype:generate \
-     -DarchetypeGroupId=org.apache.apex \
-     -DarchetypeArtifactId=apex-conf-archetype -DarchetypeVersion=3.2.0-incubating \
-     -DgroupId=com.example -Dpackage=com.example.mydtconfig -DartifactId=mydtconfig \
-     -Dversion=1.0-SNAPSHOT
-
-This creates a Maven project named "mydtconfig". Open it with your
-favorite IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA).  Try it out by
-running the following command:
-
-```
-$ mvn package                                                         
-```
-
-The "mvn package" command creates the Config Package file in target
-directory as target/mydtconfig.apc. You will be able to use that
-Configuration Package file to launch an Apache Apex application.
-
-## Using IDE 
-
-Alternatively, you can do the above steps all within your IDE.  For
-example, in NetBeans, select File -\> New Project.  Then choose “Maven”
-and “Project from Archetype” in the dialog box, as shown.
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image01.png)
-
-Then fill the Group ID, Artifact ID, Version and Repository entries as
-shown below.
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image02.png)
-
-Group ID: org.apache.apex
-Artifact ID: apex-conf-archetype
-Version: 3.2.0-incubating (or any later version)
-
-Press Next and fill out the rest of the required information. For
-example:
-
-![](images/AppConfig/ApplicationConfigurationPackages.html-image00.png)
-
-Click Finish, and now you have created your own Apex
-Configuration Package project.  The procedure for other IDEs, like
-Eclipse or IntelliJ, is similar.
-
-
-# Assembling your own configuration package 
-
-Inside the project created by the archetype, these are the files that
-you should know about when assembling your own configuration package:
-
-    ./pom.xml
-    ./src/main/resources/classpath
-    ./src/main/resources/files
-    ./src/main/resources/META-INF/properties.xml
-    ./src/main/resources/META-INF/properties-{appname}.xml
-
-## pom.xml 
-
-Example:
-
-```
-  <groupId>com.example</groupId>
-  <version>1.0.0</version>
-  <artifactId>mydtconf</artifactId>
-  <packaging>jar</packaging>
-  <!-- change these to the appropriate values -->
-  <name>My DataTorrent Application Configuration</name>
-  <description>My DataTorrent Application Configuration Description</description>
-  <properties>
-    <datatorrent.apppackage.name>mydtapp</datatorrent.apppackage.name>
-    <datatorrent.apppackage.minversion>1.0.0</datatorrent.apppackage.minversion>
-    <datatorrent.apppackage.maxversion>1.9999.9999</datatorrent.apppackage.maxversion>
-    <datatorrent.appconf.classpath>classpath/*</datatorrent.appconf.classpath>
-    <datatorrent.appconf.files>files/*</datatorrent.appconf.files>
-  </properties> 
-
-```
-In pom.xml, you can change the following keys to your desired values:
-
-* ```<groupId>```
-* ```<version>```
-* ```<artifactId>```
-* ```<name> ```
-* ```<description>```
-
-You can also change the values of 
-
-* ```<datatorrent.apppackage.name>```
-* ```<datatorrent.apppackage.minversion>```
-* ```<datatorrent.apppackage.maxversion>```
-
-to reflect which app packages should be used with this configuration
-package.  Apex will use this information to check whether a configuration
-package is compatible with the application package when you issue a launch
-command.
-
-## ./src/main/resources/classpath 
-
-Place any file in this directory that you’d like to be copied to the
-compute machines when launching an application and included in the
-classpath of the application.  Examples of such files are Java properties
-files and jar files.
-
-## ./src/main/resources/files 
-
-Place any file in this directory that you’d like to be copied to the
-compute machines when launching an application but not included in the
-classpath of the application.
-
-## Properties XML file
-
-A properties xml file consists of a set of key-value pairs.  The set of
-key-value pairs specifies the configuration options the application
-should be launched with.  
-
-Example:
-```
-<configuration>
-  <property>
-    <name>some-property-name</name>
-    <value>some-property-value</value>
-  </property>
-   ...
-</configuration>
-```
-Names of properties XML file:
-
-*  **properties.xml:** Properties that are global to the Configuration
-Package
-*  **properties-{appName}.xml:** Properties that are specific when launching
-an application with the specified appName within the Application
-Package.
-
-After you are done with the above, remember to run mvn package to
-generate a new configuration package, which will be located in the
-target directory of your project.
-
-## Zip structure of configuration package 
-Apex Application Configuration Package files are zip files.  You
-can examine the content of any Application Configuration Package by
-using unzip -t on your Linux command line.  The structure of the zip
-file is as follows:
-
-```
-META-INF
-  MANIFEST.MF
-  properties.xml
-  properties-{appname}.xml
-classpath
-  {classpath files}
-files
-  {files} 
-```
-
-
-
-# Launching with CLI
-
-The `-conf` option of the launch command in the CLI supports specifying a
-configuration package in the local filesystem.  Example:
-
-    dt> launch DTApp-mydtapp-1.0.0.jar -conf DTConfig-mydtconfig-1.0.0.jar
-
-This command expects both the application package and the configuration
-package to be in the local file system.
-
-
-
-# Related REST API 
-
-### POST /ws/v2/configPackages
-
-Payload: Raw content of configuration package zip
-
-Function: Creates or replaces a configuration package zip file in HDFS
-
-Curl example:
-
-    $ curl -XPOST -T DTConfig-{name}.jar http://{yourhost:port}/ws/v2/configPackages
-
-### GET /ws/v2/configPackages?appPackageName=...&appPackageVersion=... 
-
-All query parameters are optional
-
-Function: Returns the configuration packages that the user is authorized to
-use and that are compatible with the specified appPackageName,
-appPackageVersion and appName.
-
-### GET /ws/v2/configPackages/```<user>```?appPackageName=...&appPackageVersion=...
-
-All query parameters are optional
-
-Function: Returns the configuration packages under the specified user that
-are compatible with the specified appPackageName, appPackageVersion and appName.
-
-### GET /ws/v2/configPackages/```<user>```/```<name>``` 
-
-Function: Returns the information of the specified configuration package
-
-### GET /ws/v2/configPackages/```<user>```/```<name>```/download 
-
-Function: Returns the raw config package file
-
-Curl example:
-
-```sh
-$ curl http://{yourhost:port}/ws/v2/configPackages/{user}/{name}/download > DTConfig-xyz.jar
-$ unzip -t DTConfig-xyz.jar
-```
-
-### POST /ws/v2/appPackages/```<user>```/```<app-pkg-name>```/```<app-pkg-version>```/applications/{app-name}/launch?configPackage=```<user>```/```<confpkgname>```
-
-Function: Launches the app package with the specified configuration package
-stored in HDFS.
-
-Curl example:
-
-```sh
-$ curl -XPOST -d '{}' http://{yourhost:port}/ws/v2/appPackages/{user}/{app-pkg-name}/{app-pkg-version}/applications/{app-name}/launch?configPackage={user}/{confpkgname}
-```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/docs/apex.md
----------------------------------------------------------------------
diff --git a/docs/apex.md b/docs/apex.md
new file mode 100644
index 0000000..215a957
--- /dev/null
+++ b/docs/apex.md
@@ -0,0 +1,14 @@
+Apache Apex
+================================================================================
+
+Apache Apex (incubating) is the industry’s only open source, enterprise-grade
+unified stream and batch processing engine.  Apache Apex includes key features
+requested by the open source developer community that are not available in
+current open source technologies.
+
+* Event processing guarantees
+* In-memory performance & scalability
+* Fault tolerance and state management
+* Native rolling and tumbling window support
+* Hadoop-native YARN & HDFS implementation
+
+For additional information visit [Apache Apex](http://apex.incubator.apache.org/).
+
+[![](images/apex_logo.png)](http://apex.incubator.apache.org/)

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/docs/apex_development_setup.md
----------------------------------------------------------------------
diff --git a/docs/apex_development_setup.md b/docs/apex_development_setup.md
new file mode 100644
index 0000000..777f2f9
--- /dev/null
+++ b/docs/apex_development_setup.md
@@ -0,0 +1,151 @@
+Apache Apex Development Environment Setup
+=========================================
+
+This document discusses the steps needed for setting up a development
+environment for creating applications that run on the Apache Apex or the
+DataTorrent RTS streaming platform.
+
+
+Microsoft Windows
+------------------------------
+
+There are a few tools that will be helpful when developing Apache Apex
+applications, some required and some optional:
+
+1.  *git* -- A revision control system (version 1.7.1 or later). There are multiple git clients available for Windows (<http://git-scm.com/download/win> for example), so download and install a client of your choice.
+
+2.  *java JDK* (not JRE). Includes the Java Runtime Environment as well as the Java compiler and a variety of tools (version 1.7.0\_79 or later). Can be downloaded from the Oracle website.
+
+3.  *maven* -- Apache Maven is a build system for Java projects (version 3.0.5 or later). It can be downloaded from <https://maven.apache.org/download.cgi>.
+
+4.  *VirtualBox* -- Oracle VirtualBox is a virtual machine manager (version 4.3 or later) and can be downloaded from <https://www.virtualbox.org/wiki/Downloads>. It is needed to run the DataTorrent Sandbox.
+
+5.  *DataTorrent Sandbox* -- The sandbox can be downloaded from <https://www.datatorrent.com/download>. It is useful for testing simple applications since it contains Apache Hadoop and DataTorrent RTS 3.1.1 pre-installed with a time-limited Enterprise License. If you have already installed the RTS Enterprise Edition (evaluation or production license) on a cluster, you can use that setup for deployment and testing instead of the sandbox.
+
+6.  (Optional) If you prefer to use an IDE (Integrated Development Environment) such as *NetBeans*, *Eclipse* or *IntelliJ*, install that as well.
+
+
+After installing these tools, make sure that the directories containing the
+executable files are in your PATH environment variable; for example, for the
+JDK executables like _java_ and _javac_, the directory might be something like
+`C:\Program Files\Java\jdk1.7.0_80\bin`; for _git_ it might be
+`C:\Program Files\Git\bin`; and for maven it might be
+`C:\Users\user\Software\apache-maven-3.3.3\bin`. Open a console window and
+enter the command:
+
+    echo %PATH%
+
+to see the value of the `PATH` variable and verify that the above directories
+are present. If not, you can change its value by clicking the button at
+_Control Panel_ &#x21e8; _Advanced System Settings_ &#x21e8; _Advanced tab_
+&#x21e8; _Environment Variables_.
+
+
+Now run the following commands and ensure that the output is something similar
+to that shown in the table below:
+
+
+<table>
+<colgroup>
+<col width="30%" />
+<col width="70%" />
+</colgroup>
+<tbody>
+<tr class="odd">
+<td align="left"><p>Command</p></td>
+<td align="left"><p>Output</p></td>
+</tr>
+<tr class="even">
+<td align="left"><p><tt>javac -version</tt></p></td>
+<td align="left"><p>javac 1.7.0_80</p></td>
+</tr>
+<tr class="odd">
+<td align="left"><p><tt>java -version</tt></p></td>
+<td align="left"><p>java version &quot;1.7.0_80&quot;</p>
+<p>Java(TM) SE Runtime Environment (build 1.7.0_80-b15)</p>
+<p>Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)</p></td>
+</tr>
+<tr class="even">
+<td align="left"><p><tt>git --version</tt></p></td>
+<td align="left"><p>git version 2.6.1.windows.1</p></td>
+</tr>
+<tr class="odd">
+<td align="left"><p><tt>mvn --version</tt></p></td>
+<td align="left"><p>Apache Maven 3.3.3 
(7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T06:57:37-05:00)</p>
+<p>Maven home: C:\Users\ram\Software\apache-maven-3.3.3\bin\..</p>
+<p>Java version: 1.7.0_80, vendor: Oracle Corporation</p>
+<p>Java home: C:\Program Files\Java\jdk1.7.0_80\jre</p>
+<p>Default locale: en_US, platform encoding: Cp1252</p>
+<p>OS name: &quot;windows 8&quot;, version: &quot;6.2&quot;, arch: &quot;amd64&quot;, family: &quot;windows&quot;</p></td>
+</tr>
+</tbody>
+</table>
+
+
+To install the sandbox, first download it from
+<https://www.datatorrent.com/download> and import the downloaded file into
+VirtualBox. Once the import completes, you can select it and click the Start
+button to start the sandbox.
+
+
+The sandbox is configured with 6GB RAM; if your development machine has 16GB
+or more, you can increase the sandbox RAM to 8GB or more using the VirtualBox
+console. This will yield better performance and support larger applications.
+Additionally, you can change the network adapter from **NAT** to **Bridged
+Adapter**; this will allow you to log in to the sandbox from your host machine
+using an _ssh_ tool like **PuTTY** and also to transfer files to and from the
+host using `pscp` on Windows. Of course, all such configuration must be done
+when the sandbox is not running.
+
+
+You can choose to develop either directly on the sandbox or on your
+development machine. The advantage of the former is that most of the tools
+(e.g. _jdk_, _git_, _maven_) are pre-installed and the package files created
+by your project are directly available to the DataTorrent tools such as
+**dtManage** and **dtcli**. The disadvantage is that the sandbox is a
+memory-limited environment, so running a memory-hungry tool like a Java IDE
+on it may starve other applications of memory.
+
+
+You can now use the maven archetype to create a basic Apache Apex project as
+follows: put these lines in a Windows command file called, for example,
+`newapp.cmd`, and run it:
+
+    @echo off
+    @rem Script for creating a new application
+    setlocal
+    mvn archetype:generate ^
+      -DarchetypeRepository=https://www.datatorrent.com/maven/content/repositories/releases ^
+      -DarchetypeGroupId=com.datatorrent ^
+      -DarchetypeArtifactId=apex-app-archetype ^
+      -DarchetypeVersion=3.1.1 ^
+      -DgroupId=com.example ^
+      -Dpackage=com.example.myapexapp ^
+      -DartifactId=myapexapp ^
+      -Dversion=1.0-SNAPSHOT
+    endlocal
+
+
+
+The caret (^) at the end of some lines indicates that a continuation line
+follows. When you run this file, the properties will be displayed and you
+will be prompted with `` Y: :``; just press **Enter** to complete the project
+generation.
+
+
+This command file also exists in the DataTorrent _examples_ repository, which
+you can check out with:
+
+    git clone https://github.com/DataTorrent/examples
+
+You will find the script under `examples\tutorials\topnwords\scripts\newapp.cmd`.
+
+You can also, if you prefer, use an IDE to generate the project as described
+in Section 3 of [Application Packages](application_packages.md), but use the
+archetype version 3.1.1 instead of 3.0.0.
+
+
+When the run completes successfully, you should see a new directory named
+`myapexapp` containing a maven project for building a basic Apache Apex
+application. It includes 3 source files: **Application.java**,
+**RandomNumberGenerator.java** and **ApplicationTest.java**. You can now
+build the application by stepping into the new directory and running the
+appropriate maven command:
+
+    cd myapexapp
+    mvn clean package -DskipTests
+
+The build should create the application package file
+`myapexapp\target\myapexapp-1.0-SNAPSHOT.apa`. This file can then be uploaded
+to the DataTorrent GUI tool on the sandbox (called **dtManage**) and launched
+from there. It generates a stream of random numbers and prints them out, each
+prefixed by the string `hello world: `.  If you built this package on the
+host, you can transfer it to the sandbox using the `pscp` tool bundled with
+**PuTTY**, mentioned earlier.
+
+
+If you want to check out the Apache Apex source repositories and build them,
+you can do so by running the script `build-apex.cmd` located in the same
+place in the examples repository described above. The source repositories
+contain more substantial demo applications and the associated source code.
+Alternatively, if you do not want to use the script, you can follow these
+simple manual steps:
+
+
+1.  Check out the source code repositories:
+
+        git clone https://github.com/apache/incubator-apex-core
+        git clone https://github.com/apache/incubator-apex-malhar
+
+2.  Switch to the appropriate release branch and build each repository:
+
+        pushd incubator-apex-core
+        git checkout release-3.1
+        mvn clean install -DskipTests
+        popd
+        pushd incubator-apex-malhar
+        git checkout release-3.1
+        mvn clean install -DskipTests
+        popd
+
+The `install` argument to the `mvn` command installs resources from each
project into your local maven repository (typically `.m2/repository` under your
home directory), and **not** into the system directories, so Administrator
privileges are not required. The `-DskipTests` argument skips running unit
tests, since they take a long time. If this is a first-time installation, it
might take several minutes to complete because maven will download a number of
associated plugins.
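+
+Once installed, the artifacts can be referenced from your application's
`pom.xml` like any other maven dependency. A minimal sketch (the version must
match the branch you built; the groupId shown here matches the 3.1 archetype
used above and may differ for other releases):
+
+    <dependency>
+      <groupId>com.datatorrent</groupId>
+      <artifactId>malhar-library</artifactId>
+      <version>3.1.1</version>
+    </dependency>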
+
+After the build completes, you should see the demo application package files 
in the target directory under each demo subdirectory in 
`incubator-apex-malhar\demos\`.
+
+Linux
+------------------
+
+Most of the instructions for Linux (and other Unix-like systems) are similar 
to those for Windows described above, so we will just note the differences.
+
+
+The pre-requisites (such as _git_, _maven_, etc.) are the same as for Windows
described above; please run the commands in the table and ensure that
appropriate versions are accessible via your PATH environment variable (the
command to display that variable is `echo $PATH`).
+
+
+The maven archetype command is the same except that continuation lines use a
backslash (`\`) instead of a caret (`^`); the script for it is available in
the same location and is named `newapp` (without the `.cmd` extension). The
script to check out and build the Apache Apex repositories is named `build-apex`.
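+
+For reference, the archetype invocation on Linux is the same command with
backslash continuations:
+
+    mvn archetype:generate \
+      -DarchetypeRepository=https://www.datatorrent.com/maven/content/repositories/releases \
+      -DarchetypeGroupId=com.datatorrent \
+      -DarchetypeArtifactId=apex-app-archetype \
+      -DarchetypeVersion=3.1.1 \
+      -DgroupId=com.example \
+      -Dpackage=com.example.myapexapp \
+      -DartifactId=myapexapp \
+      -Dversion=1.0-SNAPSHOT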

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/7af835b0/docs/apex_malhar.md
----------------------------------------------------------------------
diff --git a/docs/apex_malhar.md b/docs/apex_malhar.md
new file mode 100644
index 0000000..ef2e371
--- /dev/null
+++ b/docs/apex_malhar.md
@@ -0,0 +1,65 @@
+Apache Apex Malhar
+================================================================================
+
+Apache Apex Malhar is an open source operator and codec library that can be
used with the Apache Apex platform to build real-time streaming applications.
As part of enabling enterprises to extract value quickly, Malhar operators help
get data into Hadoop, analyze it in real-time, and get data out of Hadoop in
real-time with no paradigm limitations. In addition to the operators, the
library contains a number of demo applications demonstrating operator features
and capabilities.
+
+![MalharDiagram](images/MalharOperatorOverview.png)
+
+# Capabilities common across Malhar operators
+
+For most streaming platforms, connectors are afterthoughts and often end up
being simple ‘bolt-ons’ to the platform. As a result, they often cause
performance issues or data loss when put through failure scenarios and
scalability requirements. Malhar operators do not face these issues, as they
were designed to be integral parts of Apache Apex. Hence, they have the
following core streaming runtime capabilities:
+
+1.  **Fault tolerance** – Apache Apex Malhar operators, where applicable, have
fault tolerance built in. They use the checkpoint capability provided by the
framework to ensure that there is no data loss under ANY failure scenario (see
the sketch after this list).
+2.  **Processing guarantees** – Malhar operators, where applicable, provide
out of the box support for ALL three processing guarantees – exactly once,
at-least once & at-most once – WITHOUT requiring the user to write any
additional code. Some operators, like the MQTT operator, deal with source
systems that cannot track processed data and hence need the operators to keep
track of the data. Malhar has support for a generic operator that uses
alternate storage like HDFS to facilitate this. Finally, for databases that
support transactions or any sort of atomic batch operations, Malhar operators
can do exactly once down to the tuple level.
+3.  **Dynamic updates** – Based on changing business conditions, you often
have to tweak several parameters used by the operators in your streaming
application without incurring any application downtime. To support this, the
properties of a Malhar operator can be changed at runtime without having to
bring down the application.
+4.  **Ease of extensibility** – Malhar operators are based on templates that 
are easy to extend.
+5.  **Partitioning support** – In streaming applications the input data
stream often needs to be partitioned based on the contents of the stream. Also,
for operators that ingest data from external systems, partitioning needs to be
done based on the capabilities of the external system. For example, with the
Kafka or Flume operator, the operator can automatically scale up or down based
on changes in the number of Kafka partitions or Flume channels.
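+
+As a hedged illustration of the fault tolerance point above (a minimal sketch
written against the core operator API, not an actual Malhar operator), any
non-transient field of an operator is checkpointed by the engine, so its state
survives failures:
+
+    import com.datatorrent.api.DefaultInputPort;
+    import com.datatorrent.api.DefaultOutputPort;
+    import com.datatorrent.common.util.BaseOperator;
+
+    public class LineCounter extends BaseOperator
+    {
+      // Non-transient state is checkpointed by the platform; after a failure
+      // the operator resumes from the last checkpoint, so the count is not lost.
+      private long count;
+
+      public final transient DefaultOutputPort<Long> out = new DefaultOutputPort<Long>();
+
+      public final transient DefaultInputPort<String> in = new DefaultInputPort<String>()
+      {
+        @Override
+        public void process(String line)
+        {
+          count++;
+          out.emit(count);
+        }
+      };
+    }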
+
+# Operator Library Overview
+
+## Input/output connectors
+
+Below is a summary of the various sub-categories of input and output
operators. Input operators also have a corresponding output operator.
+
+*   **File Systems** – Most streaming analytics use cases we have seen
require the data to be stored in HDFS, or perhaps S3 if the application is
running in AWS. Customers also often need to re-run their streaming analytical
applications against historical data, or consume data from upstream processes
that are perhaps writing to some NFS share. Hence, it is not enough to be able
to save data to various file systems; you also have to be able to read data
from them. Malhar supports input & output operators for HDFS, S3, NFS & local
files.
+*   **Flume** – (NOTE: the Flume operator is not yet part of Malhar.) Many
customers have existing Flume deployments that are being used to aggregate log
data from a variety of sources. However, Flume does not allow analytics on the
log data on the fly. The Flume input/output operator enables Apex to consume
data from Flume and analyze it in real-time before it is persisted.
+
+*   **Relational databases** – Most stream processing use cases require some
reference data lookups to enrich, tag or filter streaming data. There is also a
need to save results of the streaming analytical computation to a database so
an operational dashboard can see them. Malhar supports a JDBC operator so you
can read/write data from any JDBC-compliant RDBMS like Oracle, MySQL, etc.
+*   **NoSQL databases** – NoSQL key-value pair databases like Cassandra &
HBase are becoming a common part of streaming analytics application
architectures to look up reference data or store results. Malhar has operators
for HBase, Cassandra, Accumulo (common with government & healthcare companies),
MongoDB & CouchDB.
+*   **Messaging systems** – JMS brokers have been the workhorses of
messaging infrastructure in most enterprises, and Kafka is fast gaining
adoption among almost every customer we talk to. Malhar has operators to
read/write to Kafka, any JMS implementation, ZeroMQ & RabbitMQ.
+*   **Notification systems** – Almost every streaming analytics application
has some notification requirements that are tied to a business condition being
triggered. Malhar supports sending notifications via SMTP & SNMP. It also has
an alert escalation mechanism built in so users don’t get spammed by
notifications (a common drawback in most streaming platforms).
+*   **In-memory databases & caching platforms** – Some streaming use cases
need instantaneous access to shared state across the application. Caching
platforms and in-memory databases serve this purpose really well. To support
these use cases, Malhar has operators for memcached & Redis.
+*   **Protocols** – Streaming use cases driven by machine-to-machine
communication have one thing in common – there is no single dominant
protocol being used for communication. Malhar currently has support for MQTT,
one of the more commonly adopted protocols in the IoT space. Malhar also
provides connectors that can directly talk to HTTP, RSS, Socket, WebSocket &
FTP sources.
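+
+As a hedged illustration of how these connectors are used in practice (the
class and port names below are from the Malhar library as we understand it, and
the operator names in the DAG are arbitrary), a Kafka input operator is added
to a DAG like any other operator:
+
+    import com.datatorrent.api.DAG;
+    import com.datatorrent.contrib.kafka.KafkaSinglePortStringInputOperator;
+    import com.datatorrent.lib.io.ConsoleOutputOperator;
+
+    // Inside StreamingApplication.populateDAG(DAG dag, Configuration conf):
+    KafkaSinglePortStringInputOperator kafkaIn =
+        dag.addOperator("kafkaInput", new KafkaSinglePortStringInputOperator());
+    ConsoleOutputOperator console =
+        dag.addOperator("console", new ConsoleOutputOperator());
+    dag.addStream("messages", kafkaIn.outputPort, console.input);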
+
+
+
+## Compute
+
+One of the most important promises of a streaming analytics platform like
Apache Apex is the ability to do analytics in real-time. However, delivering on
that promise becomes really difficult when the platform does not provide out of
the box operators to support a variety of common compute functions, as the user
then has to worry about making these scalable, fault tolerant, etc. Malhar
takes this responsibility away from the application developer by providing a
huge variety of out of the box computational operators. The application
developer can thus focus on the analysis.
+
+Below is just a snapshot of the compute operators available in Malhar:
+
+*   Statistics & math – various mathematical and statistical computations
over application-defined time windows (see the sketch after this list).
+*   Filtering & pattern matching.
+*   Machine learning & algorithms – real-time model scoring is a very common
use case for stream processing platforms; Malhar allows users to invoke their R
models from streaming applications.
+*   Sorting, Maps, Frequency, TopN, BottomN, Random Generator, etc.
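+
+As a hedged sketch of how a window-based compute operator works (an
illustrative operator written against the core API, not one of Malhar's actual
operators), the engine's `beginWindow`/`endWindow` callbacks delimit each
application window:
+
+    import com.datatorrent.api.DefaultInputPort;
+    import com.datatorrent.api.DefaultOutputPort;
+    import com.datatorrent.common.util.BaseOperator;
+
+    // Emits the maximum of the values received during each application window.
+    public class WindowMax extends BaseOperator
+    {
+      private transient double max;  // per-window scratch state, reset each window
+
+      public final transient DefaultOutputPort<Double> out = new DefaultOutputPort<Double>();
+
+      public final transient DefaultInputPort<Double> in = new DefaultInputPort<Double>()
+      {
+        @Override
+        public void process(Double v)
+        {
+          if (v > max) {
+            max = v;
+          }
+        }
+      };
+
+      @Override
+      public void beginWindow(long windowId)
+      {
+        max = Double.NEGATIVE_INFINITY;
+      }
+
+      @Override
+      public void endWindow()
+      {
+        out.emit(max);
+      }
+    }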
+
+
+## Query & Script invocation
+
+Many streaming use cases are legacy implementations that need to be ported
over. This often requires re-using some of the existing investments and code
that would perhaps be really hard to re-write. With this in mind, Malhar
supports invoking external scripts and queries as part of the streaming
application, using operators for invoking SQL queries, shell scripts, Ruby,
Jython, JavaScript, etc.
+
+## Parsers
+
+There are many industry-vertical-specific data formats that a streaming
application developer might need to parse. Often there are existing parsers
available for these that can be directly plugged into an Apache Apex
application. For example, in the Telco space, a Java-based CDR parser can be
plugged directly into an Apache Apex operator. To further simplify the
development experience, Malhar also provides operators for parsing common
formats like XML (DOM & SAX), JSON (flat map converter), Apache log files,
syslog, etc.
+
+## Stream manipulation
+
+Streaming data, a.k.a. a ‘stream’, is raw data that inevitably needs
processing to clean, filter, tag, summarize, etc. The goal of Malhar is to
enable the application developer to focus on ‘WHAT’ needs to be done to the
stream to get it in the right format, and not worry about the ‘HOW’. Hence,
Malhar has several operators to perform common stream manipulation actions
like DeDupe, GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample,
Inner join, Outer join, Select, Update, etc.
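+
+As a hedged, simplified illustration of the idea behind a de-duplication
operator (this sketch is not Malhar's actual DeDupe implementation, which
manages bounded state and fault tolerance), each distinct tuple is emitted
once:
+
+    import java.util.HashSet;
+    import java.util.Set;
+
+    import com.datatorrent.api.DefaultInputPort;
+    import com.datatorrent.api.DefaultOutputPort;
+    import com.datatorrent.common.util.BaseOperator;
+
+    public class SimpleDedup extends BaseOperator
+    {
+      // Checkpointed with the operator; note this grows without bound, which
+      // is why the real operator manages its state more carefully.
+      private Set<String> seen = new HashSet<String>();
+
+      public final transient DefaultOutputPort<String> out = new DefaultOutputPort<String>();
+
+      public final transient DefaultInputPort<String> in = new DefaultInputPort<String>()
+      {
+        @Override
+        public void process(String s)
+        {
+          if (seen.add(s)) {  // add() returns false for duplicates
+            out.emit(s);
+          }
+        }
+      };
+    }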
+
+## Social Media
+
+Malhar includes an operator to connect to the popular Twitter stream firehose.
