Repository: incubator-samoa
Updated Branches:
  refs/heads/gh-pages 09937b790 -> 7acb1c475


http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Executing-SAMOA-with-Apache-Samza.md
----------------------------------------------------------------------
diff --git a/documentation/Executing-SAMOA-with-Apache-Samza.md 
b/documentation/Executing-SAMOA-with-Apache-Samza.md
new file mode 100644
index 0000000..c0f45a9
--- /dev/null
+++ b/documentation/Executing-SAMOA-with-Apache-Samza.md
@@ -0,0 +1,290 @@
+---
+title: Executing Apache SAMOA with Apache Samza
+layout: documentation
+documentation: true
+---
+This tutorial describes how to run SAMOA on Apache Samza.
+The steps included in this tutorial are:
+
+1. Set up and configure a cluster with the required dependencies. This applies to single-node (local) execution as well.
+
+2. Build SAMOA deployables
+
+3. Configure SAMOA-Samza
+
+4. Deploy SAMOA-Samza and execute a task
+
+5. Observe the execution and the result
+
+## Setup cluster
+The following are needed to run SAMOA on top of Samza:
+
+* [Apache Zookeeper](http://zookeeper.apache.org/)
+* [Apache Kafka](http://kafka.apache.org/)
+* [Apache Hadoop YARN and 
HDFS](http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html)
+
+### Zookeeper
+Zookeeper is used by Kafka to coordinate its brokers. Detailed instructions for setting up a Zookeeper cluster can be found [here](http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html).
+
+To quickly set up a single-node Zookeeper cluster:
+
+1. Download the binary release from the [release 
page](http://zookeeper.apache.org/releases.html).
+
+2. Untar the archive
+ 
+```
+tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
+```
+
+3. Copy the default configuration file
+
+```
+cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
+```
+
+4. Start the single-node cluster
+
+```
+~/zookeeper-3.4.6/bin/zkServer.sh start
+```
+
+### Kafka
+Kafka is a distributed, partitioned, replicated commit log service which Samza 
uses as its default messaging system. 
+
+1. Download a binary release of Kafka [here](http://kafka.apache.org/downloads.html). As mentioned on that page, the Scala version does not matter; however, 2.10 is recommended since Samza has recently moved to Scala 2.10.
+
+2. Untar the archive 
+
+```
+tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
+```
+
+If you are running in local mode or a single-node cluster, you can now start 
Kafka with the command:
+
+```
+~/kafka_2.10-0.8.1/bin/kafka-server-start.sh 
kafka_2.10-0.8.1/config/server.properties
+```
+
+In a multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can also run a smaller Kafka cluster, or even a single-node one). The number of brokers in the Kafka cluster affects the available disk bandwidth and space (the more brokers, the more of both). On each node, set the following properties in `~/kafka_2.10-0.8.1/config/server.properties` before starting the Kafka service.
+
+``` 
+broker.id=a-unique-number-for-each-node
+zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...]
+```
+
+You might want to change the log retention hours or retention bytes to prevent the logs from growing too large.
+
+```
+log.retention.hours=number-of-hours-to-keep-the-logs
+log.retention.bytes=number-of-bytes-to-keep-in-the-logs
+```
+
+### Hadoop YARN and HDFS
+> Hadoop YARN and HDFS are **not** required to run SAMOA in Samza local mode. 
+
+To set up a YARN cluster, first download a binary release of Hadoop 
[here](http://www.apache.org/dyn/closer.cgi/hadoop/common/) on each node in the 
cluster and untar the archive
+`tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/`. We have tested SAMOA with 
Hadoop 2.2.0 but Hadoop 2.3.0 should work too.
+
+**HDFS**
+
+Set the following properties in `~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml` on all nodes.
+
+```
+<configuration>
+  <property>
+    <name>dfs.datanode.data.dir</name>
+    <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value>
+    <description>Comma separated list of paths on the local filesystem of a 
DataNode where it should store its blocks.</description>
+  </property>
+ 
+  <property>
+    <name>dfs.namenode.name.dir</name>
+    <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value>
+    <description>Path on the local filesystem where the NameNode stores the 
namespace and transaction logs persistently.</description>
+  </property>
+</configuration>
+```
+
+Add these properties to `~/hadoop-2.2.0/etc/hadoop/core-site.xml` on all nodes.
+
+```
+<configuration>
+  <property>
+    <name>fs.defaultFS</name>
+    <value>hdfs://localhost:9000/</value>
+    <description>NameNode URI</description>
+  </property>
+
+  <property>
+    <name>fs.hdfs.impl</name>
+    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
+  </property>
+</configuration>
+```
+For a multi-node cluster, change the hostname ("localhost") to the hostname of your NameNode server.
+
+Format the HDFS directory (only do this when running for the very first time)
+
+```
+~/hadoop-2.2.0/bin/hdfs namenode -format
+```
+
+Start the NameNode daemon on one of the nodes
+
+```
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
+```
+
+Start the DataNode daemon on all nodes
+
+```
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
+```
+
+**YARN**
+
+If you are running a multi-node cluster, set the resource manager hostname in `~/hadoop-2.2.0/etc/hadoop/yarn-site.xml` on all nodes as follows:
+
+```
+<configuration>
+  <property>
+    <name>yarn.resourcemanager.hostname</name>
+    <value>resourcemanager-url</value>
+    <description>The hostname of the RM.</description>
+  </property>
+</configuration>
+```
+
+**Other configurations**
+Now we need to tell Samza where to find the YARN cluster configuration. To do this, first create a new directory on all nodes:
+
+```
+mkdir ~/.samza
+mkdir ~/.samza/conf
+```
+
+Copy (or soft link) `core-site.xml`, `hdfs-site.xml`, and `yarn-site.xml` from `~/hadoop-2.2.0/etc/hadoop` to the new directory:
+
+```
+ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml
+```
+
+Export the environment variable `YARN_HOME` (in `~/.bashrc`) so Samza knows where to find these YARN configuration files.
+
+```
+export YARN_HOME=$HOME/.samza
+```
+
+**Start the YARN cluster**
+Start the resource manager on the master node
+
+```
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
+```
+
+Start the node manager on all worker nodes
+
+```
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
+```
+
+## Build SAMOA
+Perform the following steps on one of the nodes in the cluster. We assume git and Maven are installed on this node.
+
+Since Samza is not yet released on Maven, we have to clone the Samza project, build it, and publish it to the local Maven repository:
+
+```
+git clone -b 0.7.0 https://github.com/apache/incubator-samza.git
+cd incubator-samza
+./gradlew clean build
+./gradlew publishToMavenLocal
+```
+ 
+Here we cloned and installed Samza version 0.7.0, the current released version (as of July 2014).
+
+Now we can clone the repository and install SAMOA.
+
+```
+git clone http://git.apache.org/incubator-samoa.git
+cd incubator-samoa
+mvn -Psamza package
+```
+
+The deployable jars for SAMOA will be in 
`target/SAMOA-<variant>-<version>-SNAPSHOT.jar`. For example, in our case for 
Samza `target/SAMOA-Samza-0.2.0-SNAPSHOT.jar`.
+
+## Configure SAMOA-Samza execution
+This section explains the configuration parameters in 
`bin/samoa-samza.properties` that are required to run SAMOA on top of Samza.
+
+**Samza execution mode**
+
+```
+samoa.samza.mode=[yarn|local]
+```
+This parameter specifies in which mode to execute the task: `local` for local execution and `yarn` for cluster execution.
+
+**Zookeeper**
+
+```
+zookeeper.connect=localhost
+zookeeper.port=2181
+```
+The default settings above apply to local mode execution. For cluster mode, change `zookeeper.connect` to the correct URL of your Zookeeper host.
+
+**Kafka**
+
+```
+kafka.broker.list=localhost:9092
+```
+`kafka.broker.list` is a comma-separated list of the host:port pairs of all the brokers in the Kafka cluster.
+
+```
+kafka.replication.factor=1
+```
+`kafka.replication.factor` specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in the Kafka cluster.
+
+**YARN**
+> The settings below do not apply to local mode execution; you can leave them as they are.
+
+`yarn.am.memory` and `yarn.container.memory` specify the memory requirement 
for the Application Master container and the worker containers, respectively. 
+
+```
+yarn.am.memory=1024
+yarn.container.memory=1024
+```
+
+`yarn.package.path` specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.
+
+```
+yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+```
+
+**Samza**
+`max.pi.per.container` specifies the number of PI instances allowed in one 
YARN container. 
+
+```
+max.pi.per.container=1
+```
+
+`kryo.register.file` specifies the registration file for Kryo serializer.
+
+```
+kryo.register.file=samza-kryo
+```
+
+`checkpoint.commit.ms` specifies the frequency for PIs to commit their 
checkpoints (in ms). The default value is 1 minute.
+
+```
+checkpoint.commit.ms=60000
+```
+
+## Deploy SAMOA-Samza task
+Execute a SAMOA task with the following command:
+
+```
+bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> <options>"
+```
+
+## Observe execution and result
+In local mode, all the logs are printed to stdout. If you execute the task on a YARN cluster, the output is written to the stdout files in the YARN container log folders ($HADOOP_HOME/logs/userlogs/application_\<application-id\>/container_\<container-id\>).

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Executing-SAMOA-with-Apache-Storm.md
----------------------------------------------------------------------
diff --git a/documentation/Executing-SAMOA-with-Apache-Storm.md 
b/documentation/Executing-SAMOA-with-Apache-Storm.md
new file mode 100644
index 0000000..0fcdea2
--- /dev/null
+++ b/documentation/Executing-SAMOA-with-Apache-Storm.md
@@ -0,0 +1,100 @@
+---
+title: Executing Apache SAMOA with Apache Storm
+layout: documentation
+documentation: true
+---
+In this tutorial page we describe how to execute SAMOA on top of Apache Storm. 
Here is an outline of what we want to do:
+
+1. Ensure that you have the necessary Storm cluster and configuration to execute SAMOA
+2. Ensure that you have all the SAMOA deployables for execution in the cluster
+3. Configure samoa-storm.properties
+4. Execute SAMOA classification task
+5. Observe the task execution
+
+### Storm Configuration
+Before we start the tutorial, please ensure that you already have a Storm cluster (preferably Storm 0.8.2) running. You can follow this [tutorial](http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/) to set up a Storm cluster.
+
+You also need to install Storm on the machine from which you initiate the deployment, and configure Storm with (at least) the following configuration in `~/.storm/storm.yaml`:
+
+```
+########### These MUST be filled in for a storm configuration
+nimbus.host: "<enter your nimbus host name here>"
+
+## List of custom serializations
+kryo.register:
+    - com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent: 
com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer
+    - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: 
com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
+```
+<!--
+Or, if you are using SAMOA with optimized VHT, you should use this following 
configuration file:
+```
+########### These MUST be filled in for a storm configuration
+nimbus.host: "<enter your nimbus host name here>"
+
+## List of custom serializations
+kryo.register:
+     - 
com.yahoo.labs.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: 
com.yahoo.labs.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer
+     - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: 
com.yahoo.labs.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
+```
+-->
+
+Alternatively, if you don't have a Storm cluster running, you can execute SAMOA with Storm in local mode, as explained in the section [samoa-storm.properties Configuration](#samoa-storm-properties).
+
+### SAMOA deployables
+There are three deployables for executing SAMOA on top of Storm. They are:
+
+1. `bin/samoa` is the main script to execute SAMOA. You do not need to change 
anything in this script.
+2. `target/SAMOA-Storm-x.x.x-SNAPSHOT.jar` is the deployed jar file. `x.x.x` 
is the version number of SAMOA. 
+3. `bin/samoa-storm.properties` contains deployment configurations. You need 
to set the parameters in this properties file correctly. 
+
+### <a name="samoa-storm-properties"> samoa-storm.properties Configuration</a>
+Currently, the properties file contains two configurations:
+
+1. `samoa.storm.mode` determines whether the task is executed locally (using 
Storm's `LocalCluster`) or executed in a Storm cluster. Use `local` if you want 
to test SAMOA and you do not have a Storm cluster for deployment. Use `cluster` 
if you want to test SAMOA on your Storm cluster.
+2. `samoa.storm.numworker` determines the number of workers used to execute the SAMOA task in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in your Storm cluster. If you are using local mode, this property corresponds to the number of threads used by Storm's `LocalCluster` to execute your SAMOA task.
+
+Here is an example of a complete properties file:
+
+```
+# SAMOA Storm properties file
+# This file contains specific configurations for SAMOA deployment in the Storm 
platform
+# Note that you still need to configure Storm client in your machine, 
+# including setting up Storm configuration file (~/.storm/storm.yaml) with 
correct settings
+
+# samoa.storm.mode corresponds to the execution mode of the Task in Storm 
+# possible values:
+#   1. cluster: the Task will be sent into nimbus. The nimbus is configured by 
Storm configuration file
+#   2. local: the Task will be sent using local Storm cluster
+samoa.storm.mode=cluster
+
+# samoa.storm.numworker corresponds to the number of worker processes 
allocated in Storm cluster
+# possible values: any integer greater than 0  
+samoa.storm.numworker=7
+```
+
+### SAMOA task execution
+
+You can execute a SAMOA task using the aforementioned `bin/samoa` script with the following format:
+`bin/samoa <platform> <jar> "<task>"`.
+
+`<platform>` can be `storm` or `s4`. Using the `storm` option means you are deploying SAMOA on a Storm environment. In this configuration, the script uses the aforementioned yaml file (`~/.storm/storm.yaml`) and `samoa-storm.properties` to perform the deployment. Using the `s4` option means you are deploying SAMOA on an Apache S4 environment. Follow this [link](Executing-SAMOA-with-Apache-S4) to learn more about deploying SAMOA on Apache S4.
+
+`<jar>` is the location of the deployed jar file (`SAMOA-Storm-x.x.x-SNAPSHOT.jar`) in your file system. The location can be a relative or an absolute path to the jar file.
+
+`"<task>"` is the SAMOA task command line such as `PrequentialEvaluation` or 
`ClusteringTask`. This command line for SAMOA task follows the format of 
[Massive Online Analysis 
(MOA)](http://moa.cms.waikato.ac.nz/details/classification/command-line/).
+
+The complete command to execute SAMOA is:
+
+```
+bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation 
-d /tmp/dump.csv -i 1000000 -f 100000 -l 
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s 
(com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 
10)"
+```
+The example above uses the [Prequential Evaluation task](Prequential-Evaluation-Task) and the [Vertical Hoeffding Tree](Vertical-Hoeffding-Tree-Classifier) classifier.
+
+### Observing task execution
+There are two ways to observe the task execution: using the Storm UI, and monitoring the dump file of the SAMOA task. Note that the dump file is created on the cluster if you execute your task in `cluster` mode.
+
+#### Using Storm UI
+Go to the web address of the Storm UI and check whether the SAMOA task executes as intended. Use this UI to kill the associated Storm topology if necessary.
+
+#### Monitoring the dump file
+Several tasks have an option to specify a dump file, which is a file that represents the task output. In our example, the [Prequential Evaluation task](Prequential-Evaluation-Task) has the `-d` option, which specifies the path to the dump file. Since Storm decides where to allocate its tasks, you should point the dump file to a path on a shared filesystem if you want to access it from the machine submitting the task.

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Getting-Started.md
----------------------------------------------------------------------
diff --git a/documentation/Getting-Started.md b/documentation/Getting-Started.md
new file mode 100644
index 0000000..99e80b3
--- /dev/null
+++ b/documentation/Getting-Started.md
@@ -0,0 +1,32 @@
+---
+title: Getting Started
+layout: documentation
+documentation: true
+---
+We start by showing how simple it is to run a first large-scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.
+
+* 1. Download SAMOA 
+
+```bash
+git clone http://git.apache.org/incubator-samoa.git
+cd incubator-samoa
+mvn package      #Local mode
+```
+* 2. Download the Forest CoverType dataset 
+
+```bash
+wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"
+unzip covtypeNorm.arff.zip
+```
+
+_Forest Covertype_ contains the forest cover type for 30 x 30 meter cells 
obtained from the US Forest Service (USFS) Region 2 Resource Information System 
(RIS) data. It contains 581,012 instances and 54 attributes, and it has been 
used in several articles on data stream classification.
+
+* 3.  Run an example: classifying the CoverType dataset with the bagging 
algorithm
+
+```bash
+bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation 
-l classifiers.ensemble.Bagging 
+    -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
+```
+
+
+The output will be a list of the evaluation results, reported every 100,000 instances.

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Home.md
----------------------------------------------------------------------
diff --git a/documentation/Home.md b/documentation/Home.md
new file mode 100644
index 0000000..ebc3475
--- /dev/null
+++ b/documentation/Home.md
@@ -0,0 +1,57 @@
+---
+title: Apache SAMOA Documentation
+layout: documentation
+documentation: true
+---
+Apache SAMOA is a distributed real-time machine learning system, similar to Mahout, but specifically designed for stream mining. Apache SAMOA is simple and fun to use!
+
+This documentation gives an introduction on how to use Apache SAMOA in different ways. As a user, you can run Apache SAMOA algorithms on several Stream Processing Engines: local mode, Apache Storm, S4, and Samza. As a developer, you can write new algorithms once and test them on all of these Stream Processing Engines.
+
+## Getting Started
+
+* [0 Hands-on with SAMOA: Getting Started!](Getting-Started.html)
+
+
+## Users
+
+* [1 Building and Executing 
SAMOA](Scalable-Advanced-Massive-Online-Analysis.html)
+    * [1.0 Building SAMOA](Building-SAMOA.html)
+    * [1.1 Executing SAMOA with Apache 
Storm](Executing-SAMOA-with-Apache-Storm.html)
+    * [1.2 Executing SAMOA with Apache S4](Executing-SAMOA-with-Apache-S4.html)
+    * [1.3 Executing SAMOA with Apache 
Samza](Executing-SAMOA-with-Apache-Samza.html)
+* [2 Machine Learning Methods in SAMOA](SAMOA-and-Machine-Learning.html)
+    * [2.1 Prequential Evaluation Task](Prequential-Evaluation-Task.html)
+    * [2.2 Vertical Hoeffding Tree 
Classifier](Vertical-Hoeffding-Tree-Classifier.html)
+    * [2.3 Adaptive Model Rules Regressor](Adaptive-Model-Rules-Regressor.html)
+    * [2.4 Bagging and Boosting](Bagging-and-Boosting.html)
+    * [2.5 Distributed Stream Clustering](Distributed-Stream-Clustering.html)
+    * [2.6 Distributed Stream Frequent Itemset 
Mining](Distributed-Stream-Frequent-Itemset-Mining.html)
+    * [2.7 SAMOA for MOA users](SAMOA-for-MOA-users.html)
+
+## Developers
+
+* [3 Understanding SAMOA Topologies](SAMOA-Topology.html)
+    * [3.1 Processor](Processor.html)
+    * [3.2 Content Event](Content-Event.html)
+    * [3.3 Stream](Stream.html)
+    * [3.4 Task](Task.html)
+    * [3.5 Topology Builder](Topology-Builder.html)
+    * [3.6 Learner](Learner.html)
+    * [3.7 Processing Item](Processing-Item.html)
+* [4 Developing New Tasks in SAMOA](Developing-New-Tasks-in-SAMOA.html)
+
+### Getting help
+
+#### Apache SAMOA Users
+Apache SAMOA users should send messages and subscribe to [[email protected]](mailto:[email protected]).
+
+You can subscribe to this list by sending an email to 
[[email protected]](mailto:[email protected]).
 Likewise, you can cancel a subscription by sending an email to 
[[email protected]](mailto:[email protected]).
+
+
+#### Apache SAMOA Developers
+Apache SAMOA developers should send messages and subscribe to [[email protected]](mailto:[email protected]).
+
+You can subscribe to this list by sending an email to 
[[email protected]](mailto:[email protected]). 
Likewise, you can cancel a subscription by sending an email to 
[[email protected]](mailto:[email protected]).
+
+__NOTE:__ The Google Groups account [email protected] is now officially deprecated in favor of the Apache-hosted user/dev mailing lists.
+

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Learner.md
----------------------------------------------------------------------
diff --git a/documentation/Learner.md b/documentation/Learner.md
new file mode 100644
index 0000000..f73c47a
--- /dev/null
+++ b/documentation/Learner.md
@@ -0,0 +1,20 @@
+---
+title: Learner
+layout: documentation
+documentation: true
+---
+Learners are implemented in SAMOA as sub-topologies.
+
+```
+public interface Learner extends Serializable{
+       
+       public void init(TopologyBuilder topologyBuilder, Instances dataset);
+
+       public Processor getInputProcessor();
+
+       public Stream getResultStream();
+}
+```
+When a `Task` object is initialized via `init()`, it calls the `init(...)` method of the `Learner`, and the learner's topology is added to the global topology of the task.
+
+To create a new learner, you only need to add streams, processors, and their connections to the topology in `init(...)`, specify which processor manages the learner's input stream in `getInputProcessor()`, and finally specify the learner's output stream with `getResultStream()`.
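+
+The following is a minimal sketch of a `Learner` implementation. The class names `MyLearner` and `MyLearnerProcessor` and the `setResultStream(...)` setter are hypothetical placeholders for your own classes; only `Learner`, `TopologyBuilder`, `Processor`, `Stream`, and `Instances` come from the SAMOA API.
+
+```
+public class MyLearner implements Learner {
+
+    private Processor inputProcessor; // processor that receives training/testing instances
+    private Stream resultStream;      // stream carrying the learner's output
+
+    public void init(TopologyBuilder topologyBuilder, Instances dataset) {
+        // Create the processor that implements the learning logic and add it to the topology
+        MyLearnerProcessor learnerProcessor = new MyLearnerProcessor(dataset);
+        topologyBuilder.addProcessor(learnerProcessor, 1);
+
+        // Create the output stream, owned by the processor, and hand it to the processor
+        resultStream = topologyBuilder.createStream(learnerProcessor);
+        learnerProcessor.setResultStream(resultStream);
+
+        inputProcessor = learnerProcessor;
+    }
+
+    public Processor getInputProcessor() {
+        return inputProcessor;
+    }
+
+    public Stream getResultStream() {
+        return resultStream;
+    }
+}
+```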

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Prequential-Evaluation-Task.md
----------------------------------------------------------------------
diff --git a/documentation/Prequential-Evaluation-Task.md 
b/documentation/Prequential-Evaluation-Task.md
new file mode 100644
index 0000000..3322218
--- /dev/null
+++ b/documentation/Prequential-Evaluation-Task.md
@@ -0,0 +1,27 @@
+---
+title: Prequential Evaluation
+layout: documentation
+documentation: true
+---
+In data stream mining, the most commonly used evaluation scheme is the prequential or interleaved-test-then-train evaluation. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers by doing exactly this. It supports two classification performance evaluators: a basic one, which measures the accuracy of the classifier model since the start of the evaluation, and a window-based one, which measures the accuracy on the current sliding window of recent instances.
+
+An example of the Prequential Evaluation task on the SAMOA command line when deploying to Storm:
+
+
+```
+bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation 
-d /tmp/dump.csv -i 1000000 -f 100000 -l 
(classifiers.trees.VerticalHoeffdingTree -p 4) -s 
(generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
+```
+
+Parameters:
+
+* `-l`: classifier to train
+* `-s`: stream to learn from
+* `-e`: classification performance evaluation method
+* `-i`: maximum number of instances to test/train on (-1 = no limit)
+* `-f`: number of instances between samples of the learning performance
+* `-n`: evaluation name (default: PrequentialEvaluation_TimeStamp)
+* `-d`: file to append intermediate csv results to
+
+In terms of SAMOA API, the Prequential Evaluation Task consists of a source 
`Entrance Processor`, a `Classifier`, and an `Evaluator Processor` as shown 
below. The `Entrance Processor` sends instances to the `Classifier` using the 
`source` stream. The classifier sends the classification results to the 
`Evaluator Processor` via the `result` stream. The `Entrance Processor` 
corresponds to the `-s` option of Prequential Evaluation, the `Classifier` 
corresponds to the `-l` option, and the `Evaluator Processor` corresponds to 
the `-e` option.
+ 
+![Prequential Evaluation Task](images/PrequentialEvaluation.png)

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Processing-Item.md
----------------------------------------------------------------------
diff --git a/documentation/Processing-Item.md b/documentation/Processing-Item.md
new file mode 100644
index 0000000..d118ab5
--- /dev/null
+++ b/documentation/Processing-Item.md
@@ -0,0 +1,38 @@
+---
+title: Processing Item
+layout: documentation
+documentation: true
+---
+Processing Item is a hidden physical unit of the topology and is just a wrapper around a Processor.
+It is used internally, and it is not accessible from the API.
+
+### Advanced 
+
+It does not contain any logic but connects the Processor to the other 
processors in the topology.
+There are two types of Processing Items.
+
+1. Simple Processing Item (PI)
+2. Entrance Processing Item (EntrancePI)
+
+#### 1. Simple Processing Item (PI)
+Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a `TopologyBuilder`. The following code snippet shows the creation of a Processing Item.
+
+```
+builder.initTopology("MyTopology");
+Processor samplerProcessor = new Sampler();
+ProcessingItem samplerPI = builder.createPI(samplerProcessor,3);
+```
+The `createPI()` method of `TopologyBuilder` is used to create a PI. Its first argument is the instance of the Processor to be wrapped. Its second argument is the parallelism hint, which tells the underlying platform how many parallel instances of this PI should be created on different nodes.
+
+#### 2. Entrance Processing Item (EntrancePI)
+Entrance Processing Item is different from a PI in only one way: it accepts an 
Entrance Processor which can generate its own stream.
+It is mostly used as the source of a topology.
+It connects to external sources, pulls data and provides it to the topology in 
the form of streams.
+All physical topology units are created with the help of a `TopologyBuilder`.
+The following code snippet shows the creation of an Entrance Processing Item.
+
+```
+builder.initTopology("MyTopology");
+EntranceProcessor sourceProcessor = new Source();
+EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor);
+```

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Processor.md
----------------------------------------------------------------------
diff --git a/documentation/Processor.md b/documentation/Processor.md
new file mode 100644
index 0000000..8891cd7
--- /dev/null
+++ b/documentation/Processor.md
@@ -0,0 +1,71 @@
+---
+title: Processor
+layout: documentation
+documentation: true
+---
+Processor is the basic logical processing unit. All logic is written in the 
processor. In SAMOA, a Processor is an interface. Users can implement this 
interface to build their own processors.
+![Topology](images/Topology.png)
+### Adding a Processor to the topology
+
+There are two ways to add a processor to the topology.
+
+#### 1. Processor
+All physical topology units are created with the help of a `TopologyBuilder`. The following code snippet shows how to add a Processor to the topology.
+```
+Processor processor = new ExampleProcessor();
+builder.addProcessor(processor, parallelism);
+```
+The `addProcessor()` method of `TopologyBuilder` is used to add the processor. Its first argument is the instance of the Processor to be added. Its second argument is the parallelism hint, which tells the underlying platform how many parallel instances of this processor should be created on different nodes.
+
+#### 2. Entrance Processor
+Some processors generate their own streams, and they are used as the source of a topology. They connect to external sources, pull data, and provide it to the topology in the form of streams.
+All physical topology units are created with the help of a `TopologyBuilder`. The following code snippet shows how to add an entrance processor to the topology and create a stream from it.
+```
+EntranceProcessor entranceProcessor = new EntranceProcessor();
+builder.addEntranceProcessor(entranceProcessor);
+Stream source = builder.createStream(entranceProcessor);
+```
+
+### Preview of Processor
+```
+package samoa.core;
+public interface Processor extends java.io.Serializable{
+       boolean process(ContentEvent event);
+       void onCreate(int id);
+       Processor newProcessor(Processor p);
+}
+```
+### Methods
+
+#### 1. `boolean process(ContentEvent event)`
+Users should implement the three methods shown above. `process(ContentEvent 
event)` is the method in which all processing logic should be implemented. 
`ContentEvent` is a type (interface) which contains the event. This method will 
be called each time a new event is received. It should return `true` if the 
event has been correctly processed, `false` otherwise.
+
+#### 2. `void onCreate(int id)` 
+is the method in which all initialization code should be written. Multiple 
copies/instances of the Processor are created based on the parallelism hint 
specified by the user. SAMOA assigns each instance a unique id which is passed 
as a parameter `id` to `onCreate(int it)` method of each instance.
+
+#### 3. `Processor newProcessor(Processor p)` 
+is very simple to implement. This method is just a technical overhead that has 
no logical use except that it helps SAMOA in some of its internals. Users 
should just return a new copy of the instance of this class which implements 
this Processor interface. 
+
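+The following is a minimal sketch of a `Processor` implementation; the class name and the counting logic are hypothetical and only illustrate where each piece of logic belongs.
+
+```
+public class MyCounterProcessor implements Processor {
+
+    private int processorId;
+    private long eventCount;
+
+    public void onCreate(int id) {
+        // Called once per parallel instance; id is unique for each instance
+        this.processorId = id;
+        this.eventCount = 0;
+    }
+
+    public boolean process(ContentEvent event) {
+        // All processing logic goes here; return true if the event was handled correctly
+        eventCount++;
+        return true;
+    }
+
+    public Processor newProcessor(Processor p) {
+        // Return a fresh copy of this processor (copy any configuration from p if needed)
+        return new MyCounterProcessor();
+    }
+}
+```
+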
+### Preview of EntranceProcessor
+```
+package com.yahoo.labs.samoa.core;
+
+public interface EntranceProcessor extends Processor {
+    public boolean isFinished();
+    public boolean hasNext();
+    public ContentEvent nextEvent();
+}
+```
+### Methods
+
+#### 1. `boolean isFinished()`
+returns whether to expect more events coming from the entrance processor. If 
the source is a live stream this method should return always `false`. If the 
source is a file, the method should return `true` once the file has been fully 
processed.
+
+#### 2. `boolean hasNext()` 
+returns whether the next event is ready for consumption. If the method returns 
`true` a subsequent call to `nextEvent` should yield the next event to be 
processed. If the method returns `false` the engine can use this information to 
avoid continuously polling the entrance processor.
+
+#### 3. `ContentEvent nextEvent()` 
+is the main method for the entrance processor as it returns the next event to 
be processed by the topology. It should be called only if `isFinished()` 
returned `false` and `hasNext()` returned `true`.
+
+### Note
+All state variables of the class implementing this interface must be serializable. This can be done by implementing the `Serializable` interface. A simple way to skip this requirement is to declare those variables as `transient` and initialize them in the `onCreate()` method. Remember that any initialization of such transient variables done in the constructor will be lost.
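+
+As an illustration of both the interface and the note above, here is a minimal sketch of an `EntranceProcessor` that reads lines from a file. The class name, the `LineContentEvent` type, and the constructor are hypothetical; the `transient` reader is re-created in `onCreate()` because it is not serializable.
+
+```
+import java.io.BufferedReader;
+import java.io.FileReader;
+import java.io.IOException;
+
+public class FileEntranceProcessor implements EntranceProcessor {
+
+    private final String path;
+    private transient BufferedReader reader; // not serializable, so re-created in onCreate()
+    private String nextLine;
+
+    public FileEntranceProcessor(String path) {
+        this.path = path;
+    }
+
+    public void onCreate(int id) {
+        try {
+            reader = new BufferedReader(new FileReader(path));
+            nextLine = reader.readLine();
+        } catch (IOException e) {
+            throw new RuntimeException(e);
+        }
+    }
+
+    public boolean isFinished() {
+        return nextLine == null; // true once the file has been fully processed
+    }
+
+    public boolean hasNext() {
+        return nextLine != null; // true when the next event is ready for consumption
+    }
+
+    public ContentEvent nextEvent() {
+        ContentEvent event = new LineContentEvent(nextLine);
+        try {
+            nextLine = reader.readLine();
+        } catch (IOException e) {
+            throw new RuntimeException(e);
+        }
+        return event;
+    }
+
+    public boolean process(ContentEvent event) {
+        return false; // an entrance processor does not consume events
+    }
+
+    public Processor newProcessor(Processor p) {
+        return new FileEntranceProcessor(((FileEntranceProcessor) p).path);
+    }
+}
+```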

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-Topology.md
----------------------------------------------------------------------
diff --git a/documentation/SAMOA-Topology.md b/documentation/SAMOA-Topology.md
new file mode 100644
index 0000000..6f83c03
--- /dev/null
+++ b/documentation/SAMOA-Topology.md
@@ -0,0 +1,18 @@
+---
+title: Apache SAMOA Topology
+layout: documentation
+documentation: true
+---
+Apache SAMOA allows users to write their stream processing algorithms in an easy and platform-independent way. SAMOA defines its own topology abstraction, which is very intuitive and simple to use. Currently, SAMOA has the following basic topology elements.
+
+1. [Processor](Processor.html)
+1. [Content Event](Content-Event.html)
+1. [Stream](Stream.html)
+1. [Task](Task.html)
+1. [Topology Builder](Topology-Builder.html)
+1. [Learner](Learner.html)
+1. **Advanced topic**: [Processing Item](Processing-Item.html)
+
+Processor and Content Event are the logical units to build your algorithm, 
Stream and Task are the physical units to wire the various pieces of your 
algorithm, whereas Topology Builder is an administrative unit that provides 
bookkeeping services. Learner is the base interface for learning algorithms. 
Processing Items are internal wrappers for Processors used inside SAMOA.
+
+![Topology](images/Topology.png)

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-and-Machine-Learning.md
----------------------------------------------------------------------
diff --git a/documentation/SAMOA-and-Machine-Learning.md 
b/documentation/SAMOA-and-Machine-Learning.md
new file mode 100644
index 0000000..e61434d
--- /dev/null
+++ b/documentation/SAMOA-and-Machine-Learning.md
@@ -0,0 +1,13 @@
+---
+title: Apache SAMOA and Machine Learning
+layout: documentation
+documentation: true
+---
+SAMOA's main goal is to help developers easily create machine learning algorithms on top of any distributed stream processing engine. Here we present the machine learning algorithms available in SAMOA and how to use them.
+
+* [2.1 Prequential Evaluation Task](Prequential-Evaluation-Task.html)
+* [2.2 Vertical Hoeffding Tree 
Classifier](Vertical-Hoeffding-Tree-Classifier.html)
+* [2.3 Adaptive Model Rules Regressor](Adaptive-Model-Rules-Regressor.html)
+* [2.4 Bagging and Boosting](Bagging-and-Boosting.html)
+* [2.5 Distributed Stream Clustering](Distributed-Stream-Clustering.html)
+* [2.6 Distributed Stream Frequent Itemset 
Mining](Distributed-Stream-Frequent-Itemset-Mining.html)

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-for-MOA-users.md
----------------------------------------------------------------------
diff --git a/documentation/SAMOA-for-MOA-users.md 
b/documentation/SAMOA-for-MOA-users.md
new file mode 100644
index 0000000..b091e2a
--- /dev/null
+++ b/documentation/SAMOA-for-MOA-users.md
@@ -0,0 +1,17 @@
+---
+title: Apache SAMOA for MOA users
+layout: documentation
+documentation: true
+---
+If you're an advanced user of [MOA](http://moa.cms.waikato.ac.nz/), you'll find it easy to run SAMOA. You need to note the following:
+
+* There is no GUI interface in SAMOA
+* You can run SAMOA in the following modes:
+   1. Simulation Environment. Use `com.yahoo.labs.samoa.DoTask` instead of 
`moa.DoTask`   
+   2. Storm Local Mode. Use `com.yahoo.labs.samoa.LocalStormDoTask` instead of 
`moa.DoTask`
+   3. Storm Cluster Mode. You need to use the `samoa` script as explained in [Executing SAMOA with Apache Storm](Executing-SAMOA-with-Apache-Storm.html).
+   4. S4. You need to use the `samoa` script as explained in [Executing SAMOA with Apache S4](Executing-SAMOA-with-Apache-S4.html).
+
+To get started with SAMOA, try a simple example using the CoverType dataset, as discussed in [Getting Started](Getting-Started.html).
+
+To use MOA algorithms inside SAMOA, take a look at 
[https://github.com/samoa-moa/samoa-moa](https://github.com/samoa-moa/samoa-moa).
 

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Scalable-Advanced-Massive-Online-Analysis.md
----------------------------------------------------------------------
diff --git a/documentation/Scalable-Advanced-Massive-Online-Analysis.md 
b/documentation/Scalable-Advanced-Massive-Online-Analysis.md
new file mode 100644
index 0000000..07b50a5
--- /dev/null
+++ b/documentation/Scalable-Advanced-Massive-Online-Analysis.md
@@ -0,0 +1,13 @@
+---
+title: Scalable Advanced Massive Online Analysis
+layout: documentation
+documentation: true
+---
+Scalable Advanced Massive Online Analysis (SAMOA) contains various algorithms for machine learning and data mining on data streams, and allows running them on different distributed stream processing engines (DSPEs) such as Storm and S4. Currently, SAMOA contains methods for classification via Vertical Hoeffding Trees, bagging and boosting, and clustering via CluStream.
+
+In these pages, we explain how to build and execute SAMOA for the different distributed stream processing engines (DSPEs):
+
+* [Building SAMOA](Building-SAMOA.html)
+* [Executing SAMOA with Apache Storm](Executing-SAMOA-with-Apache-Storm.html)
+* [Executing SAMOA with Apache S4](Executing-SAMOA-with-Apache-S4.html)
+* [Executing SAMOA with Apache Samza](Executing-SAMOA-with-Apache-Samza.html)

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Stream.md
----------------------------------------------------------------------
diff --git a/documentation/Stream.md b/documentation/Stream.md
new file mode 100644
index 0000000..527a195
--- /dev/null
+++ b/documentation/Stream.md
@@ -0,0 +1,49 @@
+---
+title: Stream
+layout: documentation
+documentation: true
+---
+A stream is a physical unit of the SAMOA topology which connects different Processors with each other. A Stream is also created by a `TopologyBuilder`, just like a Processor. A stream can have a single source but many destinations. The Processor that is the source of a stream owns that stream.
+
+###1. Creating a Stream
+The following code snippet shows how a Stream is created:
+
+```
+builder.initTopology("MyTopology");
+Processor sourceProcessor = new Sampler();
+builder.addProcessor(sourceProcessor, 3);
+Stream sourceDataStream = builder.createStream(sourceProcessor);
+```
+
+###2. Connecting a Stream
+As described above, a Stream can have many destinations. In the following figure, a single stream from sourceProcessor is connected to three different destination Processors, each having three instances.
+
+![SAMOA Message Shuffling](images/SAMOA Message Shuffling.png)
+
+SAMOA supports three different ways of distributing messages to multiple instances of a Processor.
+####2.1 Shuffle
+In this mode of message distribution, messages/events are distributed randomly among the instances of a Processor.
+The following figure shows how the messages are distributed.
+![SAMOA Explain Shuffling](images/SAMOA Explain Shuffling.png)
+The following code snippet shows how to connect a stream to a destination using random shuffling.
+
+```
+builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
+```
+####2.2 Key
+In this mode of message distribution, messages with the same key are sent to the same instance of a Processor.
+The following figure illustrates key-based distribution.
+![SAMOA Explain Key Shuffling](images/SAMOA Explain Key Shuffling.png)
+The following code snippet shows how to connect a stream to a destination using key-based distribution.
+
+```
+builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
+```
+####2.3 All
+In this mode of message distribution, all messages of a stream are sent to all instances of the destination Processor. The following figure illustrates this distribution process.
+![SAMOA Explain All Shuffling](images/SAMOA Explain All Shuffling.png)
+The following code snippet shows how to connect a stream to a destination using all-based distribution.
+
+```
+builder.connectInputAllStream(sourceDataStream, destinationProcessor);
+```

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Task.md
----------------------------------------------------------------------
diff --git a/documentation/Task.md b/documentation/Task.md
new file mode 100644
index 0000000..e583ce7
--- /dev/null
+++ b/documentation/Task.md
@@ -0,0 +1,54 @@
+---
+title: Task
+layout: documentation
+documentation: true
+---
+A Task is similar to a job in Hadoop: it is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement the `Task` interface.
+
+###1. Implementation
+```
+package com.yahoo.labs.samoa.tasks;
+
+import com.yahoo.labs.samoa.topology.ComponentFactory;
+import com.yahoo.labs.samoa.topology.Topology;
+
+/**
+ * Task interface, the mother of all SAMOA tasks!
+ */
+public interface Task {
+
+       /**
+        * Initialize this SAMOA task, 
+        * i.e. create and connect Processors and Streams
+        * and initialize the topology
+        */
+       public void init();     
+       
+       /**
+        * Return the final topology object to be executed in the cluster
+        * @return topology object to be submitted to be executed in the cluster
+        */
+       public Topology getTopology();
+       
+       /**
+        * Sets the factory.
+        * TODO: propose to hide factory from task, 
+        * i.e. Task will only see TopologyBuilder, 
+        * and factory creation will be handled by TopologyBuilder
+        *
+        * @param factory the new factory
+        */
+       public void setFactory(ComponentFactory factory) ;
+}
+```
+
+###2. Methods
+#####2.1 `void init()`
+This method should build the desired topology by creating Processors and 
Streams and connecting them to each other.
+
+#####2.2 `Topology getTopology()`
+This method should return the topology built by `init` to the engine for 
execution.
+
+#####2.3 `void setFactory(ComponentFactory factory)`
+Utility method to accept a `ComponentFactory` to use in building the topology.
+
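+The following is a minimal sketch of a `Task` implementation. The processor classes `MySourceProcessor` and `MySinkProcessor` are hypothetical placeholders, and we assume the builder exposes a `build()` method that returns the final `Topology`.
+
+```
+public class MyTask implements Task {
+
+    private TopologyBuilder builder;
+    private Topology topology;
+
+    public void setFactory(ComponentFactory factory) {
+        // The factory is platform specific; the builder hides it from the rest of the task
+        builder = new TopologyBuilder(factory);
+    }
+
+    public void init() {
+        builder.initTopology("MyTask");
+
+        // Source of the topology
+        EntranceProcessor source = new MySourceProcessor();
+        builder.addEntranceProcessor(source);
+        Stream sourceStream = builder.createStream(source);
+
+        // Destination processor, connected with shuffle grouping
+        Processor sink = new MySinkProcessor();
+        builder.addProcessor(sink, 1);
+        builder.connectInputShuffleStream(sourceStream, sink);
+
+        topology = builder.build();
+    }
+
+    public Topology getTopology() {
+        return topology;
+    }
+}
+```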

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Team.md
----------------------------------------------------------------------
diff --git a/documentation/Team.md b/documentation/Team.md
new file mode 100644
index 0000000..9e711e2
--- /dev/null
+++ b/documentation/Team.md
@@ -0,0 +1,41 @@
+---
+title: Apache SAMOA Team
+layout: documentation
+documentation: true
+---
+<h2>Project lead</h2>
+<ul>
+<li><a href="http://gdfm.me/";>Gianmarco De Francisci Morales</a></li>
+<li><a href="http://www.albertbifet.com";>Albert Bifet</a></li>
+</ul>
+
+<h2>Committers</h2>
+<ul>
+<li><a href="http://www.cse.usf.edu/~nkourtel/";>Nicolas Kourtellis</a></li>
+<li><a href="http://www.linkedin.com/pub/faisal-moeen/40/17/512";>Faisal 
Moeen</a></li>
+<li><a href="http://www.linkedin.com/in/mmorel";>Matthieu Morel</a></li>
+<li><a href="http://www.otnira.com";>Arinto Murdopo</a></li>
+<li><a href="http://cs.brown.edu/~matteo/";>Matteo Riondato</a></li>
+<li><a href="https://twitter.com/AntonioSeverien";>Antonio Severien</a></li>
+<li><a href="http://www.van-laere.net";>Olivier Van Laere</a></li>  
+<li><a href="http://www.linkedin.com/in/caseyvu";>Anh Thu Vu</a></li> 
+
+
+</ul>
+
+<h2>Contributors</h2>
+<ul>
+<li><a href="http://www.lsi.upc.edu/~marias/";>Marta Arias</a></li>
+<li><a href="http://www.lsi.upc.edu/~gavalda/";>Ricard Gavaldà</a></li>
+<li><a href="http://dme.rwth-aachen.de/de/team/hassani";>Marwan Hassani</a></li>
+<li><a 
href="http://www.scms.waikato.ac.nz/genquery.php?linklist=SCMS&amp;linktype=folder&amp;linkname=The_Dean-0";>Geoff
 Holmes</a></li>
+<li><a href="http://dme.rwth-aachen.de/de/team/jansen";>Timm Jansen</a></li>
+<li>Richard Kirkby</li>
+<li><a href="http://dme.rwth-aachen.de/de/team/kranen";>Philipp Kranen</a></li>
+<li><a href="http://dme.rwth-aachen.de/de/team/kremer";>Hardy Kremer</a></li>
+<li><a href="http://www.cs.waikato.ac.nz/~bernhard";>Bernhard 
Pfahringer</a></li>
+<li><a href="http://users.ics.aalto.fi/jesse/";>Jesse Read</a></li>
+<li><a href="http://www.cs.waikato.ac.nz/~fracpete";>Peter Reutemann</a></li>
+<li><a href="http://dme.rwth-aachen.de/de/team/seidl";>Thomas Seidl</a></li>
+
+</ul>

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Topology-Builder.md
----------------------------------------------------------------------
diff --git a/documentation/Topology-Builder.md 
b/documentation/Topology-Builder.md
new file mode 100644
index 0000000..fe07120
--- /dev/null
+++ b/documentation/Topology-Builder.md
@@ -0,0 +1,33 @@
+---
+title: Topology Builder
+layout: documentation
+documentation: true
+---
+`TopologyBuilder` is a builder class that builds the physical units of the topology and assembles them together. Each topology has a name. The following code snippet shows all the steps of creating a topology with one `EntrancePI`, two PIs, and a few streams.
+
+```
+TopologyBuilder builder = new TopologyBuilder(factory); // factory is a ComponentFactory
+builder.initTopology("Parma Topology"); //initiates an empty topology with a 
name
+//********************************Topology 
building***********************************
+StreamSource sourceProcessor = new 
StreamSource(inputPath,d,sampleSize,fpmGap,epsilon,phi,numSamples);
+builder.addEntranceProcessor(sourceProcessor);
+Stream sourceDataStream = builder.createStream(sourceProcessor);
+sourceProcessor.setDataStream(sourceDataStream);
+Stream sourceControlStream = builder.createStream(sourceProcessor);
+sourceProcessor.setControlStream(sourceControlStream);
+
+Sampler sampler = new 
Sampler(minFreqPercent,sampleSize,(float)epsilon,outputPath,sampler);
+builder.addProcessor(sampler, numSamples);
+builder.connectInputAllStream(sourceControlStream, sampler);
+builder.connectInputShuffleStream(sourceDataStream, sampler);
+
+Stream samplerDataStream = builder.createStream(sampler);
+sampler.setSamplerDataStream(samplerDataStream);
+Stream samplerControlStream = builder.createStream(sampler);
+sampler.setSamplerControlStream(samplerControlStream);
+
+Aggregator aggregatorProcessor = new 
Aggregator(outputPath,(long)numSamples,(long)sampleSize,(long)reqApproxNum,(float)epsilon);
+builder.addProcessor(aggregatorProcessor, numAggregators);
+builder.connectInputKeyStream(samplerDataStream, aggregatorProcessor);
+builder.connectInputAllStream(samplerControlStream, aggregatorProcessor);
+```

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Vertical-Hoeffding-Tree-Classifier.md
----------------------------------------------------------------------
diff --git a/documentation/Vertical-Hoeffding-Tree-Classifier.md 
b/documentation/Vertical-Hoeffding-Tree-Classifier.md
new file mode 100644
index 0000000..a3c027d
--- /dev/null
+++ b/documentation/Vertical-Hoeffding-Tree-Classifier.md
@@ -0,0 +1,26 @@
+---
+title: Vertical Hoeffding Tree
+layout: documentation
+documentation: true
+---
+The Vertical Hoeffding Tree (VHT) classifier is a distributed classifier that utilizes vertical parallelism on top of the Very Fast Decision Tree (VFDT), or Hoeffding Tree, classifier.
+
+### Very Fast Decision Tree (VFDT) classifier
+[Hoeffding Tree or VFDT](http://doi.acm.org/10.1145/347090.347107) is the standard decision tree algorithm for data stream classification. VFDT uses the Hoeffding bound to decide the minimum number of arriving instances needed to achieve a certain level of confidence in splitting a node. This confidence level determines how close the attribute chosen by VFDT is to the attribute that would be chosen by a batch decision tree learner.
+
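+For reference, the Hoeffding bound states that, with probability 1 - δ, the true mean of a random variable with range R does not differ from its empirical mean after n independent observations by more than
+
+```
+\epsilon = \sqrt{ \frac{R^2 \ln(1/\delta)}{2n} }
+```
+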
+For a more comprehensive summary of VFDT, read chapter 3 of [Data Stream 
Mining: A Practical 
Approach](http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf).
+
+### Vertical Parallelism 
+Vertical parallelism is an approach that partitions the instances in terms of attributes for parallel processing. Vertical-parallelism-based decision tree induction processes the partitioned instances (each consisting of a subset of the attributes) to calculate the information-theoretic criteria in parallel. For example, if we have instances with 100 attributes and we partition the instances into 5 portions, we will have 20 attributes per portion. The algorithm processes the 20 attributes in parallel to determine the "local" best attribute to split on, and combines the parallel computation results to determine the "global" best attribute to split on and grow the tree.
+
+For more explanation about available parallelism types for decision tree 
induction, you can read chapter 4 of [Distributed Decision Tree Learning for 
Mining Big Data Streams](../SAMOA-Developers-Guide-0-0-1.pdf), the Developer's 
Guide of SAMOA.  
+
+### Vertical Hoeffding Tree (VHT) classifier
+VHT is implemented using the SAMOA API. The diagram below shows the 
implementation:
+![Vertical Hoeffding Tree](images/VHT.png)
+
+The _source Processor_ and the _evaluator Processor_ are components of the [prequential evaluation task](Prequential-Evaluation-Task) in SAMOA. The _model-aggregator Processor_ contains the decision tree model. It connects to the _local-statistic Processors_ via the _attribute_ stream and the _control_ stream. The _model-aggregator Processor_ splits instances by attribute, and each _local-statistic Processor_ contains the local statistics for the attributes assigned to it. The _model-aggregator Processor_ sends the split instances via the _attribute_ stream, and it sends control messages via the _control_ stream to ask the _local-statistic Processors_ to perform computations. Users configure _n_, the parallelism level of the algorithm, which translates into the number of local-statistic Processors.
+
+The _model-aggregator Processor_ sends the classification result via the _result_ stream to the _evaluator Processor_ of the corresponding evaluation task, or to another destination Processor. The _evaluator Processor_ performs an evaluation of the algorithm, reporting accuracy and throughput. Incoming instances to the _model-aggregator Processor_ arrive via the _source_ stream. The calculation results from the local statistics arrive at the _model-aggregator Processor_ via the _computation-result_ stream.
+
+For more details about the algorithms (i.e. pseudocode), go to section 4.2 of 
[Distributed Decision Tree Learning for Mining Big Data 
Streams](../SAMOA-Developers-Guide-0-0-1.pdf), the Developer's Guide of SAMOA.  

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/HelloWorldTask.png
----------------------------------------------------------------------
diff --git a/documentation/images/HelloWorldTask.png 
b/documentation/images/HelloWorldTask.png
new file mode 100644
index 0000000..0166aa4
Binary files /dev/null and b/documentation/images/HelloWorldTask.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/PrequentialEvaluation.png
----------------------------------------------------------------------
diff --git a/documentation/images/PrequentialEvaluation.png 
b/documentation/images/PrequentialEvaluation.png
new file mode 100644
index 0000000..c0c742c
Binary files /dev/null and b/documentation/images/PrequentialEvaluation.png 
differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 Explain All Shuffling.png
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA Explain All Shuffling.png 
b/documentation/images/SAMOA Explain All Shuffling.png
new file mode 100644
index 0000000..3c0e044
Binary files /dev/null and b/documentation/images/SAMOA Explain All 
Shuffling.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 Explain Key Shuffling.png
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA Explain Key Shuffling.png 
b/documentation/images/SAMOA Explain Key Shuffling.png
new file mode 100644
index 0000000..4fbc2f9
Binary files /dev/null and b/documentation/images/SAMOA Explain Key 
Shuffling.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 Explain Shuffling.png
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA Explain Shuffling.png 
b/documentation/images/SAMOA Explain Shuffling.png
new file mode 100644
index 0000000..8427bce
Binary files /dev/null and b/documentation/images/SAMOA Explain Shuffling.png 
differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 FIM.jpg
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA FIM.jpg b/documentation/images/SAMOA 
FIM.jpg
new file mode 100644
index 0000000..8724910
Binary files /dev/null and b/documentation/images/SAMOA FIM.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 FIM.png
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA FIM.png b/documentation/images/SAMOA 
FIM.png
new file mode 100644
index 0000000..4c14d2f
Binary files /dev/null and b/documentation/images/SAMOA FIM.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA
 Message Shuffling.png
----------------------------------------------------------------------
diff --git a/documentation/images/SAMOA Message Shuffling.png 
b/documentation/images/SAMOA Message Shuffling.png
new file mode 100644
index 0000000..bb71402
Binary files /dev/null and b/documentation/images/SAMOA Message Shuffling.png 
differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/Topology.png
----------------------------------------------------------------------
diff --git a/documentation/images/Topology.png 
b/documentation/images/Topology.png
new file mode 100644
index 0000000..11571ff
Binary files /dev/null and b/documentation/images/Topology.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/VHT.png
----------------------------------------------------------------------
diff --git a/documentation/images/VHT.png b/documentation/images/VHT.png
new file mode 100644
index 0000000..3241761
Binary files /dev/null and b/documentation/images/VHT.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/hamr.png
----------------------------------------------------------------------
diff --git a/documentation/images/hamr.png b/documentation/images/hamr.png
new file mode 100644
index 0000000..c79ca0d
Binary files /dev/null and b/documentation/images/hamr.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/vamr.png
----------------------------------------------------------------------
diff --git a/documentation/images/vamr.png b/documentation/images/vamr.png
new file mode 100644
index 0000000..53c6d58
Binary files /dev/null and b/documentation/images/vamr.png differ

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/index.html
----------------------------------------------------------------------
diff --git a/index.html b/index.html
index 9adf1aa..bc3d6d2 100644
--- a/index.html
+++ b/index.html
@@ -16,16 +16,18 @@
     <h2>Scalable Advanced Massive Online Analysis</h2>    
     
     <h3>Apache SAMOA is currently undergoing Incubation at the Apache Software 
Foundation.
-    </br>New <a href="https://github.com/yahoo/samoa/releases";> Release 
0.2.0</a> !
-    </br>View on <a href="https://github.com/yahoo/samoa";>GitHub 
<small>yahoo/samoa</small></a>
+    </br>Last release before entering the Apache Incubator (not an Apache release): <a 
href="https://github.com/yahoo/samoa/releases";>Release 0.2.0</a>!
+    </br>View on <a href="https://github.com/apache/incubator-samoa";>GitHub 
<small>Apache SAMOA</small></a>
     </h3>
     <div id="slideshow">
-      <img src="images/slideshow/runtime.png" alt="SAMOA" class="slide active" 
height= "300" />
+      <img src="images/slideshow/runtime.png" alt="Apache SAMOA" class="slide 
active" height= "300" />
     </div>
 </section>
 
 <section id="tutorial" class="next-steps">
-  <h1>Apache SAMOA is distributed streaming machine learning (ML) framework 
that contains a 
+  <h1><a href="http://incubator.apache.org/";><img 
style="max-width:55%;border:0px solid 
black;" src="http://incubator.apache.org/images/egg-logo.png"; alt="Apache 
Incubator"> </a>
+
+Apache SAMOA is a distributed streaming machine learning (ML) framework that 
contains a 
 programing abstraction for distributed streaming ML algorithms.</h1>
 
 <h2>Apache SAMOA enables development of new ML algorithms without dealing with 
@@ -50,10 +52,10 @@ in multiple SPEs, i.e., code the algorithms once and 
execute them in multiple SP
        <p>Hands-on with Apache SAMOA: Getting Started in 5 minutes!</p>
     </li>
     <li class="guides">
-      <a class="hero-octicon" href="https://github.com/yahoo/samoa/wiki/";>
+      <a class="hero-octicon" href="documentation/Home.html">
         <span class="mega-octicon octicon-book"></span>
       </a>
-      <h4><a href="https://github.com/yahoo/samoa/wiki/";>Documentation</a></h4>
+      <h4><a href="documentation/Home.html">Documentation</a></h4>
        <p>Learn how to use Apache SAMOA in the various different ways 
possible. </p>
 
     </li>
@@ -62,7 +64,7 @@ in multiple SPEs, i.e., code the algorithms once and execute 
them in multiple SP
 
 <section class="tutorial">
 <h1>Slides</h1>
-<h2><a 
href="https://speakerdeck.com/gdfm/samoa-a-platform-for-mining-big-data-streams-2";><img
 src="samoa-slides.jpg" alt="SAMOA Slides" style="max-width:100%;"></a></h2>
+<h2><a 
href="https://speakerdeck.com/gdfm/samoa-a-platform-for-mining-big-data-streams-2";><img
 src="samoa-slides.jpg" alt="SAMOA Slides" 
data-canonical-src="samoa-slides.jpg" style="max-width:100%;"></a></h2>
 <h2>G. De Francisci Morales <a 
href="http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf";>SAMOA:
 A Platform for Mining Big Data Streams</a>
 Keynote Talk at <a href="http://www.ramss.ws/2013/program/";>RAMSS '13</a>: 2nd 
International Workshop on Real-Time Analysis and Mining of Social Streams WWW, 
Rio De Janeiro, 2013.</h2>
  </section>
@@ -73,7 +75,7 @@ Keynote Talk at <a 
href="http://www.ramss.ws/2013/program/";>RAMSS '13</a>: 2nd I
 
 <h1>Apache SAMOA Developer's Guide</h1>
 
-<h2><a href="SAMOA-Developers-Guide-0-0-1.pdf"><img 
style="max-width:95%;border:3px solid black;" src="Manual.png" alt="SAMOA 
Developer's guide" height="250"> </a></h2>
+<h2><a href="SAMOA-Developers-Guide-0-3-0.pdf"><img 
style="max-width:95%;border:3px solid black;" src="Manual.png" alt="SAMOA 
Developer's guide" height="250"> </a></h2>
    </section><section class="tutorial">
    
 <h1>API Javadoc Reference</h1>
@@ -88,7 +90,7 @@ Keynote Talk at <a 
href="http://www.ramss.ws/2013/program/";>RAMSS '13</a>: 2nd I
 <a 
href="mailto:[email protected]";>[email protected]</a></h2>
 
 <h1>Contributors</h1>
-<h2><a href="contributors.html">List of contributors to the SAMOA 
project</a>.</h2>
+<h2><a href="documentation/Team.html">List of contributors to the SAMOA 
project</a>.</h2>
    </section><section class="next-steps">
 <h1>License</h1>
 
@@ -111,27 +113,27 @@ Apache License, Version 2.0 (<a 
href="http://www.apache.org/licenses/LICENSE-2.0
       <div class="terminal">
         <div class="header"></div>
         <div class="shell">
-          <p><span class="path">~</span><span class="prompt">$</span>git clone 
[email protected]:yahoo/samoa.git</p>
-<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p>
+          <p><span class="path">~</span><span class="prompt">$</span>git clone 
http://git.apache.org/incubator-samoa.git</p>
+<p><span class="path">~</span><span class="prompt">$</span>cd 
incubator-samoa</p>
 <p><span class="path">~</span><span class="prompt">$</span>mvn -Pstorm 
package</p>
         </div>
       </div>
-<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-Storm-0.0.1-SNAPSHOT.jar</code>.</p>
+<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-Storm-0.3.0-SNAPSHOT.jar</code>.</p>
     </li>
 
   <li id="terminal-step-1" class="option-terminal">
       <h4>Apache S4</h4>
       <p>If you want to compile Apache SAMOA for S4, you will need to install 
the S4 dependencies
-manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4";>Executing
 Apache SAMOA with Apache S4</a>.</p>
+manually as explained in <a 
href="documentation/Executing-SAMOA-with-Apache-S4.html">Executing Apache SAMOA 
with Apache S4</a>.</p>
 <div class="terminal">
         <div class="header"></div>
         <div class="shell">
-          <p><span class="path">~</span><span class="prompt">$</span>git clone 
[email protected]:yahoo/samoa.git</p>
-<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p>
+          <p><span class="path">~</span><span class="prompt">$</span>git clone 
http://git.apache.org/incubator-samoa.git</p>
+<p><span class="path">~</span><span class="prompt">$</span>cd 
incubator-samoa</p>
 <p><span class="path">~</span><span class="prompt">$</span>mvn -Ps4 package</p>
         </div>
       </div>
-<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-S4-0.0.1-SNAPSHOT.jar</code>.</p>
+<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
     </li>
 
   <li id="terminal-step-1" class="option-terminal">
@@ -140,12 +142,12 @@ manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-
 <div class="terminal">
         <div class="header"></div>
         <div class="shell">
-          <p><span class="path">~</span><span class="prompt">$</span>git clone 
[email protected]:yahoo/samoa.git</p>
-<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p>
+          <p><span class="path">~</span><span class="prompt">$</span>git clone 
http://git.apache.org/incubator-samoa.git</p>
+<p><span class="path">~</span><span class="prompt">$</span>cd 
incubator-samoa</p>
 <p><span class="path">~</span><span class="prompt">$</span>mvn package</p>
         </div>
       </div>
-<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-Local-0.0.1-SNAPSHOT.jar</code>.</p>
+<p>The deployable jar for Apache SAMOA will be in 
<code>target/SAMOA-Local-0.3.0-SNAPSHOT.jar</code>.</p>
     </li>
 
   </ul>
@@ -164,8 +166,8 @@ manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-
       <div class="terminal">
         <div class="header"></div>
         <div class="shell">
-          <p><span class="path">~</span><span class="prompt">$</span>git clone 
[email protected]:yahoo/samoa.git</p>
-<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p>
+          <p><span class="path">~</span><span class="prompt">$</span>git clone 
http://git.apache.org/incubator-samoa.git</p>
+<p><span class="path">~</span><span class="prompt">$</span>cd 
incubator-samoa</p>
 <p><span class="path">~</span><span class="prompt">$</span>mvn package</p>
         </div>
       </div>
@@ -174,7 +176,7 @@ manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-
 <li id="terminal-step-1" class="option-terminal">
       <h4>Download the Forest CoverType dataset </h4>
       <p>If you want to compile Apache SAMOA for S4, you will need to install 
the S4 dependencies
-manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4";>Executing
 Apache SAMOA with Apache S4</a>.</p>
+manually as explained in <a 
href="documentation/Executing-SAMOA-with-Apache-S4.html">Executing Apache SAMOA with 
Apache S4</a>.</p>
 <div class="terminal">
         <div class="header"></div>
         <div class="shell">
@@ -191,7 +193,7 @@ manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-
 <div class="terminal">
         <div class="header"></div>
         <div class="shell">
-          <p><span class="path">~</span><span class="prompt">$</span>bin/samoa 
local target/SAMOA-Local-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -l 
classifiers.ensemble.Bagging
+          <p><span class="path">~</span><span class="prompt">$</span>bin/samoa 
local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l 
classifiers.ensemble.Bagging
     -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</p>
         </div>
       </div>
@@ -203,7 +205,9 @@ manually as explained in <a 
href="https://github.com/yahoo/samoa/wiki/Executing-
 <section class="tutorial">
 <h2><a href="http://incubator.apache.org/";><img  
style="max-width:95%;border:0px solid 
black;"src="http://incubator.apache.org/images/egg-logo.png"; alt="Apache 
Incubator" > </a></h2>
 <h2>
-Apache SAMOA is an effort undergoing incubation at The Apache Software 
Foundation (ASF), sponsored by the name of Apache TLP sponsor. Incubation is 
required of all newly accepted projects until a further review indicates that 
the infrastructure, communications, and decision making process have stabilized 
in a manner consistent with other successful ASF projects. While incubation 
status is not necessarily a reflection of the completeness or stability of the 
code, it does indicate that the project has yet to be fully endorsed by the 
ASF.</h2>
+Apache SAMOA is an effort undergoing incubation at The Apache Software 
Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of 
all newly accepted projects until a further review indicates that the 
infrastructure, communications, and decision making process have stabilized in 
a manner consistent with other successful ASF projects. While incubation status 
is not necessarily a reflection of the completeness or stability of the code, 
it does indicate that the project has yet to be fully endorsed by the ASF.</h2>
+
+<h2>Apache and the Apache feather logo are trademarks of The Apache Software 
Foundation.</h2>
 </section>
 
 <script src="js/jquery.js"></script>
@@ -217,9 +221,8 @@ Apache SAMOA is an effort undergoing incubation at The 
Apache Software Foundatio
 <footer class="page-footer">
 
    <ul class="site-footer-links right">
-          <li><a href="https://github.com/yahoo/samoa/zipball/master";>Download 
<strong>ZIP File</strong></a></li>
-          <li><a href="https://github.com/yahoo/samoa/tarball/master";>Download 
<strong>TAR Ball</strong></a></li>
-          <li><a href="https://github.com/yahoo/samoa";>View On 
<strong>GitHub</strong></a></li>
+
+          <li><a href="https://github.com/apache/incubator-samoa";>View On 
<strong>GitHub</strong></a></li>
   </ul>
 
   <a href="/">
@@ -229,8 +232,8 @@ Apache SAMOA is an effort undergoing incubation at The 
Apache Software Foundatio
   <ul class="site-footer-links">
     <li>&copy; 2014 <span>Apache SAMOA</span></li>
      <li><a href="#build">Build Apache SAMOA</a></h4>
-     <li><a 
href="https://github.com/yahoo/samoa/wiki/Getting%20Started";>Getting 
started!</a></li>
-     <li><a href="https://github.com/yahoo/samoa/wiki/";>Documentation</a></li>
+     <li><a href="documentation/Getting-Started.html">Getting started!</a></li>
+     <li><a href="documentation/Home.html">Documentation</a></li>
   </ul>
 </footer>
 

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/params.json
----------------------------------------------------------------------
diff --git a/params.json b/params.json
index 78ad0dc..0dbcbc5 100644
--- a/params.json
+++ b/params.json
@@ -1 +1 @@
-{"name":"SAMOA","tagline":"Scalable Advanced Massive Online 
Analysis","body":"SAMOA is a platform for mining on big data streams.\r\nIt is 
a distributed streaming machine learning (ML) framework that contains a 
\r\nprograming abstraction for distributed streaming ML 
algorithms.\r\n\r\nSAMOA enables development of new ML algorithms without 
dealing with \r\nthe complexity of underlying streaming processing engines 
(SPE, such \r\nas Apache Storm and Apache S4). SAMOA also provides 
extensibility in integrating\r\nnew SPEs into the framework. These features 
allow SAMOA users to develop \r\ndistributed streaming ML algorithms once and 
to execute the algorithms \r\nin multiple SPEs, i.e., code the algorithms once 
and execute them in multiple SPEs.\r\n\r\n## 
Build\r\n\r\n###Storm\r\n\r\nSimply clone the repository and install 
SAMOA.\r\n```bash\r\ngit clone [email protected]:yahoo/samoa.git\r\ncd 
samoa\r\nmvn -Pstorm package\r\n```\r\n\r\nThe deployable jar for SAMOA will be 
in `target/SAMOA-St
 orm-0.0.1.jar`.\r\n\r\n###S4\r\n\r\nIf you want to compile SAMOA for S4, you 
will need to install the S4 dependencies\r\nmanually as explained in [Executing 
SAMOA with Apache 
S4](https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4).\r\n\r\nOnce
 the dependencies if needed are installed, you can simply clone the repository 
and install SAMOA.\r\n\r\n```bash\r\ngit clone 
[email protected]:yahoo/samoa.git\r\ncd samoa\r\nmvn -Ps4 
package\r\n```\r\n\r\nThe deployable jars for SAMOA will be in 
`target/SAMOA-S4-0.0.1.jar`.\r\n\r\n## Documentation\r\n\r\nThe documentation 
is intended to give an introduction on how to use SAMOA in the various 
different ways possible. \r\nAs a user you can use it to develop new algorithms 
and test different Stream Processing Engines.\r\n\r\n* [1 Scalable Advanced 
Massive Online Analysis](https://github.com/yahoo/samoa/wiki/Scalable Advanced 
Massive Online Analysis)\r\n    * [1.0 Building 
SAMOA](https://github.com/yahoo/samoa/wiki/Building SAMOA)\r\n
     * [1.1 Executing SAMOA with Apache 
Storm](https://github.com/yahoo/samoa/wiki/Executing SAMOA with Apache 
Storm)\r\n    * [1.2 Executing SAMOA with Apache 
S4](https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4)\r\n* [2 
SAMOA and Machine Learning](https://github.com/yahoo/samoa/wiki/SAMOA and 
Machine Learning)\r\n    * [2.1 Prequential Evaluation 
Task](https://github.com/yahoo/samoa/wiki/Prequential Evaluation Task)\r\n    * 
[2.2 Vertical Hoeffding Tree 
Classifier](https://github.com/yahoo/samoa/wiki/Vertical Hoeffding Tree 
Classifier)\r\n    * [2.3 Distributed Stream 
Clustering](https://github.com/yahoo/samoa/wiki/Distributed Stream 
Clustering)\r\n* [3 SAMOA Topology](https://github.com/yahoo/samoa/wiki/SAMOA 
Topology)\r\n    * [3.1 
Processor](https://github.com/yahoo/samoa/wiki/Processor)\r\n    * [3.2 Content 
Event](https://github.com/yahoo/samoa/wiki/Content Event)\r\n    * [3.3 
Stream](https://github.com/yahoo/samoa/wiki/Stream)\r\n    * [3.4 
Task](https://gi
 thub.com/yahoo/samoa/wiki/Task)\r\n    * [3.5 Topology 
Builder](https://github.com/yahoo/samoa/wiki/Topology Builder)\r\n    * [3.6 
Topology Starter](https://github.com/yahoo/samoa/wiki/Topology Starter)\r\n    
* [3.7 Learner](https://github.com/yahoo/samoa/wiki/Learner)\r\n    * [3.8 
Processing Item](https://github.com/yahoo/samoa/wiki/Processing Item)\r\n* [4 
Developing New Tasks in SAMOA](https://github.com/yahoo/samoa/wiki/Developing 
New Tasks in SAMOA)\r\n\r\n## Slides\r\n\r\nG. De Francisci Morales [SAMOA: A 
Platform for Mining Big Data 
Streams](http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf)\r\nKeynote
 Talk at [RAMSS ’13](http://www.ramss.ws/2013/program/): 2nd International 
Workshop on Real-Time Analysis and Mining of Social Streams WWW, Rio De 
Janeiro, 2013.\r\n\r\n<script async class=\"speakerdeck-embed\" 
data-id=\"fee15d509f0a0130a1252e07bed0c63d\" data-ratio=\"1.33333333333333\" 
src=\"//speakerdeck.com/assets/embed.js\"></s
 cript>\r\n\r\n## License\r\n\r\nThe use and distribution terms for this 
software are covered by the\r\nApache License, Version 2.0 
(http://www.apache.org/licenses/LICENSE-2.0.html).\r\n","google":"","note":"Don't
 delete this file! It's used internally to help with page regeneration."}
\ No newline at end of file
+{"name":"SAMOA","tagline":"Scalable Advanced Massive Online 
Analysis","body":"SAMOA is a platform for mining on big data streams.\r\nIt is 
a distributed streaming machine learning (ML) framework that contains a 
\r\nprograming abstraction for distributed streaming ML 
algorithms.\r\n\r\nSAMOA enables development of new ML algorithms without 
dealing with \r\nthe complexity of underlying streaming processing engines 
(SPE, such \r\nas Apache Storm and Apache S4). SAMOA also provides 
extensibility in integrating\r\nnew SPEs into the framework. These features 
allow SAMOA users to develop \r\ndistributed streaming ML algorithms once and 
to execute the algorithms \r\nin multiple SPEs, i.e., code the algorithms once 
and execute them in multiple SPEs.\r\n\r\n## 
Build\r\n\r\n###Storm\r\n\r\nSimply clone the repository and install 
SAMOA.\r\n```bash\r\ngit clone http://git.apache.org/incubator-samoa.git\r\ncd 
incubator-samoa\r\nmvn -Pstorm package\r\n```\r\n\r\nThe deployable jar for SAMOA will be 
in `targ
 et/SAMOA-Storm-0.0.1.jar`.\r\n\r\n###S4\r\n\r\nIf you want to compile SAMOA 
for S4, you will need to install the S4 dependencies\r\nmanually as explained 
in [Executing SAMOA with Apache 
S4](documentation/Executing-SAMOA-with-Apache-S4).\r\n\r\nOnce the dependencies 
if needed are installed, you can simply clone the repository and install 
SAMOA.\r\n\r\n```bash\r\ngit clone 
http://git.apache.org/incubator-samoa.git\r\ncd incubator-samoa\r\nmvn -Ps4 
package\r\n```\r\n\r\nThe deployable jars for SAMOA will be in 
`target/SAMOA-S4-0.0.1.jar`.\r\n\r\n## Documentation\r\n\r\nThe documentation 
is intended to give an introduction on how to use SAMOA in the various 
different ways possible. \r\nAs a user you can use it to develop new algorithms 
and test different Stream Processing Engines.\r\n\r\n* [1 Scalable Advanced 
Massive Online Analysis](documentation/Scalable Advanced Massive Online 
Analysis)\r\n    * [1.0 Building SAMOA](documentation/Building SAMOA)\r\n    * 
[1.1 Executing SAMOA with Apache Storm
 ](documentation/Executing SAMOA with Apache Storm)\r\n    * [1.2 Executing 
SAMOA with Apache S4](documentation/Executing-SAMOA-with-Apache-S4)\r\n* [2 
SAMOA and Machine Learning](documentation/SAMOA and Machine Learning)\r\n    * 
[2.1 Prequential Evaluation Task](documentation/Prequential Evaluation 
Task)\r\n    * [2.2 Vertical Hoeffding Tree Classifier](documentation/Vertical 
Hoeffding Tree Classifier)\r\n    * [2.3 Distributed Stream 
Clustering](documentation/Distributed Stream Clustering)\r\n* [3 SAMOA 
Topology](documentation/SAMOA Topology)\r\n    * [3.1 
Processor](documentation/Processor)\r\n    * [3.2 Content 
Event](documentation/Content Event)\r\n    * [3.3 
Stream](documentation/Stream)\r\n    * [3.4 Task](documentation/Task)\r\n    * 
[3.5 Topology Builder](documentation/Topology Builder)\r\n    * [3.6 Topology 
Starter](documentation/Topology Starter)\r\n    * [3.7 
Learner](documentation/Learner)\r\n    * [3.8 Processing 
Item](documentation/Processing Item)\r\n* [4 Developing
  New Tasks in SAMOA](documentation/Developing New Tasks in SAMOA)\r\n\r\n## 
Slides\r\n\r\nG. De Francisci Morales [SAMOA: A Platform for Mining Big Data 
Streams](http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf)\r\nKeynote
 Talk at [RAMSS ’13](http://www.ramss.ws/2013/program/): 2nd International 
Workshop on Real-Time Analysis and Mining of Social Streams WWW, Rio De 
Janeiro, 2013.\r\n\r\n<script async class=\"speakerdeck-embed\" 
data-id=\"fee15d509f0a0130a1252e07bed0c63d\" data-ratio=\"1.33333333333333\" 
src=\"//speakerdeck.com/assets/embed.js\"></script>\r\n\r\n## 
License\r\n\r\nThe use and distribution terms for this software are covered by 
the\r\nApache License, Version 2.0 
(http://www.apache.org/licenses/LICENSE-2.0.html).\r\n","google":"","note":"Don't
 delete this file! It's used internally to help with page regeneration."}
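
For reference, the quick-start sequence shown in the updated index.html and params.json above boils down to the shell session below. This is a minimal sketch based only on the commands that appear in this patch; the jar name assumes the 0.3.0-SNAPSHOT build referenced there, and the Forest CoverType file covtypeNorm.arff is assumed to have been downloaded into the working directory beforehand.

```bash
# Clone the Apache Incubator repository and build the local-mode package
git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn package   # use -Pstorm or -Ps4 to build the Storm or S4 deployables instead

# Run a prequential evaluation with bagging on the Forest CoverType dataset
# (covtypeNorm.arff is assumed to be present in the current directory)
bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
```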
