Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html Sun Sep 25 20:39:59 2016 @@ -77,313 +77,221 @@ The steps included in this tutorial are:</p> <ol> - <li> - <p>Setup and configure a cluster with the required dependencies. This applies for single-node (local) execution as well.</p> - </li> - <li> - <p>Build SAMOA deployables</p> - </li> - <li> - <p>Configure SAMOA-Samza</p> - </li> - <li> - <p>Deploy SAMOA-Samza and execute a task</p> - </li> - <li> - <p>Observe the execution and the result</p> - </li> +<li><p>Set up and configure a cluster with the required dependencies. 
This applies to single-node (local) execution as well.</p></li> +<li><p>Build SAMOA deployables</p></li> +<li><p>Configure SAMOA-Samza</p></li> +<li><p>Deploy SAMOA-Samza and execute a task</p></li> +<li><p>Observe the execution and the result</p></li> </ol> <h2 id="setup-cluster">Setup cluster</h2> + <p>The following are needed to run SAMOA on top of Samza:</p> <ul> - <li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li> - <li><a href="http://kafka.apache.org/">Apache Kafka</a></li> - <li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li> +<li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li> +<li><a href="http://kafka.apache.org/">Apache Kafka</a></li> +<li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li> </ul> <h3 id="zookeeper">Zookeeper</h3> -<p>Zookeeper is used by Kafka to coordinate its brokers. The detail instructions to setup a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>.</p> + +<p>Zookeeper is used by Kafka to coordinate its brokers. Detailed instructions for setting up a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>. 
</p> <p>To quickly set up a single-node Zookeeper cluster:</p> <ol> - <li> - <p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p> - </li> - <li> - <p>Untar the archive</p> - </li> +<li><p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li> +<li><p>Untar the archive</p></li> </ol> - -<p><code class="highlighter-rouge"> -tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ +</code></pre></div> <ol> - <li>Copy the default configuration file</li> +<li>Copy the default configuration file</li> </ol> - -<p><code class="highlighter-rouge"> -cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg +</code></pre></div> <ol> - <li>Start the single-node cluster</li> +<li>Start the single-node cluster</li> </ol> - -<p><code class="highlighter-rouge"> -~/zookeeper-3.4.6/bin/zkServer.sh start -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/zookeeper-3.4.6/bin/zkServer.sh start +</code></pre></div> <h3 id="kafka">Kafka</h3> -<p>Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system.</p> + +<p>Kafka is a distributed, partitioned, replicated commit log service that Samza uses as its default messaging system. </p> <ol> - <li> - <p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned in the page, the Scala version does not matter. 
However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p> - </li> - <li> - <p>Untar the archive</p> - </li> +<li><p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned on the page, the Scala version does not matter. However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p></li> +<li><p>Untar the archive </p></li> </ol> - -<p><code class="highlighter-rouge"> -tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ +</code></pre></div> <p>If you are running in local mode or a single-node cluster, you can now start Kafka with the command:</p> - -<p><code class="highlighter-rouge"> -~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties -</code></p> - -<p>In multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can totally have a smaller Kafka cluster, or even a single-node Kafka cluster). The number of brokers in Kafka cluster will affect disk bandwidth and space (the more brokers we have, the higher value we will get for the two). In each node, you need to set the following properties in <code class="highlighter-rouge">~/kafka_2.10-0.8.1/config/server.properties</code> before starting Kafka service.</p> - -<p><code class="highlighter-rouge"> -broker.id=a-unique-number-for-each-node +<div class="highlight"><pre><code class="language-" data-lang="">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties +</code></pre></div> +<p>In a multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can also run a smaller Kafka cluster, or even a single-node Kafka cluster). 
The number of brokers in the Kafka cluster affects the available disk bandwidth and space (the more brokers, the more of both). On each node, you need to set the following properties in <code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting the Kafka service.</p> +<div class="highlight"><pre><code class="language-" data-lang="">broker.id=a-unique-number-for-each-node zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...] -</code></p> - +</code></pre></div> <p>You might want to change the retention hours or retention bytes of the logs to keep the log size from growing too large.</p> - -<p><code class="highlighter-rouge"> -log.retention.hours=number-of-hours-to-keep-the-logs +<div class="highlight"><pre><code class="language-" data-lang="">log.retention.hours=number-of-hours-to-keep-the-logs log.retention.bytes=number-of-bytes-to-keep-in-the-logs -</code></p> - +</code></pre></div> <h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3> + <blockquote> - <p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode.</p> +<p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode. </p> </blockquote> <p>To set up a YARN cluster, first download a binary release of Hadoop <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">here</a> on each node in the cluster and untar the archive -<code class="highlighter-rouge">tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> +<code>tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. 
We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> <p><strong>HDFS</strong></p> -<p>Set the following properties in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> - -<p>```</p> -<configuration> - <property> - <name>dfs.datanode.data.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> - <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> - </property> - - <property> - <name>dfs.namenode.name.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> - <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description> - </property> -</configuration> -<p>```</p> - -<p>Add this property in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> in all nodes.</p> - -<p>```</p> -<configuration> - <property> - <name>fs.defaultFS</name> - <value>hdfs://localhost:9000/</value> - <description>NameNode URI</description> - </property> - - <property> - <name>fs.hdfs.impl</name> - <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> - </property> -</configuration> -<p>``` -For a multi-node cluster, change the hostname (âlocalhostâ) to the correct host name of your namenode server.</p> +<p>Set the following properties in <code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>dfs.datanode.data.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> + <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> + </property> + + <property> + <name>dfs.namenode.name.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> + <description>Path on the local filesystem where the 
NameNode stores the namespace and transaction logs persistently.</description> + </property> +</configuration> +</code></pre></div> +<p>Add this property in <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> on all nodes.</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>fs.defaultFS</name> + <value>hdfs://localhost:9000/</value> + <description>NameNode URI</description> + </property> + + <property> + <name>fs.hdfs.impl</name> + <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> + </property> +</configuration> +</code></pre></div> +<p>For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.</p> <p>Format the HDFS directory (only perform this step the very first time you run the cluster)</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/bin/hdfs namenode -format -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/bin/hdfs namenode -format +</code></pre></div> <p>Start the namenode daemon on one of the nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode +</code></pre></div> <p>Start the datanode daemon on all nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode +</code></pre></div> <p><strong>YARN</strong></p> -<p>If you are running in multi-node cluster, set the resource manager hostname in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> in all nodes as follow:</p> - -<p>```</p> -<configuration> - <property> - <name>yarn.resourcemanager.hostname</name> - <value>resourcemanager-url</value> - <description>The hostname of 
the RM.</description> - </property> -</configuration> -<p>```</p> - +<p>If you are running a multi-node cluster, set the resource manager hostname in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> on all nodes as follows:</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>yarn.resourcemanager.hostname</name> + <value>resourcemanager-url</value> + <description>The hostname of the RM.</description> + </property> +</configuration> +</code></pre></div> <p><strong>Other configurations</strong> Now we need to tell Samza where to find the configuration of the YARN cluster. To do this, first create a new directory on all nodes:</p> - -<p><code class="highlighter-rouge"> -mkdir ~/.samza +<div class="highlight"><pre><code class="language-" data-lang="">mkdir ~/.samza mkdir ~/.samza/conf -</code></p> - -<p>Copy (or soft link) <code class="highlighter-rouge">core-site.xml</code>, <code class="highlighter-rouge">hdfs-site.xml</code>, <code class="highlighter-rouge">yarn-site.xml</code> in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop</code> to the new directory</p> - -<p><code class="highlighter-rouge"> -ln -s ~/.samza/conf/core-site.xml ~/hadoop-2.2.0/etc/hadoop/core-site.xml +</code></pre></div> +<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>, <code>yarn-site.xml</code> from <code>~/hadoop-2.2.0/etc/hadoop</code> into the new directory </p> +<div class="highlight"><pre><code class="language-" data-lang="">ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml -</code></p> - +</code></pre></div> <p>Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows where to find these YARN configuration files.</p> - -<p><code class="highlighter-rouge"> -export YARN_HOME=$HOME/.samza -</code></p> - +<div 
class="highlight"><pre><code class="language-" data-lang="">export YARN_HOME=$HOME/.samza +</code></pre></div> <p><strong>Start the YARN cluster</strong> Start the resource manager on the master node</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager +</code></pre></div> <p>Start the node manager on all worker nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager +</code></pre></div> <h2 id="build-samoa">Build SAMOA</h2> + <p>Perform the following steps on one of the nodes in the cluster. Here we assume Git and Maven are installed on this node.</p> <p>Since Samza is not yet released on Maven, we have to clone the Samza project, build it, and publish it to the local Maven repository:</p> - -<p><code class="highlighter-rouge"> -git clone -b 0.7.0 https://github.com/apache/incubator-samza.git +<div class="highlight"><pre><code class="language-" data-lang="">git clone -b 0.7.0 https://github.com/apache/incubator-samza.git cd incubator-samza ./gradlew clean build ./gradlew publishToMavenLocal -</code></p> - -<p>Here we cloned and installed Samza version 0.7.0, the current released version (July 2014).</p> +</code></pre></div> +<p>Here we cloned and installed Samza version 0.7.0, the current released version (as of July 2014). 
</p> <p>Now we can clone the repository and install SAMOA.</p> - -<p><code class="highlighter-rouge"> -git clone http://git.apache.org/incubator-samoa.git +<div class="highlight"><pre><code class="language-" data-lang="">git clone http://git.apache.org/incubator-samoa.git cd incubator-samoa mvn -Psamza package -</code></p> - -<p>The deployable jars for SAMOA will be in <code class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for Samza <code class="highlighter-rouge">target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p> +</code></pre></div> +<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in the case of Samza, <code>target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p> <h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2> -<p>This section explains the configuration parameters in <code class="highlighter-rouge">bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p> -<p><strong>Samza execution mode</strong></p> +<p>This section explains the configuration parameters in <code>bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p> -<p><code class="highlighter-rouge"> -samoa.samza.mode=[yarn|local] -</code> -This parameter specify which mode to execute the task: <code class="highlighter-rouge">local</code> for local execution and <code class="highlighter-rouge">yarn</code> for cluster execution.</p> +<p><strong>Samza execution mode</strong></p> +<div class="highlight"><pre><code class="language-" data-lang="">samoa.samza.mode=[yarn|local] +</code></pre></div> +<p>This parameter specifies the mode in which to execute the task: <code>local</code> for local execution and <code>yarn</code> for cluster execution.</p> <p><strong>Zookeeper</strong></p> - -<p><code class="highlighter-rouge"> -zookeeper.connect=localhost +<div class="highlight"><pre><code class="language-" 
data-lang="">zookeeper.connect=localhost zookeeper.port=2181 -</code> -The default setting above applies for local mode execution. For cluster mode, change <code class="highlighter-rouge">zookeeper.host</code> to the correct URL of your zookeeper host.</p> +</code></pre></div> +<p>The default setting above applies to local mode execution. For cluster mode, change <code>zookeeper.connect</code> to the correct URL of your Zookeeper host.</p> <p><strong>Kafka</strong></p> +<div class="highlight"><pre><code class="language-" data-lang="">kafka.broker.list=localhost:9092 +</code></pre></div> +<p><code>kafka.broker.list</code> is a comma-separated list of the host:port of all brokers in the Kafka cluster.</p> +<div class="highlight"><pre><code class="language-" data-lang="">kafka.replication.factor=1 +</code></pre></div> +<p><code>kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in the Kafka cluster.</p> -<p><code class="highlighter-rouge"> -kafka.broker.list=localhost:9092 -</code> -<code class="highlighter-rouge">kafka.broker.list</code> is a comma separated list of host:port of all the brokers in Kafka cluster.</p> - -<p><code class="highlighter-rouge"> -kafka.replication.factor=1 -</code> -<code class="highlighter-rouge">kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. 
This number must be less than or equal to the number of brokers in Kafka cluster.</p> - -<p><strong>YARN</strong> -> The below settings do not apply for local mode execution, you can leave them as they are.</p> +<p><strong>YARN</strong></p> -<p><code class="highlighter-rouge">yarn.am.memory</code> and <code class="highlighter-rouge">yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively.</p> +<blockquote> +<p>The settings below do not apply to local mode execution; you can leave them as they are.</p> +</blockquote> -<p><code class="highlighter-rouge"> -yarn.am.memory=1024 +<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively. </p> +<div class="highlight"><pre><code class="language-" data-lang="">yarn.am.memory=1024 yarn.container.memory=1024 -</code></p> - +</code></pre></div> -<p><code class="highlighter-rouge">yarn.package.path</code> specifies the path (typically a HDFS path) of the package to be distributed to all YARN containers to execute the task.</p> - -<p><code class="highlighter-rouge"> -yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar -</code></p> - +<p><code>yarn.package.path</code> specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.</p> +<div class="highlight"><pre><code class="language-" data-lang="">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar +</code></pre></div> <p><strong>Samza</strong> -<code class="highlighter-rouge">max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container.</p> - -<p><code class="highlighter-rouge"> -max.pi.per.container=1 -</code></p> - -<p><code class="highlighter-rouge">kryo.register.file</code> specifies the registration file for Kryo serializer.</p> - -<p><code 
class="highlighter-rouge"> -kryo.register.file=samza-kryo -</code></p> - -<p><code class="highlighter-rouge">checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> - -<p><code class="highlighter-rouge"> -checkpoint.commit.ms=60000 -</code></p> - +<code>max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container. </p> +<div class="highlight"><pre><code class="language-" data-lang="">max.pi.per.container=1 +</code></pre></div> +<p><code>kryo.register.file</code> specifies the registration file for Kryo serializer.</p> +<div class="highlight"><pre><code class="language-" data-lang="">kryo.register.file=samza-kryo +</code></pre></div> +<p><code>checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> +<div class="highlight"><pre><code class="language-" data-lang="">checkpoint.commit.ms=60000 +</code></pre></div> <h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2> -<p>Execute SAMOA task with the following command:</p> - -<p><code class="highlighter-rouge"> -bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" -</code></p> +<p>Execute SAMOA task with the following command:</p> +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" +</code></pre></div> <h2 id="observe-execution-and-result">Observe execution and result</h2> -<p>In local mode, all the log will be printed out to stdout. If you execute the task on YARN cluster, the output is written to stdout files in YARNâs containersâ log folder ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p> + +<p>In local mode, all the log will be printed out to stdout. 
If you execute the task on a YARN cluster, the output is written to stdout files in the log folder of each YARN container ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p> </article>
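Pulling the individual parameters above together, a complete <code>bin/samoa-samza.properties</code> for a single-node local run might look like the following sketch. Every value is taken from the configuration section of this page; the YARN entries are commented out because they only matter in <code>yarn</code> mode, and hosts and paths must be adapted for a real cluster.

```properties
# Sketch: minimal bin/samoa-samza.properties for single-node local execution.
# Values mirror the defaults discussed above; adjust for a real cluster.
samoa.samza.mode=local
zookeeper.connect=localhost
zookeeper.port=2181
kafka.broker.list=localhost:9092
kafka.replication.factor=1
max.pi.per.container=1
kryo.register.file=samza-kryo
checkpoint.commit.ms=60000
# Only needed when samoa.samza.mode=yarn:
# yarn.am.memory=1024
# yarn.container.memory=1024
# yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
```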
Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html Sun Sep 25 20:39:59 2016 @@ -76,104 +76,103 @@ <p>In this tutorial page we describe how to execute SAMOA on top of Apache Storm. Here is an outline of what we want to do:</p> <ol> - <li>Ensure that you have necessary Storm cluster and configuration to execute SAMOA</li> - <li>Ensure that you have all the SAMOA deployables for execution in the cluster</li> - <li>Configure samoa-storm.properties</li> - <li>Execute SAMOA classification task</li> - <li>Observe the task execution</li> +<li>Ensure that you have the necessary Storm cluster and configuration to execute SAMOA</li> +<li>Ensure that you have all the SAMOA deployables for execution in the cluster</li> +<li>Configure samoa-storm.properties</li> +<li>Execute a SAMOA classification task</li> +<li>Observe the task execution</li> </ol> <h3 id="storm-configuration">Storm Configuration</h3> -<p>Before we start the tutorial, please ensure that you already have Storm cluster (preferably Storm 0.8.2) running. You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p> -<p>You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code class="highlighter-rouge">~/.storm/storm.yaml</code>:</p> +<p>Before we start the tutorial, please ensure that you already have a Storm cluster (preferably Storm 0.8.2) running. 
You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p> -<p>``` -########### These MUST be filled in for a storm configuration -nimbus.host: â<enter your="" nimbus="" host="" name="" here="">"</enter></p> +<p>You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code>~/.storm/storm.yaml</code>:</p> +<div class="highlight"><pre><code class="language-" data-lang="">########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" -<h2 id="list-of-custom-serializations">List of custom serializations</h2> -<p>kryo.register: +## List of custom serializations +kryo.register: - org.apache.samoa.learners.classifiers.trees.AttributeContentEvent: org.apache.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer -<code class="highlighter-rouge"> -<!-- +</code></pre></div> +<!-- Or, if you are using SAMOA with optimized VHT, you should use this following configuration file: -</code> +``` ########### These MUST be filled in for a storm configuration -nimbus.host: â<enter your="" nimbus="" host="" name="" here="">"</enter></p> +nimbus.host: "<enter your nimbus host name here>" -<h2 id="list-of-custom-serializations-1">List of custom serializations</h2> -<p>kryo.register: +## List of custom serializations +kryo.register: - org.apache.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: org.apache.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer ``` -â></p> +--> -<p>Alternatively, if 
you donât have Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p> +<p>Alternatively, if you don't have a Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p> <h3 id="samoa-deployables">SAMOA deployables</h3> + <p>There are three deployables for executing SAMOA on top of Storm. They are:</p> <ol> - <li><code class="highlighter-rouge">bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li> - <li><code class="highlighter-rouge">target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code class="highlighter-rouge">x.x.x</code> is the version number of SAMOA.</li> - <li><code class="highlighter-rouge">bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly.</li> +<li><code>bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li> +<li><code>target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code>x.x.x</code> is the version number of SAMOA. </li> +<li><code>bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly. 
</li> </ol> -<h3 id="a-namesamoa-storm-properties-samoa-stormproperties-configurationa"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3> +<h3 id="samoa-storm-properties-configuration"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3> + <p>Currently, the properties file contains two configurations:</p> <ol> - <li><code class="highlighter-rouge">samoa.storm.mode</code> determines whether the task is executed locally (using Stormâs <code class="highlighter-rouge">LocalCluster</code>) or executed in a Storm cluster. Use <code class="highlighter-rouge">local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code class="highlighter-rouge">cluster</code> if you want to test SAMOA on your Storm cluster.</li> - <li><code class="highlighter-rouge">samoa.storm.numworker</code> determines the number of worker to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in you Storm cluster. If you are using local mode, this property corresponds to the number of thread used by Stormâs LocalCluster to execute your SAMOA task.</li> +<li><code>samoa.storm.mode</code> determines whether the task is executed locally (using Storm's <code>LocalCluster</code>) or executed in a Storm cluster. Use <code>local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code>cluster</code> if you want to test SAMOA on your Storm cluster.</li> +<li><code>samoa.storm.numworker</code> determines the number of workers used to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in your Storm cluster. 
If you are using local mode, this property corresponds to the number of threads used by Storm's LocalCluster to execute your SAMOA task.</li> </ol> <p>Here is an example of a complete properties file:</p> - -<p>``` -# SAMOA Storm properties file +<div class="highlight"><pre><code class="language-" data-lang=""># SAMOA Storm properties file # This file contains specific configurations for SAMOA deployment in the Storm platform # Note that you still need to configure Storm client in your machine, -# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings</p> +# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings -<h1 id="samoastormmode-corresponds-to-the-execution-mode-of-the-task-in-storm">samoa.storm.mode corresponds to the execution mode of the Task in Storm</h1> -<p># possible values: +# samoa.storm.mode corresponds to the execution mode of the Task in Storm +# possible values: # 1. cluster: the Task will be sent into nimbus. The nimbus is configured by Storm configuration file # 2. 
local: the Task will be sent using local Storm cluster -samoa.storm.mode=cluster</p> +samoa.storm.mode=cluster -<h1 id="samoastormnumworker-corresponds-to-the-number-of-worker-processes-allocated-in-storm-cluster">samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster</h1> -<p># possible values: any integer greater than 0<br /> +# samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster +# possible values: any integer greater than 0 samoa.storm.numworker=7 -```</p> - +</code></pre></div> <h3 id="samoa-task-execution">SAMOA task execution</h3> -<p>You can execute a SAMOA task using the aforementioned <code class="highlighter-rouge">bin/samoa</code> script with this following format: -<code class="highlighter-rouge">bin/samoa <platform> <jar> "<task>"</code>.</p> +<p>You can execute a SAMOA task using the aforementioned <code>bin/samoa</code> script with the following format: +<code>bin/samoa <platform> <jar> "<task>"</code>.</p> -<p><code class="highlighter-rouge"><platform></code> can be <code class="highlighter-rouge">storm</code> or <code class="highlighter-rouge">s4</code>. Using <code class="highlighter-rouge">storm</code> option means you are deploying SAMOA on a Storm environment. In this configuration, the script uses the aforementioned yaml file (<code class="highlighter-rouge">~/.storm/storm.yaml</code>) and <code class="highlighter-rouge">samoa-storm.properties</code> to perform the deployment. Using <code class="highlighter-rouge">s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> +<p><code><platform></code> can be <code>storm</code> or <code>s4</code>. Using the <code>storm</code> option means you are deploying SAMOA on a Storm environment. 
In this configuration, the script uses the aforementioned yaml file (<code>~/.storm/storm.yaml</code>) and <code>samoa-storm.properties</code> to perform the deployment. Using the <code>s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> -<p><code class="highlighter-rouge"><jar></code> is the location of the deployed jar file (<code class="highlighter-rouge">SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path into the jar file.</p> +<p><code><jar></code> is the location of the deployed jar file (<code>SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path to the jar file.</p> -<p><code class="highlighter-rouge">"<task>"</code> is the SAMOA task command line such as <code class="highlighter-rouge">PrequentialEvaluation</code> or <code class="highlighter-rouge">ClusteringTask</code>. This command line for SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> +<p><code>"<task>"</code> is the SAMOA task command line such as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. 
The command line for a SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> <p>The complete command to execute SAMOA is:</p> - -<p><code class="highlighter-rouge"> -bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" -</code> -The example above uses <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier.</p> +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></pre></div> +<p>The example above uses the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and the <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier.</p> <h3 id="observing-task-execution">Observing task execution</h3> -<p>There are two ways to observe the task execution using Storm UI and by monitoring the dump file of the SAMOA task. Notice that the dump file will be created on the cluster if you are executing your task in <code class="highlighter-rouge">cluster</code> mode.</p> + +<p>There are two ways to observe the task execution: using the Storm UI and monitoring the dump file of the SAMOA task. Notice that the dump file will be created on the cluster if you are executing your task in <code>cluster</code> mode.</p> <h4 id="using-storm-ui">Using Storm UI</h4> + <p>Go to the web address of the Storm UI and check whether the SAMOA task executes as intended. 
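If you prefer the command line to the web UI, the Storm client offers equivalent commands (a sketch — the topology name below is illustrative and depends on how your task was submitted):

```shell
# List the topologies currently running on the cluster
storm list

# Kill a topology by name; -w gives components 10 seconds to shut down
storm kill PrequentialEvaluation -w 10
```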
Use this UI to kill the associated Storm topology if necessary.</p> <h4 id="monitoring-the-dump-file">Monitoring the dump file</h4> -<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has <code class="highlighter-rouge">-d</code> option which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should set the dump file into a file on a shared filesystem if you want to access it from the machine submitting the task.</p> + +<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has the <code>-d</code> option, which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should set the dump file to a location on a shared filesystem if you want to access it from the machine submitting the task.</p> </article> Modified: incubator/samoa/site/documentation/Getting-Started.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Getting-Started.html (original) +++ incubator/samoa/site/documentation/Getting-Started.html Sun Sep 25 20:39:59 2016 @@ -76,40 +76,26 @@ <p>We start by showing how simple it is to run a first large-scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.</p> <ul> - <li> - <ol> - <li>Download SAMOA</li> - </ol> - </li> +<li>1. Download SAMOA </li> </ul> - -<p><code class="highlighter-rouge">bash -git clone http://git.apache.org/incubator-samoa.git -cd incubator-samoa -mvn package #Local mode -</code> -* 2. 
Download the Forest CoverType dataset</p> - -<p><code class="highlighter-rouge">bash -wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip" +<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git +<span class="nb">cd </span>incubator-samoa +mvn package <span class="c">#Local mode</span> +</code></pre></div> +<ul> +<li>2. Download the Forest CoverType dataset </li> +</ul> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget <span class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span> unzip covtypeNorm.arff.zip -</code></p> - +</code></pre></div> <p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.</p> <ul> - <li> - <ol> - <li>Run an example: classifying the CoverType dataset with the bagging algorithm</li> - </ol> - </li> +<li>3. 
Run an example: classifying the CoverType dataset with the bagging algorithm</li> </ul> - -<p><code class="highlighter-rouge">bash -bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging - -s (ArffFileStream -f covtypeNorm.arff) -f 100000" -</code></p> - +<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/samoa <span class="nb">local </span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging + -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</span> +</code></pre></div> <p>The output will be a list of the evaluation results, plotted every 100,000 instances.</p> </article> Modified: incubator/samoa/site/documentation/Home.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Home.html (original) +++ incubator/samoa/site/documentation/Home.html Sun Sep 25 20:39:59 2016 @@ -81,62 +81,58 @@ SAMOA is similar to Mahout in spirit, bu <p>Apache SAMOA is simple and fun to use! This documentation is intended to give an introduction on how to use SAMOA in different ways. As a user you can run SAMOA algorithms on several stream processing engines: local mode, Storm, S4, Samza, and Flink. 
As a developer you can create new algorithms only once and test them in all of these distributed stream processing engines.</p> <h2 id="getting-started">Getting Started</h2> + <ul> - <li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> +<li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> </ul> <h2 id="users">Users</h2> + +<ul> +<li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> + +<ul> +<li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> +<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> +<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> +<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> +<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> +</ul></li> +<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> + <ul> - <li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> - <ul> - <li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> - <li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> - <li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> - <li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> - <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> - </ul> - </li> - <li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> - <ul> - <li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> - <li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> - <li><a 
href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> - <li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> - <li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> - <li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> - <li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> - </ul> - </li> +<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> +<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> +<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> +<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> +<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> +<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> +<li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> +</ul></li> </ul> <h2 id="developers">Developers</h2> + <ul> - <li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> - <ul> - <li><a href="Processor.html">3.1 Processor</a></li> - <li><a href="Content-Event.html">3.2 Content Event</a></li> - <li><a href="Stream.html">3.3 Stream</a></li> - <li><a href="Task.html">3.4 Task</a></li> - <li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> - <li><a href="Learner.html">3.6 Learner</a></li> - <li><a href="Processing-Item.html">3.7 Processing Item</a></li> - </ul> - </li> - <li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> +<li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> + +<ul> +<li><a href="Processor.html">3.1 Processor</a></li> +<li><a href="Content-Event.html">3.2 Content Event</a></li> +<li><a href="Stream.html">3.3 Stream</a></li> 
+<li><a href="Task.html">3.4 Task</a></li> +<li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> +<li><a href="Learner.html">3.6 Learner</a></li> +<li><a href="Processing-Item.html">3.7 Processing Item</a></li> +</ul></li> +<li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> </ul> <h3 id="getting-help">Getting help</h3> -<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:dev@samoa.incubator.org">dev@samoa.incubator.org</a></p> -<table> - <tbody> - <tr> - <td>[ <a href="mailto:dev-subscribe@samoa.incubator.org">subscribe</a></td> - <td><a href="mailto:dev-unsubscribe@samoa.incubator.org">unsubscribe</a></td> - <td><a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</td> - </tr> - </tbody> -</table> +<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:dev@samoa.incubator.org">dev@samoa.incubator.org</a></p> + +<p>[ <a href="mailto:dev-subscribe@samoa.incubator.org">subscribe</a> | <a href="mailto:dev-unsubscribe@samoa.incubator.org">unsubscribe</a> | <a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</p> </article> Modified: incubator/samoa/site/documentation/Learner.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Learner.html (original) +++ incubator/samoa/site/documentation/Learner.html Sun Sep 25 20:39:59 2016 @@ -74,19 +74,18 @@ <article class="post-content"> <p>Learners are implemented in SAMOA as sub-topologies.</p> +<div class="highlight"><pre><code class="language-" data-lang="">public interface Learner extends Serializable{ -<p>``` -public interface Learner extends Serializable{</p> + public void init(TopologyBuilder topologyBuilder, Instances dataset); -<div class="highlighter-rouge"><pre 
class="highlight"><code>public void init(TopologyBuilder topologyBuilder, Instances dataset); + public Processor getInputProcessor(); -public Processor getInputProcessor(); + public Stream getResultStream(); +} +</code></pre></div> +<p>When a <code>Task</code> object is initiated via <code>init()</code>, the method <code>init(...)</code> of <code>Learner</code> is called, and the topology is added to the global topology of the task.</p> -public Stream getResultStream(); } ``` When a `Task` object is initiated via `init()`, the method `init(...)` of `Learner` is called, and the topology is added to the global topology of the task. -</code></pre> -</div> - -<p>To create a new learner, it is only needed to add streams, processors and their connections to the topology in <code class="highlighter-rouge">init(...)</code>, specify what is the processor that will manage the input stream of the learner in <code class="highlighter-rouge">getInputProcessor()</code>, and finally, specify what is going to be the output stream of the learner with <code class="highlighter-rouge">getResultStream()</code>.</p> +<p>To create a new learner, you only need to add streams, processors, and their connections to the topology in <code>init(...)</code>, specify which processor will manage the input stream of the learner in <code>getInputProcessor()</code>, and finally specify which stream will be the output stream of the learner with <code>getResultStream()</code>.</p> </article> Modified: incubator/samoa/site/documentation/Prequential-Evaluation-Task.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Prequential-Evaluation-Task.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Prequential-Evaluation-Task.html (original) +++ incubator/samoa/site/documentation/Prequential-Evaluation-Task.html Sun Sep 25 20:39:59 2016 @@ -73,29 +73,26 @@ 
</header> <article class="post-content"> - <p>In data stream mining, the most used evaluation scheme is the prequential or interleaved-test-then-train evolution. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers doing this. It supports two classification performance evaluators: the basic one which measures the accuracy of the classifier model since the start of the evaluation, and a window based one which measures the accuracy on the current sliding window of recent instances.</p> + <p>In data stream mining, the most commonly used evaluation scheme is the prequential or interleaved-test-then-train evaluation. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers doing this. It supports two classification performance evaluators: the basic one, which measures the accuracy of the classifier model since the start of the evaluation, and a window-based one, which measures the accuracy on the current sliding window of recent instances. 
</p> <p>An example of the Prequential Evaluation task on the SAMOA command line when deploying to Storm:</p> - -<p><code class="highlighter-rouge"> -bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></pre></div> <p>Parameters:</p> <ul> - <li><code class="highlighter-rouge">-l</code>: classifier to train</li> - <li><code class="highlighter-rouge">-s</code>: stream to learn from</li> - <li><code class="highlighter-rouge">-e</code>: classification performance evaluation method</li> - <li><code class="highlighter-rouge">-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> - <li><code class="highlighter-rouge">-f</code>: number of instances between samples of the learning performance</li> - <li><code class="highlighter-rouge">-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> - <li><code class="highlighter-rouge">-d</code>: file to append intermediate csv results to</li> +<li><code>-l</code>: classifier to train</li> +<li><code>-s</code>: stream to learn from</li> +<li><code>-e</code>: classification performance evaluation method</li> +<li><code>-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> +<li><code>-f</code>: number of instances between samples of the learning performance</li> +<li><code>-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> +<li><code>-d</code>: file to append intermediate CSV results to</li> </ul> -<p>In terms of SAMOA API, the Prequential Evaluation Task consists of a source <code 
class="highlighter-rouge">Entrance Processor</code>, a <code class="highlighter-rouge">Classifier</code>, and an <code class="highlighter-rouge">Evaluator Processor</code> as shown below. The <code class="highlighter-rouge">Entrance Processor</code> sends instances to the <code class="highlighter-rouge">Classifier</code> using the <code class="highlighter-rouge">source</code> stream. The classifier sends the classification results to the <code class="highlighter-rouge">Evaluator Processor</code> via the <code class="highlighter-rouge">result</code> stream. The <code class="highlighter-rouge">Entrance Processor</code> corresponds to the <code class="highlighter-rouge">-s</code> option of Prequential Evaluation, the <code class="highlighter-rouge">Classifier</code> corresponds to the <code class="highlighter-rouge">-l</code> option, and the <code class="highlighter-rouge">Evaluator Processor</code> co rresponds to the <code class="highlighter-rouge">-e</code> option.</p> +<p>In terms of SAMOA API, the Prequential Evaluation Task consists of a source <code>Entrance Processor</code>, a <code>Classifier</code>, and an <code>Evaluator Processor</code> as shown below. The <code>Entrance Processor</code> sends instances to the <code>Classifier</code> using the <code>source</code> stream. The classifier sends the classification results to the <code>Evaluator Processor</code> via the <code>result</code> stream. 
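As an aside, the test-then-train loop that this task implements can be sketched in a few lines of plain, self-contained Java (independent of SAMOA's API; the majority-class "model" below is only a stand-in for a real learner):

```java
public class PrequentialSketch {
    // Returns the number of correct predictions over a binary label stream,
    // using the prequential scheme: each instance is first used to test the
    // current model, and only afterwards to train it.
    static int prequentialCorrect(int[] labels) {
        int ones = 0, seen = 0, correct = 0;
        for (int label : labels) {
            int prediction = (ones * 2 >= seen) ? 1 : 0; // 1. test (majority class so far)
            if (prediction == label) correct++;
            seen++;                                      // 2. train on the same instance
            if (label == 1) ones++;
        }
        return correct;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 1, 1, 1, 0, 1};
        System.out.println("prequential accuracy = "
                + prequentialCorrect(labels) + "/" + labels.length);
    }
}
```

Because every instance is tested before it is used for training, the accuracy estimate never evaluates the model on data it has already seen.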
The <code>Entrance Processor</code> corresponds to the <code>-s</code> option of Prequential Evaluation, the <code>Classifier</code> corresponds to the <code>-l</code> option, and the <code>Evaluator Processor</code> corresponds to the <code>-e</code> option.</p> -<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task" /></p> +<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task"></p> </article> Modified: incubator/samoa/site/documentation/Processing-Item.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processing-Item.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processing-Item.html (original) +++ incubator/samoa/site/documentation/Processing-Item.html Sun Sep 25 20:39:59 2016 @@ -82,33 +82,30 @@ It is used internally, and it is not acc There are two types of Processing Items.</p> <ol> - <li>Simple Processing Item (PI)</li> - <li>Entrance Processing Item (EntrancePI)</li> +<li>Simple Processing Item (PI)</li> +<li>Entrance Processing Item (EntrancePI)</li> </ol> -<h4 id="simple-processing-item-pi">1. Simple Processing Item (PI)</h4> -<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. Following code snippet shows the creation of a Processing Item.</p> +<h4 id="1-simple-processing-item-pi">1. Simple Processing Item (PI)</h4> -<p><code class="highlighter-rouge"> -builder.initTopology("MyTopology"); +<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code>TopologyBuilder</code>. 
The following code snippet shows the creation of a Processing Item.</p> +<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); Processor samplerProcessor = new Sampler(); ProcessingItem samplerPI = builder.createPI(samplerProcessor,3); -</code> -The <code class="highlighter-rouge">createPI()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped-in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> +</code></pre></div> +<p>The <code>createPI()</code> method of <code>TopologyBuilder</code> is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> + +<h4 id="2-entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> -<h4 id="entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> <p>Entrance Processing Item is different from a PI in only one way: it accepts an Entrance Processor which can generate its own stream. It is mostly used as the source of a topology. It connects to external sources, pulls data and provides it to the topology in the form of streams. -All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. +All physical topology units are created with the help of a <code>TopologyBuilder</code>. 
The following code snippet shows the creation of an Entrance Processing Item.</p> -<p><code class="highlighter-rouge"> -builder.initTopology("MyTopology"); +<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); EntranceProcessor sourceProcessor = new Source(); EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor); -</code></p> - +</code></pre></div> </article> <!-- </div> --> Modified: incubator/samoa/site/documentation/Processor.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processor.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processor.html (original) +++ incubator/samoa/site/documentation/Processor.html Sun Sep 25 20:39:59 2016 @@ -74,71 +74,79 @@ <article class="post-content"> <p>Processor is the basic logical processing unit. All logic is written in the processor. In SAMOA, a Processor is an interface. Users can implement this interface to build their own processors. -<img src="images/Topology.png" alt="Topology" /> -### Adding a Processor to the topology</p> +<img src="images/Topology.png" alt="Topology"></p> + +<h3 id="adding-a-processor-to-the-topology">Adding a Processor to the topology</h3> <p>There are two ways to add a processor to the topology.</p> -<h4 id="processor">1. Processor</h4> -<p>All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. Following code snippet shows how to add a Processor to the topology. -<code class="highlighter-rouge"> +<h4 id="1-processor">1. Processor</h4> + +<p>All physical topology units are created with the help of a <code>TopologyBuilder</code>. The following code snippet shows how to add a Processor to the topology. 
+<code> Processor processor = new ExampleProcessor(); builder.addProcessor(processor, parallelism); </code> -<code class="highlighter-rouge">addProcessor()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> +<code>addProcessor()</code> method of <code>TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> + +<h4 id="2-entrance-processor">2. Entrance Processor</h4> -<h4 id="entrance-processor">2. Entrance Processor</h4> <p>Some processors generate their own streams, and they are used as the source of a topology. They connect to external sources, pull data and provide it to the topology in the form of streams. -All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. The following code snippet shows how to add an entrance processor to the topology and create a stream from it. -<code class="highlighter-rouge"> 
+<code> EntranceProcessor entranceProcessor = new EntranceProcessor(); builder.addEntranceProcessor(entranceProcessor); Stream source = builder.createStream(entranceProcessor); </code></p> <h3 id="preview-of-processor">Preview of Processor</h3> -<p><code class="highlighter-rouge"> -package samoa.core; +<div class="highlight"><pre><code class="language-" data-lang="">package samoa.core; public interface Processor extends java.io.Serializable{ - boolean process(ContentEvent event); - void onCreate(int id); - Processor newProcessor(Processor p); + boolean process(ContentEvent event); + void onCreate(int id); + Processor newProcessor(Processor p); } -</code> -### Methods</p> +</code></pre></div> +<h3 id="methods">Methods</h3> + +<h4 id="1-boolean-process-contentevent-event">1. <code>boolean process(ContentEvent event)</code></h4> + +<p>Users should implement the three methods shown above. <code>process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code>ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code>true</code> if the event has been correctly processed, <code>false</code> otherwise.</p> -<h4 id="boolean-processcontentevent-event">1. <code class="highlighter-rouge">boolean process(ContentEvent event)</code></h4> -<p>Users should implement the three methods shown above. <code class="highlighter-rouge">process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code class="highlighter-rouge">ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code class="highlighter-rouge">true</code> if the event has been correctly processed, <code class="highlighter-rouge">false</code> otherwise.</p> +<h4 id="2-void-oncreate-int-id">2. 
<code>void onCreate(int id)</code></h4> -<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as a parameter <code class="highlighter-rouge">id</code> to <code class="highlighter-rouge">onCreate(int it)</code> method of each instance.</p> +<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as the parameter <code>id</code> to the <code>onCreate(int id)</code> method of each instance.</p> -<h4 id="processor-newprocessorprocessor-p">3. <code class="highlighter-rouge">Processor newProcessor(Processor p)</code></h4> -<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface.</p> +<h4 id="3-processor-newprocessor-processor-p">3. <code>Processor newProcessor(Processor p)</code></h4> + +<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface. 
</p> <h3 id="preview-of-entranceprocessor">Preview of EntranceProcessor</h3> -<p>``` -package org.apache.samoa.core;</p> +<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.core; -<p>public interface EntranceProcessor extends Processor { +public interface EntranceProcessor extends Processor { public boolean isFinished(); public boolean hasNext(); public ContentEvent nextEvent(); } -``` -### Methods</p> +</code></pre></div> +<h3 id="methods">Methods</h3> + +<h4 id="1-boolean-isfinished">1. <code>boolean isFinished()</code></h4> + +<p>returns whether to expect more events coming from the entrance processor. If the source is a live stream, this method should always return <code>false</code>. If the source is a file, the method should return <code>true</code> once the file has been fully processed.</p> -<h4 id="boolean-isfinished">1. <code class="highlighter-rouge">boolean isFinished()</code></h4> -<p>returns whether to expect more events coming from the entrance processor. If the source is a live stream this method should return always <code class="highlighter-rouge">false</code>. If the source is a file, the method should return <code class="highlighter-rouge">true</code> once the file has been fully processed.</p> +<h4 id="2-boolean-hasnext">2. <code>boolean hasNext()</code></h4> -<h4 id="boolean-hasnext">2. <code class="highlighter-rouge">boolean hasNext()</code></h4> -<p>returns whether the next event is ready for consumption. 
If the method returns <code>true</code>, a subsequent call to <code>nextEvent</code> should yield the next event to be processed. If the method returns <code>false</code>, the engine can use this information to avoid continuously polling the entrance processor.</p>
-<h4 id="contentevent-nextevent">3. <code class="highlighter-rouge">ContentEvent nextEvent()</code></h4>
-<p>is the main method for the entrance processor as it returns the next event to be processed by the topology. It should be called only if <code class="highlighter-rouge">isFinished()</code> returned <code class="highlighter-rouge">false</code> and <code class="highlighter-rouge">hasNext()</code> returned <code class="highlighter-rouge">true</code>.</p>
+<h4 id="3-contentevent-nextevent">3. <code>ContentEvent nextEvent()</code></h4>
+
+<p>is the main method of the entrance processor, as it returns the next event to be processed by the topology. It should be called only if <code>isFinished()</code> returned <code>false</code> and <code>hasNext()</code> returned <code>true</code>.</p>

 <h3 id="note">Note</h3>
-<p>All state variables of the class implementing this interface must be serializable. It can be done by implementing the <code class="highlighter-rouge">Serializable</code> interface. The simple way to skip this requirement is to declare those variables as <code class="highlighter-rouge">transient</code> and initialize them in the <code class="highlighter-rouge">onCreate()</code> method. Remember, all initializations of such transient variables done in the constructor will be lost.</p>
+
+<p>All state variables of the class implementing this interface must be serializable, which can be achieved by implementing the <code>Serializable</code> interface. A simple way to sidestep this requirement is to declare such variables as <code>transient</code> and initialize them in the <code>onCreate()</code> method.
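As a concrete sketch of these three methods, and of the `transient` initialization pattern in the note above, the hypothetical entrance processor below emits a fixed number of placeholder events. The interfaces are simplified stand-ins for the ones in `org.apache.samoa.core`:

```java
import java.io.Serializable;

// Simplified stand-ins for the SAMOA interfaces in org.apache.samoa.core.
interface ContentEvent {}

interface Processor extends Serializable {
    boolean process(ContentEvent event);
    void onCreate(int id);
    Processor newProcessor(Processor p);
}

interface EntranceProcessor extends Processor {
    boolean isFinished();
    boolean hasNext();
    ContentEvent nextEvent();
}

// Hypothetical finite source that emits a fixed number of placeholder events.
class FiniteSourceProcessor implements EntranceProcessor {
    private final int total;
    private transient int emitted; // transient: must be initialized in onCreate()

    FiniteSourceProcessor(int total) {
        this.total = total;
    }

    @Override
    public void onCreate(int id) {
        emitted = 0; // constructor-time initialization would be lost
    }

    @Override
    public boolean process(ContentEvent event) {
        return false; // an entrance processor consumes no events
    }

    @Override
    public Processor newProcessor(Processor p) {
        return new FiniteSourceProcessor(total);
    }

    @Override
    public boolean isFinished() {
        return emitted >= total; // true once the source is exhausted
    }

    @Override
    public boolean hasNext() {
        return !isFinished(); // the next event is ready until exhaustion
    }

    @Override
    public ContentEvent nextEvent() {
        emitted++;
        return new ContentEvent() {}; // placeholder event
    }
}
```

For a live stream, `isFinished()` would instead always return `false`, and `hasNext()` would report whether an event has actually arrived from the source.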
Remember, all initializations of such transient variables done in the constructor will be lost.</p>
</article>

Modified: incubator/samoa/site/documentation/SAMOA-for-MOA-users.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/SAMOA-for-MOA-users.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/SAMOA-for-MOA-users.html (original)
+++ incubator/samoa/site/documentation/SAMOA-for-MOA-users.html Sun Sep 25 20:39:59 2016
@@ -73,23 +73,23 @@
 </header>

 <article class="post-content">
-  <p>If youâre an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, youâll find easy to run SAMOA. You need to note the following:</p>
+  <p>If you're an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, you'll find it easy to run SAMOA. You need to note the following:</p>

<ul>
-  <li>There is no GUI interface in SAMOA</li>
-  <li>You can run SAMOA in the following modes:
-    <ol>
-      <li>Simulation Environment. Use <code class="highlighter-rouge">org.apache.samoa.DoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li>
-      <li>Storm Local Mode. Use <code class="highlighter-rouge">org.apache.samoa.LocalStormDoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li>
-      <li>Storm Cluster Mode. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache Storm">Executing SAMOA with Apache Storm</a>.</li>
-      <li>S4. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache S4">Executing SAMOA with Apache S4</a></li>
-    </ol>
-  </li>
+<li>There is no GUI in SAMOA</li>
+<li>You can run SAMOA in the following modes:
+
+<ol>
+<li>Simulation Environment.
Use <code>org.apache.samoa.DoTask</code> instead of <code>moa.DoTask</code><br></li>
+<li>Storm Local Mode. Use <code>org.apache.samoa.LocalStormDoTask</code> instead of <code>moa.DoTask</code></li>
+<li>Storm Cluster Mode. You need to use the <code>samoa</code> script, as explained in <a href="Executing%20SAMOA%20with%20Apache%20Storm">Executing SAMOA with Apache Storm</a>.</li>
+<li>S4. You need to use the <code>samoa</code> script, as explained in <a href="Executing%20SAMOA%20with%20Apache%20S4">Executing SAMOA with Apache S4</a></li>
+</ol></li>
</ul>

-<p>To start with SAMOA, you can start with a simple example using the CoverType dataset as it is discussed in <a href="Getting Started">Getting Started</a>.</p>
+<p>To get started with SAMOA, try a simple example using the CoverType dataset, as discussed in <a href="Getting%20Started">Getting Started</a>. </p>

-<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>.</p>
+<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>.
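As an illustration of the simulation mode, a MOA user can pass a familiar task string to `org.apache.samoa.DoTask`. The jar path, version, and task configuration below are assumptions for illustration only; see Getting Started for the exact command matching your build:

```shell
# Hypothetical jar path and task string -- adjust to your build.
# Analogous to "java -cp moa.jar moa.DoTask ..." in MOA.
java -cp target/SAMOA-Local-0.0.1-SNAPSHOT.jar org.apache.samoa.DoTask \
  "PrequentialEvaluation -l classifiers.ensemble.Bagging \
   -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
```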
</p>
</article>

Modified: incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html (original)
+++ incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html Sun Sep 25 20:39:59 2016
@@ -83,6 +83,6 @@
 <li><a href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache S4</a></li>
 <li><a href="Executing-SAMOA-with-Apache-Samza.html">Executing SAMOA with Apache Samza</a></li>
 <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">Executing SAMOA with Apache Avro Files</a></li>
 </ul>

 </article>

Modified: incubator/samoa/site/documentation/Stream.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Stream.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Stream.html (original)
+++ incubator/samoa/site/documentation/Stream.html Sun Sep 25 20:39:59 2016
@@ -73,51 +73,47 @@
 </header>

 <article class="post-content">
-  <p>A stream is a physical unit of SAMOA topology which connects different Processors with each other. Stream is also created by a <code class="highlighter-rouge">TopologyBuilder</code> just like a Processor. A stream can have a single source but many destinations. A Processor which is the source of a stream, owns the stream.</p>
+  <p>A stream is the physical unit of a SAMOA topology that connects different Processors to each other. Like a Processor, a Stream is created by a <code>TopologyBuilder</code>.
A stream can have a single source but many destinations. A Processor that is the source of a stream owns the stream.</p>

-<h3 id="creating-a-stream">1. Creating a Stream</h3>
-<p>The following code snippet shows how a Stream is created:</p>
+<h3 id="1-creating-a-stream">1. Creating a Stream</h3>

-<p><code class="highlighter-rouge">
-builder.initTopology("MyTopology");
+<p>The following code snippet shows how a Stream is created:</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology");
 Processor sourceProcessor = new Sampler();
 builder.addProcessor(sourceProcessor, 3);
 Stream sourceDataStream = builder.createStream(sourceProcessor);
-</code></p>
+</code></pre></div>
+<h3 id="2-connecting-a-stream">2. Connecting a Stream</h3>

-<h3 id="connecting-a-stream">2. Connecting a Stream</h3>
 <p>As described above, a Stream can have many destinations. In the following figure, a single stream from sourceProcessor is connected to three different destination Processors each having three instances.</p>

-<p><img src="images/SAMOA Message Shuffling.png" alt="SAMOA Message Shuffling" /></p>
+<p><img src="images/SAMOA%20Message%20Shuffling.png" alt="SAMOA Message Shuffling"></p>
+
+<p>SAMOA supports three different ways of distributing messages to multiple instances of a Processor.</p>
-<p>SAMOA supports three different ways of distribution of messages to multiple instances of a Processor.
-####2.1 Shuffle
-In this way of message distribution, messages/events are distributed randomly among various instances of a Processor.
+<h4 id="2-1-shuffle">2.1 Shuffle</h4>
+
+<p>In this way of message distribution, messages/events are distributed randomly among the various instances of a Processor. The following figure shows how the messages are distributed.
-<img src="images/SAMOA Explain Shuffling.png" alt="SAMOA Explain Shuffling" />
+<img src="images/SAMOA%20Explain%20Shuffling.png" alt="SAMOA Explain Shuffling">
 The following code snippet shows how to connect a stream to a destination using random shuffling.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
+<h4 id="2-2-key">2.2 Key</h4>

-<p><code class="highlighter-rouge">
-builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
-</code>
-####2.2 Key
-In this way of message distribution, messages with same key are sent to same instance of a Processor.
+<p>In this way of message distribution, messages with the same key are sent to the same instance of a Processor.
 The following figure illustrates key-based distribution.
-<img src="images/SAMOA Explain Key Shuffling.png" alt="SAMOA Explain Key Shuffling" />
+<img src="images/SAMOA%20Explain%20Key%20Shuffling.png" alt="SAMOA Explain Key Shuffling">
 The following code snippet shows how to connect a stream to a destination using key-based distribution.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
+<h4 id="2-3-all">2.3 All</h4>

-<p><code class="highlighter-rouge">
-builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
-</code>
-####2.3 All
-In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. Following figure illustrates this distribution process.
-<img src="images/SAMOA Explain All Shuffling.png" alt="SAMOA Explain All Shuffling" />
+<p>In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. The following figure illustrates this distribution process.
+<img src="images/SAMOA%20Explain%20All%20Shuffling.png" alt="SAMOA Explain All Shuffling">
 The following code snippet shows how to connect a stream to a destination using All-based distribution.</p>
-
-<p><code class="highlighter-rouge">
-builder.connectInputAllStream(sourceDataStream, destinationProcessor);
-</code></p>
-
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputAllStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
 </article>

<!-- </div> -->

Modified: incubator/samoa/site/documentation/Task.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Task.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Task.html (original)
+++ incubator/samoa/site/documentation/Task.html Sun Sep 25 20:39:59 2016
@@ -73,55 +73,56 @@
 </header>

 <article class="post-content">
-  <p>Task is similar to a job in Hadoop. Task is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement <code class="highlighter-rouge">Task</code> interface.</p>
+  <p>A Task is an execution entity, similar to a job in Hadoop. A topology must be defined inside a Task, and SAMOA can only execute classes that implement the <code>Task</code> interface.</p>

-<h3 id="implementation">1. Implementation</h3>
-<p>```
-package org.apache.samoa.tasks;</p>
+<h3 id="1-implementation">1. Implementation</h3>
+<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.tasks;

-<p>import org.apache.samoa.topology.ComponentFactory;
-import org.apache.samoa.topology.Topology;</p>
+import org.apache.samoa.topology.ComponentFactory;
+import org.apache.samoa.topology.Topology;

-<p>/**
+/**
 * Task interface, the mother of all SAMOA tasks!
*/ -public interface Task {</p> +public interface Task { -<div class="highlighter-rouge"><pre class="highlight"><code>/** - * Initialize this SAMOA task, - * i.e. create and connect Processors and Streams - * and initialize the topology - */ -public void init(); + /** + * Initialize this SAMOA task, + * i.e. create and connect Processors and Streams + * and initialize the topology + */ + public void init(); + + /** + * Return the final topology object to be executed in the cluster + * @return topology object to be submitted to be executed in the cluster + */ + public Topology getTopology(); + + /** + * Sets the factory. + * TODO: propose to hide factory from task, + * i.e. Task will only see TopologyBuilder, + * and factory creation will be handled by TopologyBuilder + * + * @param factory the new factory + */ + public void setFactory(ComponentFactory factory) ; +} +</code></pre></div> +<h3 id="2-methods">2. Methods</h3> + +<h5 id="2-1-void-init">2.1 <code>void init()</code></h5> + +<p>This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> -/** - * Return the final topology object to be executed in the cluster - * @return topology object to be submitted to be executed in the cluster - */ -public Topology getTopology(); - -/** - * Sets the factory. - * TODO: propose to hide factory from task, - * i.e. Task will only see TopologyBuilder, - * and factory creation will be handled by TopologyBuilder - * - * @param factory the new factory - */ -public void setFactory(ComponentFactory factory) ; } ``` -</code></pre> -</div> - -<h3 id="methods">2. 
Methods</h3> -<p>#####2.1 <code class="highlighter-rouge">void init()</code> -This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> +<h5 id="2-2-topology-gettopology">2.2 <code>Topology getTopology()</code></h5> -<h5 id="topology-gettopology">2.2 <code class="highlighter-rouge">Topology getTopology()</code></h5> -<p>This method should return the topology built by <code class="highlighter-rouge">init</code> to the engine for execution.</p> +<p>This method should return the topology built by <code>init</code> to the engine for execution.</p> -<h5 id="void-setfactorycomponentfactory-factory">2.3 <code class="highlighter-rouge">void setFactory(ComponentFactory factory)</code></h5> -<p>Utility method to accept a <code class="highlighter-rouge">ComponentFactory</code> to use in building the topology.</p> +<h5 id="2-3-void-setfactory-componentfactory-factory">2.3 <code>void setFactory(ComponentFactory factory)</code></h5> +<p>Utility method to accept a <code>ComponentFactory</code> to use in building the topology.</p> </article> Modified: incubator/samoa/site/documentation/Team.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Team.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Team.html (original) +++ incubator/samoa/site/documentation/Team.html Sun Sep 25 20:39:59 2016 @@ -76,51 +76,52 @@ <h2 id="team">Team</h2> <table class="table table-striped"> - <thead> - <th class="text-center"></th> - <th class="text-center">Name</th> - <th class="text-center">Role</th> - <th class="text-center">Apache ID</th> - </thead> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">gdfm</td> - </tr> - <tr> - <td class="text-center"></td> - <td 
class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">abifet</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Nicolas Kourtellis</td> - <td class="text-center">PPMC</td> - <td class="text-center">nkourtellis</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">arinto</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Matthieu Morel</td> - <td class="text-center">PPMC</td> - <td class="text-center">mmorel</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">ovlaere</td> - </tr> + <thead> + <th class="text-center"></th> + <th class="text-center">Name</th> + <th class="text-center">Role</th> + <th class="text-center">Apache ID</th> + </thead> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">gdfm</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">abifet</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Nicolas Kourtellis</td> + <td class="text-center">PPMC</td> + <td class="text-center">nkourtellis</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">arinto</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Matthieu Morel</td> + <td class="text-center">PPMC</td> 
+ <td class="text-center">mmorel</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">ovlaere</td> + </tr> </table> <h3 id="contributors">Contributors</h3> + <ul> <li><a href="http://www.lsi.upc.edu/~marias/">Marta Arias</a></li> <li>Foteini Beligianni</li>
