Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html?rev=1737551&r1=1737550&r2=1737551&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html (original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html Sun Apr 3 08:17:59 2016
@@ -76,15 +76,15 @@
 <p>In this tutorial page we describe how to execute SAMOA with data files in the Apache Avro file format. Here is an outline of this tutorial:</p>
 <ol>
-<li>Overview of Apache Avro</li>
-<li>Avro Input Format for SAMOA</li>
-<li>SAMOA task execution with Avro</li>
-<li>Sample Avro Data for SAMOA</li>
+  <li>Overview of Apache Avro</li>
+  <li>Avro Input Format for SAMOA</li>
+  <li>SAMOA task execution with Avro</li>
+  <li>Sample Avro Data for SAMOA</li>
 </ol>
 <h3 id="overview-of-apache-avro">Overview of Apache Avro</h3>
-<p>Users of Apache SAMOA can now use Binary/JSON encoded Avro data as an alternate to the default ARFF file format as the data source. Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Avro specifies two serialization encodings for the data: Binary and JSON, default being Binary. However the meta-data is always in JSON. Avro data is always serialized with its schema. Files that store Avro data should also include the schema for that data in the same file. </p>
+<p>Users of Apache SAMOA can now use Binary/JSON encoded Avro data as an alternative to the default ARFF file format as the data source. Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. 
It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Avro specifies two serialization encodings for the data: Binary and JSON, the default being Binary. However, the metadata is always in JSON. Avro data is always serialized with its schema. Files that store Avro data should also include the schema for that data in the same file.</p>

<p>You can find the latest Apache Avro documentation <a href="https://avro.apache.org/docs/current/">here</a> for more details.</p>

@@ -93,42 +93,53 @@
 <p>The input Avro files to the SAMOA framework must follow certain input format rules to work seamlessly with SAMOA instances. The first line of an Avro source file for SAMOA (irrespective of whether the data is encoded in binary or JSON) is the metadata (schema). By default, the data is one record per line following the schema, and each record is mapped to one SAMOA instance.</p>
 <ol>
-<li>Avro Primitive Types & Enums are allowed for the data as is. </li>
-<li>Avro Complex-types (e.g maps/arrays) may not be used with the exception of enum & union. I.e. no sub-structure will be allowed.</li>
-<li>Label (if any) would be the last attribute.</li>
-<li>Timestamps are not supported as of now within SAMOA.</li>
-<li>Avro Enums may be used to represent nominal attributes.</li>
-<li>Avro unions may be used to represent nullability of value. However unions may not be used for different data types.<br></li>
+  <li>Avro primitive types & enums are allowed for the data as is.</li>
+  <li>Avro complex types (e.g. maps/arrays) may not be used, with the exception of enum & union; i.e., no sub-structure is allowed.</li>
+  <li>The label (if any) must be the last attribute.</li>
+  <li>Timestamps are currently not supported within SAMOA.</li>
+  <li>Avro enums may be used to represent nominal attributes.</li>
+  <li>Avro unions may be used to represent nullability of a value. 
However, unions may not be used to mix different data types.</li>
 </ol>
-<div class="highlight"><pre><code class="language-" data-lang="">E.g Enums
+
+<p><code class="highlighter-rouge">
+E.g Enums
{"name":"species","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}
E.g Unions
{"name":"attribute1","type":["null","int"]} -Allowed to denote that value for attribute1 is optional
{"name":" attribute2","type":["string","int"]} -Not allowed
-</code></pre></div>
+</code></p>
+
 <h3 id="samoa-task-execution-with-avro">SAMOA task execution with Avro</h3>
-<p>You may execute a SAMOA task using the aforementioned <code>bin/samoa</code> script with the following format: <code>bin/samoa <platform> <jar> "<task>"</code>.
-Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> and this <a href="Executing-SAMOA-with-Apache-Storm">link</a> to learn more about deploying SAMOA on Apache S4 and Apache Storm respectively. The Avro files can be used as data sources for any of the aforementioned platforms. The only addition that needs to be made in the commands is as follows: <code>AvroFileStream <file_name> -e <file_format></code> . Examples are given below for different modes. Though the examples below use <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> the commands are applicable to all other tasks as well.</p>
+<p>You may execute a SAMOA task using the aforementioned <code class="highlighter-rouge">bin/samoa</code> script with the following format: <code class="highlighter-rouge">bin/samoa <platform> <jar> "<task>"</code>.
+Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> and this <a href="Executing-SAMOA-with-Apache-Storm">link</a> to learn more about deploying SAMOA on Apache S4 and Apache Storm respectively. The Avro files can be used as data sources for any of the aforementioned platforms. 
The only addition that needs to be made to the commands is as follows: <code class="highlighter-rouge">AvroFileStream <file_name> -e <file_format></code>. Examples are given below for different modes. Though the examples below use the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a>, the commands are applicable to all other tasks as well.</p>
+
+<h4 id="local---avro-json">Local - Avro JSON</h4>
+<p><code class="highlighter-rouge">
+bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
+</code></p>
+
+<h4 id="local---avro-binary">Local - Avro Binary</h4>
+<p><code class="highlighter-rouge">
+bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
+</code></p>
+
+<h4 id="storm---avro-json">Storm - Avro JSON</h4>
+<p><code class="highlighter-rouge">
+bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
+</code></p>
+
+<h4 id="storm---avro-binary">Storm - Avro Binary</h4>
+<p><code class="highlighter-rouge">
+bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
+</code></p>

-<h4 id="local-avro-json">Local - Avro JSON</h4>
-<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
-</code></pre></div>
-<h4 id="local-avro-binary">Local - Avro Binary</h4>
-<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l 
classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000" -</code></pre></div> -<h4 id="storm-avro-json">Storm - Avro JSON</h4> -<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000" -</code></pre></div> -<h4 id="storm-avro-binary">Storm - Avro Binary</h4> -<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000" -</code></pre></div> <h3 id="sample-avro-data-for-samoa">Sample Avro Data for SAMOA</h3> <p>The samples below describe how the default ARFF file formats may be converted to JSON/Binary encoded Avro formats.</p> -<h4 id="iris-dataset-default-arff-format">Iris Dataset - Default ARFF Format</h4> -<div class="highlight"><pre><code class="language-" data-lang="">@RELATION iris +<h4 id="iris-dataset---default-arff-format">Iris Dataset - Default ARFF Format</h4> + +<p><code class="highlighter-rouge"> +@RELATION iris @ATTRIBUTE sepallength NUMERIC @ATTRIBUTE sepalwidth NUMERIC @ATTRIBUTE petallength NUMERIC @@ -139,20 +150,27 @@ Follow this <a href="Executing-SAMOA-wit 4.9,3.0,1.4,0.2,virginica 4.7,3.2,1.3,0.2,virginica 4.6,3.1,1.5,0.2,setosa -</code></pre></div> -<h4 id="iris-dataset-json-encoded-avro-format">Iris Dataset - JSON Encoded AVRO Format</h4> -<div class="highlight"><pre><code class="language-" data-lang=""><span class="p">{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"record"</span><span class="p">,</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Iris"</span><span class="p">,</span><span class="nt">"namespace"</span><span class="p">:</span><span 
class="s2">"org.apache.samoa.avro.iris"</span><span class="p">,</span><span class="nt">"fields"</span><span class="p">:[{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepallength"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p ">:</span><span class="s2">"petallength"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"petalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"class"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"enum"</span><span class="p">,</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Labels"</span><span class="p">,</span><span class="nt">"symbols"</span><span class="p">:[</span><span class="s2">"setosa"</span><span class="p">,</span><span class="s2">"versicolor"</span><span class="p">,</span><span class="s2">"virginica"</sp an><span class="p">]}}]}</span><span class="w"> +</code></p> + +<h4 id="iris-dataset---json-encoded-avro-format">Iris Dataset - JSON Encoded AVRO Format</h4> + +<p><code class="highlighter-rouge"><span class="w"> +</span><span class="p">{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"record"</span><span class="p">,</span><span 
class="nt">"name"</span><span class="p">:</span><span class="s2">"Iris"</span><span class="p">,</span><span class="nt">"namespace"</span><span class="p">:</span><span class="s2">"org.apache.samoa.avro.iris"</span><span class="p">,</span><span class="nt">"fields"</span><span class="p">:[{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepallength"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"petallength"</span><span class ="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"petalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"class"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"enum"</span><span class="p">,</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Labels"</span><span class="p">,</span><span class="nt">"symbols"</span><span class="p">:[</span><span class="s2">"setosa"</span><span class="p">,</span><span class="s2">"versicolor"</span><span class="p">,</span><span class="s2">"virginica"</span><span class="p">]}}]}</span><span class="w"> </span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">5.1</span><span class="p">,</span><span 
class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.5</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.4</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"setosa"</span><span class="p">}</span><span class="w"> </span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">3.0</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">1.4</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">4.9</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"virginica"</span><span class="p">}</span><span class="w"> </span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">4.7</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.2</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.3</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"virginica"</span><span class="p">}</span><span class="w"> </span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">3.1</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">1.5</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span 
class="mf">4.6</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"setosa"</span><span class="p">}</span><span class="w"> -</span></code></pre></div> -<h4 id="iris-dataset-binary-encoded-avro-format">Iris Dataset - Binary Encoded AVRO Format</h4> -<div class="highlight"><pre><code class="language-" data-lang="">Objavro.schemaÎ {"type":"record","name":"Iris","namespace":"org.apache.samoa.avro.iris","fields":[{"name":"sepallength","type":"double"},{"name":"sepalwidth","type":"double"},{"name":"petallength","type":"double"},{"name":"petalwidth","type":"double"},{"name":"class","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}]} !<khCrÖ±Së¹§Þ©Èffffff@ @ffffffÙÙÉ¿ @ffffffÙÙ@ÚÙÙÉ¿ÎÍÍ@ÚÙÙ @ÎÍÍÙÙÉ¿ÎÍÍ@ 𿦦ffff@ÚÙÙÉ¿ !<khCrÖ±Së¹§Þ© -</code></pre></div> +</span></code></p> + +<h4 id="iris-dataset---binary-encoded-avro-format">Iris Dataset - Binary Encoded AVRO Format</h4> + +<p><code class="highlighter-rouge"> +Objavro.schemaÎ {"type":"record","name":"Iris","namespace":"org.apache.samoa.avro.iris","fields":[{"name":"sepallength","type":"double"},{"name":"sepalwidth","type":"double"},{"name":"petallength","type":"double"},{"name":"petalwidth","type":"double"},{"name":"class","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}]} !<khCrÖ±Së¹§Þ©Èffffff@ @ffffffÙÙÉ¿ @ffffffÙÙ@ÚÙÙÉ¿ÎÍÍ@ÚÙÙ @ÎÍÍÙÙÉ¿ÎÍÍ@ 𿦦ffff@ÚÙÙÉ¿ !<khCrÖ±Së¹§Þ© +</code></p> + <h4 id="forest-covertype-dataset">Forest CoverType Dataset</h4> +<p>The JSON & Binary encoded AVRO Files covtypeNorm_json.avro & covtypeNorm_binary.avro for the Forest CoverType dataset can be found at <a href="https://cwiki.apache.org/confluence/display/SAMOA/SAMOA+Home">Wiki</a></p> -<p>The JSON & Binary encoded AVRO Files covtypeNorm_json.avro & covtypeNorm_binary.avro for the Forest CoverType dataset can be 
found at <a href="https://cwiki.apache.org/confluence/display/SAMOA/SAMOA+Home">Wiki</a> </p> </article>
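The ARFF-to-Avro conversion illustrated by the iris samples above can also be sketched programmatically. The following Python snippet is a minimal, hypothetical sketch (it is not part of SAMOA, and the helper names are invented for illustration): it builds the schema-first, one-JSON-record-per-line layout that the JSON-encoded iris sample follows, using only the standard library.

```python
import json

# Field list taken from the iris ARFF sample above; names and order
# must match the ARFF attributes, with the label last.
FIELDS = [("sepallength", float), ("sepalwidth", float),
          ("petallength", float), ("petalwidth", float), ("class", str)]

# Avro schema: "double" for the numeric attributes, an enum for the
# nominal label, mirroring the schema line of the JSON-encoded sample.
SCHEMA = {
    "type": "record", "name": "Iris", "namespace": "org.apache.samoa.avro.iris",
    "fields": [
        {"name": name, "type": "double"} if conv is float else
        {"name": name, "type": {"type": "enum", "name": "Labels",
                                "symbols": ["setosa", "versicolor", "virginica"]}}
        for name, conv in FIELDS
    ],
}

def arff_row_to_record(row):
    """Map one comma-separated ARFF data row onto the schema's fields."""
    return {name: conv(value)
            for (name, conv), value in zip(FIELDS, row.split(","))}

def to_json_avro(arff_rows):
    """First line is the schema, then one JSON record per line."""
    lines = [json.dumps(SCHEMA, separators=(",", ":"))]
    lines += [json.dumps(arff_row_to_record(r), separators=(",", ":"))
              for r in arff_rows]
    return "\n".join(lines)

print(to_json_avro(["5.1,3.5,1.4,0.2,setosa", "4.9,3.0,1.4,0.2,virginica"]))
```

The binary-encoded variant shown above additionally requires an Avro container-file writer (the `Objavro.schema...` header and sync markers), which this sketch does not cover.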
Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html Sun Apr 3 08:17:59 2016 @@ -76,101 +76,115 @@ <p>In this tutorial page we describe how to execute SAMOA on top of Apache S4.</p> <h2 id="prerequisites">Prerequisites</h2> - <p>The following dependencies are needed to run SAMOA smoothly on Apache S4</p> <ul> -<li><a href="http://www.gradle.org/">Gradle</a></li> -<li><a href="https://incubator.apache.org/s4/">Apache S4</a></li> + <li><a href="http://www.gradle.org/">Gradle</a></li> + <li><a href="https://incubator.apache.org/s4/">Apache S4</a></li> </ul> <h2 id="gradle">Gradle</h2> - <p>Gradle is a build automation tool and is used to build Apache S4. 
The installation guide can be found <a href="http://www.gradle.org/docs/current/userguide/installation.html">here</a>. The following is a simplified installation guide.</p>
 <ol>
-<li>Download Gradle binaries from <a href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>, or from the console type <code>wget http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li>
-<li>Unzip the file <code>unzip gradle-1.6-bin.zip</code></li>
-<li>Set the Gradle environment variable: <code>export GRADLE_HOME=/foo/bar/gradle-1.6</code></li>
-<li>Add to the systems path <code>export PATH=$PATH:$GRADLE_HOME/bin</code></li>
-<li>Install Gradle by running <code>gradle</code></li>
+  <li>Download Gradle binaries from <a href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>, or from the console type <code class="highlighter-rouge">wget http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li>
+  <li>Unzip the file: <code class="highlighter-rouge">unzip gradle-1.6-bin.zip</code></li>
+  <li>Set the Gradle environment variable: <code class="highlighter-rouge">export GRADLE_HOME=/foo/bar/gradle-1.6</code></li>
+  <li>Add it to the system path: <code class="highlighter-rouge">export PATH=$PATH:$GRADLE_HOME/bin</code></li>
+  <li>Install Gradle by running <code class="highlighter-rouge">gradle</code></li>
 </ol>
 <p>Now you are all set to install Apache S4.</p>
 <h2 id="apache-s4">Apache S4</h2>
-
 <p>S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. 
The installation process is as follows:</p> <ol> -<li>Download the latest Apache S4 release from <a href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache S4 0.6.0</a> or from command line <code>wget http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code> or clone from git. -<code>git clone https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li> -<li>Unzip the file <code>unzip apache-s4-0.6.0-incubating-src.zip</code> or go in the cloned directory.</li> -<li>Set the Apache S4 environment variable <code>export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li> -<li>Add the S4_HOME to the system PATH. <code>export PATH=$PATH:$S4_HOME</code>.</li> -<li>Once the previous steps are done we can proceed to build and install Apache S4.</li> -<li>You can have a look at the available build tasks by typing <code>gradle tasks</code>.</li> -<li>There are some dependencies issues, therefore you should run the wrapper task first by typing <code>gradle wrapper</code>.</li> -<li>Install the artifacts for Apache S4 by running <code>gradle install</code> in the S4_HOME directory.</li> -<li>Install the S4-TOOLS, <code>gradle s4-tools::installApp</code>.</li> + <li>Download the latest Apache S4 release from <a href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache S4 0.6.0</a> or from command line <code class="highlighter-rouge">wget http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code> or clone from git. 
+<code class="highlighter-rouge">git clone https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li>
+  <li>Unzip the file <code class="highlighter-rouge">unzip apache-s4-0.6.0-incubating-src.zip</code> or go into the cloned directory.</li>
+  <li>Set the Apache S4 environment variable <code class="highlighter-rouge">export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li>
+  <li>Add the S4_HOME to the system PATH: <code class="highlighter-rouge">export PATH=$PATH:$S4_HOME</code>.</li>
+  <li>Once the previous steps are done, we can proceed to build and install Apache S4.</li>
+  <li>You can have a look at the available build tasks by typing <code class="highlighter-rouge">gradle tasks</code>.</li>
+  <li>There are some dependency issues, so you should run the wrapper task first by typing <code class="highlighter-rouge">gradle wrapper</code>.</li>
+  <li>Install the artifacts for Apache S4 by running <code class="highlighter-rouge">gradle install</code> in the S4_HOME directory.</li>
+  <li>Install the S4-TOOLS: <code class="highlighter-rouge">gradle s4-tools::installApp</code>.</li>
 </ol>
 <p>Done. Now you can configure and run your Apache S4 cluster.</p>
-<hr>
+<hr />
 <h2 id="building-samoa">Building SAMOA</h2>
-
 <p>Once the S4 dependencies are installed, you can simply clone the repository and install SAMOA.</p>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git
-<span class="nb">cd </span>incubator-samoa
+
+<p><code class="highlighter-rouge">
+git clone http://git.apache.org/incubator-samoa.git
+cd incubator-samoa
mvn -Ps4 package
-</code></pre></div>
-<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. 
For example, in our case for S4 <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p> +</code></p> -<hr> +<p>The deployable jars for SAMOA will be in <code class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for S4 <code class="highlighter-rouge">target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p> -<h2 id="samoa-s4-configuration">SAMOA-S4 Configuration</h2> +<hr /> -<p>This section will go through the <code>bin/samoa-s4.properties</code> file and how to configure it. +<h2 id="samoa-s4-configuration">SAMOA-S4 Configuration</h2> +<p>This section will go through the <code class="highlighter-rouge">bin/samoa-s4.properties</code> file and how to configure it. In order for SAMOA to run correctly in a distributed environment there are some variables that need to be defined. Since Apache S4 uses <a href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management we need to define where it is running.</p> -<div class="highlight"><pre><code class="language-" data-lang=""># Zookeeper Server + +<div class="highlighter-rouge"><pre class="highlight"><code># Zookeeper Server zookeeper.server=localhost zookeeper.port=2181 -</code></pre></div> +</code></pre> +</div> + <p>Apache S4 also distributes the application via HTTP, therefore the server and port which contains the S4 application must be provided.</p> -<div class="highlight"><pre><code class="language-" data-lang=""># Simple HTTP Server providing the packaged S4 jar + +<div class="highlighter-rouge"><pre class="highlight"><code># Simple HTTP Server providing the packaged S4 jar http.server.ip=localhost http.server.port=8000 -</code></pre></div> +</code></pre> +</div> + <p>Apache S4 uses the concept of logical clusters to define a group of machines, which are identified by an ID and start serving on a specific port.</p> -<div class="highlight"><pre><code class="language-" data-lang=""># Name of the S4 cluster + +<div class="highlighter-rouge"><pre 
class="highlight"><code># Name of the S4 cluster
cluster.name=cluster
cluster.port=12000
-</code></pre></div>
-<p>SAMOA can be deployed on a single machine using only one resource or in a cluster environments. The following property can be defined to deploy as a <code>local</code> application or on a <code>cluster</code>.</p>
-<div class="highlight"><pre><code class="language-" data-lang=""># Deployment strategy
+</code></pre>
+</div>
+
+<p>SAMOA can be deployed on a single machine using only one resource or in a cluster environment. The following property can be defined to deploy as a <code class="highlighter-rouge">local</code> application or on a <code class="highlighter-rouge">cluster</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># Deployment strategy
samoa.deploy.mode=local
-</code></pre></div>
-<hr>
+</code></pre>
+</div>
+
+<hr />
 <h2 id="samoa-s4-deployment">SAMOA S4 Deployment</h2>
-<p>In order to deploy SAMOA in a distributed environment you <strong>MUST</strong> configure the <code>bin/samoa-s4.properties</code> file correctly. If you are running locally it is optional to modify the properties file.</p>
+<p>In order to deploy SAMOA in a distributed environment, you <strong>MUST</strong> configure the <code class="highlighter-rouge">bin/samoa-s4.properties</code> file correctly. If you are running locally, modifying the properties file is optional.</p>
-<p>The deployment is done by running the SAMOA execution script <code>bin/samoa</code> with some additional parameters.
+<p>The deployment is done by running the SAMOA execution script <code class="highlighter-rouge">bin/samoa</code> with some additional parameters. 
The execution syntax is as follows: -<code>bin/samoa <platform> <jar-location> <task & options></code></p> +<code class="highlighter-rouge">bin/samoa <platform> <jar-location> <task & options></code></p> <p>Example:</p> -<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation" -</code></pre></div> + +<div class="highlighter-rouge"><pre class="highlight"><code>bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation" +</code></pre> +</div> + <p>The <platform> can be s4 or storm.</p> <p>The <jar-location> must be the absolute path to the platform specific jar file.</p> <p>The <task & options> should be the name of a known task and the options belonging to that task.</p> + </article> <!-- </div> --> Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html Sun Apr 3 08:17:59 2016 @@ -77,221 +77,313 @@ The steps included in this tutorial are:</p> <ol> -<li><p>Setup and configure a cluster with the required dependencies. This applies for single-node (local) execution as well.</p></li> -<li><p>Build SAMOA deployables</p></li> -<li><p>Configure SAMOA-Samza</p></li> -<li><p>Deploy SAMOA-Samza and execute a task</p></li> -<li><p>Observe the execution and the result</p></li> + <li> + <p>Setup and configure a cluster with the required dependencies. 
This applies for single-node (local) execution as well.</p>
+  </li>
+  <li>
+    <p>Build SAMOA deployables</p>
+  </li>
+  <li>
+    <p>Configure SAMOA-Samza</p>
+  </li>
+  <li>
+    <p>Deploy SAMOA-Samza and execute a task</p>
+  </li>
+  <li>
+    <p>Observe the execution and the result</p>
+  </li>
 </ol>
 <h2 id="setup-cluster">Setup cluster</h2>
-
 <p>The following are needed to run SAMOA on top of Samza:</p>
 <ul>
-<li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li>
-<li><a href="http://kafka.apache.org/">Apache Kafka</a></li>
-<li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li>
+  <li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li>
+  <li><a href="http://kafka.apache.org/">Apache Kafka</a></li>
+  <li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li>
 </ul>
 <h3 id="zookeeper">Zookeeper</h3>
-
-<p>Zookeeper is used by Kafka to coordinate its brokers. The detail instructions to setup a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>. </p>
+<p>Zookeeper is used by Kafka to coordinate its brokers. 
The detailed instructions to set up a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>.</p>

<p>To quickly set up a single-node Zookeeper cluster:</p>

<ol>
-<li><p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li>
-<li><p>Untar the archive</p></li>
+  <li>
+    <p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p>
+  </li>
+  <li>
+    <p>Untar the archive</p>
+  </li>
 </ol>
-<div class="highlight"><pre><code class="language-" data-lang="">tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
-</code></pre></div>
+
+<p><code class="highlighter-rouge">
+tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
+</code></p>
+
 <ol start="3">
-<li>Copy the default configuration file</li>
+  <li>Copy the default configuration file</li>
 </ol>
-<div class="highlight"><pre><code class="language-" data-lang="">cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
-</code></pre></div>
+
+<p><code class="highlighter-rouge">
+cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
+</code></p>
+
 <ol start="4">
-<li>Start the single-node cluster</li>
+  <li>Start the single-node cluster</li>
 </ol>
-<div class="highlight"><pre><code class="language-" data-lang="">~/zookeeper-3.4.6/bin/zkServer.sh start
-</code></pre></div>
-<h3 id="kafka">Kafka</h3>
-<p>Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system. </p>
+<p><code class="highlighter-rouge">
+~/zookeeper-3.4.6/bin/zkServer.sh start
+</code></p>
+
+<h3 id="kafka">Kafka</h3>
+<p>Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system.</p>
 <ol>
-<li><p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned in the page, the Scala version does not matter. 
However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p></li>
-<li><p>Untar the archive </p></li>
+  <li>
+    <p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned on that page, the Scala version does not matter. However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p>
+  </li>
+  <li>
+    <p>Untar the archive</p>
+  </li>
</ol>
-<div class="highlight"><pre><code class="language-" data-lang="">tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
-</code></pre></div>
+
+<p><code class="highlighter-rouge">
+tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
+</code></p>
+
<p>If you are running in local mode or a single-node cluster, you can now start Kafka with the command:</p>
-<div class="highlight"><pre><code class="language-" data-lang="">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties
-</code></pre></div>
-<p>In multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can totally have a smaller Kafka cluster, or even a single-node Kafka cluster). The number of brokers in Kafka cluster will affect disk bandwidth and space (the more brokers we have, the higher value we will get for the two). In each node, you need to set the following properties in <code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting Kafka service.</p>
-<div class="highlight"><pre><code class="language-" data-lang="">broker.id=a-unique-number-for-each-node
+
+<p><code class="highlighter-rouge">
+~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties
+</code></p>
+
+<p>In a multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can certainly have a smaller Kafka cluster, or even a single-node Kafka cluster). 
The number of brokers in the Kafka cluster will affect disk bandwidth and space (the more brokers we have, the more of both we get). On each node, you need to set the following properties in <code class="highlighter-rouge">~/kafka_2.10-0.8.1/config/server.properties</code> before starting the Kafka service.</p>
+
+<p><code class="highlighter-rouge">
+broker.id=a-unique-number-for-each-node
zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...]
-</code></pre></div>
+</code></p>
+
<p>You might want to change the retention hours or retention bytes of the logs to keep the log size from growing too big.</p>
+
+<p><code class="highlighter-rouge">
+log.retention.hours=number-of-hours-to-keep-the-logs
log.retention.bytes=number-of-bytes-to-keep-in-the-logs
-</code></pre></div>
+</code></p>
+<h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3>
<blockquote>
-<p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode. </p>
+  <p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode.</p>
</blockquote>

<p>To set up a YARN cluster, first download a binary release of Hadoop <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">here</a> on each node in the cluster and untar the archive
-<code>tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p>
+<code class="highlighter-rouge">tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. 
We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> <p><strong>HDFS</strong></p> -<p>Set the following properties in <code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> -<div class="highlight"><pre><code class="language-" data-lang=""><configuration> - <property> - <name>dfs.datanode.data.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> - <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> - </property> - - <property> - <name>dfs.namenode.name.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> - <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description> - </property> -</configuration> -</code></pre></div> -<p>Add this property in <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> in all nodes.</p> -<div class="highlight"><pre><code class="language-" data-lang=""><configuration> - <property> - <name>fs.defaultFS</name> - <value>hdfs://localhost:9000/</value> - <description>NameNode URI</description> - </property> - - <property> - <name>fs.hdfs.impl</name> - <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> - </property> -</configuration> -</code></pre></div> -<p>For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.</p> +<p>Set the following properties in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> + +<p>```</p> +<configuration> + <property> + <name>dfs.datanode.data.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> + <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> + </property> + + <property> + <name>dfs.namenode.name.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> + 
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
+  </property>
+</configuration>
+<p>```</p>

+<p>Add this property in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> in all nodes.</p>
+
+<p>```</p>
+<configuration>
+  <property>
+    <name>fs.defaultFS</name>
+    <value>hdfs://localhost:9000/</value>
+    <description>NameNode URI</description>
+  </property>
+
+  <property>
+    <name>fs.hdfs.impl</name>
+    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
+  </property>
+</configuration>
+<p>```
+For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.</p>

<p>Format the HDFS directory (only perform this if you are running it for the very first time)</p>
+
+<p><code class="highlighter-rouge">
+~/hadoop-2.2.0/bin/hdfs namenode -format
+</code></p>
+
<p>Start the namenode daemon on one of the nodes</p>
+
+<p><code class="highlighter-rouge">
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
+</code></p>
+
<p>Start the datanode daemon on all nodes</p>
+
+<p><code class="highlighter-rouge">
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
+</code></p>
+
<p><strong>YARN</strong></p>
-<p>If you are running in multi-node cluster, set the resource manager hostname in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> in all nodes as follow:</p>
-<div class="highlight"><pre><code class="language-" data-lang=""><configuration>
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>resourcemanager-url</value>
- 
<description>The hostname of the RM.</description>
- </property>
-</configuration>
-</code></pre></div>
+<p>If you are running in a multi-node cluster, set the resource manager hostname in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> in all nodes as follows:</p>
+
+<p>```</p>
+<configuration>
+  <property>
+    <name>yarn.resourcemanager.hostname</name>
+    <value>resourcemanager-url</value>
+    <description>The hostname of the RM.</description>
+  </property>
+</configuration>
+<p>```</p>
+
<p><strong>Other configurations</strong>
Now we need to tell Samza where to find the configuration of the YARN cluster. To do this, first create a new directory in all nodes:</p>
+
+<p><code class="highlighter-rouge">
+mkdir ~/.samza
mkdir ~/.samza/conf
-</code></pre></div>
-<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>, <code>yarn-site.xml</code> in <code>~/hadoop-2.2.0/etc/hadoop</code> to the new directory </p>
-<div class="highlight"><pre><code class="language-" data-lang="">ln -s ~/.samza/conf/core-site.xml ~/hadoop-2.2.0/etc/hadoop/core-site.xml
+</code></p>
+
+<p>Copy (or soft link) <code class="highlighter-rouge">core-site.xml</code>, <code class="highlighter-rouge">hdfs-site.xml</code>, <code class="highlighter-rouge">yarn-site.xml</code> in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop</code> to the new directory</p>
+
+<p><code class="highlighter-rouge">
+ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml
ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml
-</code></pre></div>
+</code></p>
+
<p>Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows where to find these YARN configuration files.</p>
+
+<p><code class="highlighter-rouge">
+export YARN_HOME=$HOME/.samza 
-</code></pre></div>
+
+<p><code class="highlighter-rouge">
+export YARN_HOME=$HOME/.samza
+</code></p>
+
<p><strong>Start the YARN cluster</strong>
Start the resource manager on the master node</p>
+
+<p><code class="highlighter-rouge">
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
+</code></p>
+
<p>Start the node manager on all worker nodes</p>
-<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
-</code></pre></div>
-<h2 id="build-samoa">Build SAMOA</h2>
+<p><code class="highlighter-rouge">
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
+</code></p>
+
+<h2 id="build-samoa">Build SAMOA</h2>
<p>Perform the following steps on one of the nodes in the cluster. Here we assume git and maven are installed on this node.</p>

<p>Since Samza is not yet released on Maven, we will have to clone the Samza project, build it, and publish it to the local Maven repository:</p>
+
+<p><code class="highlighter-rouge">
+git clone -b 0.7.0 https://github.com/apache/incubator-samza.git
cd incubator-samza
./gradlew clean build
./gradlew publishToMavenLocal
-</code></pre></div>
-<p>Here we cloned and installed Samza version 0.7.0, the current released version (July 2014). 
</p>
+</code></p>
+
+<p>Here we cloned and installed Samza version 0.7.0, the current released version (July 2014).</p>

<p>Now we can clone the repository and install SAMOA.</p>
+
+<p><code class="highlighter-rouge">
+git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn -Psamza package
-</code></pre></div>
-<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for Samza <code>target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p>
+</code></p>

-<h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2>
+<p>The deployable jars for SAMOA will be in <code class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for Samza <code class="highlighter-rouge">target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p>

-<p>This section explains the configuration parameters in <code>bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p>
+<h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2>
+<p>This section explains the configuration parameters in <code class="highlighter-rouge">bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p>

<p><strong>Samza execution mode</strong></p>
-<div class="highlight"><pre><code class="language-" data-lang="">samoa.samza.mode=[yarn|local]
-</code></pre></div>
-<p>This parameter specify which mode to execute the task: <code>local</code> for local execution and <code>yarn</code> for cluster execution.</p>
+
+<p><code class="highlighter-rouge">
+samoa.samza.mode=[yarn|local]
+</code>
+This parameter specifies the mode in which to execute the task: <code class="highlighter-rouge">local</code> for local execution and <code class="highlighter-rouge">yarn</code> for cluster execution.</p>
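Putting the parameters described in this section together, a minimal sketch of a bin/samoa-samza.properties file for local-mode testing might look like the following. The host names and values here are illustrative assumptions for a single-machine setup, not defaults shipped with SAMOA:

```properties
# Illustrative local-mode configuration (adjust values to your own setup)
# Run the task in-process; use "yarn" for cluster execution
samoa.samza.mode=local

# Zookeeper endpoint used for coordination
zookeeper.connect=localhost
zookeeper.port=2181

# Comma-separated host:port list of the Kafka brokers
kafka.broker.list=localhost:9092
# Must not exceed the number of brokers in the Kafka cluster
kafka.replication.factor=1
```

With a file like this in place, switching to cluster execution is mostly a matter of setting samoa.samza.mode=yarn and filling in the YARN-related properties covered later in this section.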
<p><strong>Zookeeper</strong></p>
-<div class="highlight"><pre><code class="language-" data-lang="">zookeeper.connect=localhost
+
+<p><code class="highlighter-rouge">
+zookeeper.connect=localhost
zookeeper.port=2181
-</code></pre></div>
-<p>The default setting above applies for local mode execution. For cluster mode, change <code>zookeeper.host</code> to the correct URL of your zookeeper host.</p>
+</code>
+The default setting above applies for local mode execution. For cluster mode, change <code class="highlighter-rouge">zookeeper.connect</code> to the correct URL of your zookeeper host.</p>

<p><strong>Kafka</strong></p>
-<div class="highlight"><pre><code class="language-" data-lang="">kafka.broker.list=localhost:9092
-</code></pre></div>
-<p><code>kafka.broker.list</code> is a comma separated list of host:port of all the brokers in Kafka cluster.</p>
-<div class="highlight"><pre><code class="language-" data-lang="">kafka.replication.factor=1
-</code></pre></div>
-<p><code>kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in Kafka cluster.</p>
-<p><strong>YARN</strong></p>
+<p><code class="highlighter-rouge">
+kafka.broker.list=localhost:9092
+</code>
+<code class="highlighter-rouge">kafka.broker.list</code> is a comma-separated list of the host:port pairs of all the brokers in the Kafka cluster.</p>
+
+<p><code class="highlighter-rouge">
+kafka.replication.factor=1
+</code>
+<code class="highlighter-rouge">kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. 
This number must be less than or equal to the number of brokers in the Kafka cluster.</p>

-<blockquote>
-<p>The below settings do not apply for local mode execution, you can leave them as they are.</p>
-</blockquote>
+<p><strong>YARN</strong></p>
+<blockquote>
+  <p>The below settings do not apply for local mode execution; you can leave them as they are.</p>
+</blockquote>
+
+<p><code class="highlighter-rouge">yarn.am.memory</code> and <code class="highlighter-rouge">yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively.</p>

-<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively. </p>
-<div class="highlight"><pre><code class="language-" data-lang="">yarn.am.memory=1024
+<p><code class="highlighter-rouge">
+yarn.am.memory=1024
yarn.container.memory=1024
-</code></pre></div>
+</code></p>
+
+<p><code class="highlighter-rouge">yarn.package.path</code> specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.</p>
-<div class="highlight"><pre><code class="language-" data-lang="">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
-</code></pre></div>
+
+<p><code class="highlighter-rouge">
+yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+</code></p>
+
<p><strong>Samza</strong>
-<code>max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container. 
</p> -<div class="highlight"><pre><code class="language-" data-lang="">max.pi.per.container=1 -</code></pre></div> -<p><code>kryo.register.file</code> specifies the registration file for Kryo serializer.</p> -<div class="highlight"><pre><code class="language-" data-lang="">kryo.register.file=samza-kryo -</code></pre></div> -<p><code>checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> -<div class="highlight"><pre><code class="language-" data-lang="">checkpoint.commit.ms=60000 -</code></pre></div> -<h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2> +<code class="highlighter-rouge">max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container.</p> + +<p><code class="highlighter-rouge"> +max.pi.per.container=1 +</code></p> + +<p><code class="highlighter-rouge">kryo.register.file</code> specifies the registration file for Kryo serializer.</p> +<p><code class="highlighter-rouge"> +kryo.register.file=samza-kryo +</code></p> + +<p><code class="highlighter-rouge">checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> + +<p><code class="highlighter-rouge"> +checkpoint.commit.ms=60000 +</code></p> + +<h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2> <p>Execute SAMOA task with the following command:</p> -<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" -</code></pre></div> -<h2 id="observe-execution-and-result">Observe execution and result</h2> -<p>In local mode, all the log will be printed out to stdout. 
If you execute the task on YARN cluster, the output is written to stdout files in YARN's containers' log folder ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p>
+<p><code class="highlighter-rouge">
+bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>"
+</code></p>
+
+<h2 id="observe-execution-and-result">Observe execution and result</h2>
+<p>In local mode, all the logs will be printed out to stdout. If you execute the task on a YARN cluster, the output is written to stdout files in YARN's containers' log folder ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p>
</article>

Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1737551&r1=1737550&r2=1737551&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html (original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html Sun Apr 3 08:17:59 2016
@@ -76,103 +76,104 @@
 <p>In this tutorial page we describe how to execute SAMOA on top of Apache Storm. 
Here is an outline of what we want to do:</p>

<ol>
-<li>Ensure that you have necessary Storm cluster and configuration to execute SAMOA</li>
-<li>Ensure that you have all the SAMOA deployables for execution in the cluster</li>
-<li>Configure samoa-storm.properties</li>
-<li>Execute SAMOA classification task</li>
-<li>Observe the task execution</li>
+  <li>Ensure that you have the necessary Storm cluster and configuration to execute SAMOA</li>
+  <li>Ensure that you have all the SAMOA deployables for execution in the cluster</li>
+  <li>Configure samoa-storm.properties</li>
+  <li>Execute a SAMOA classification task</li>
+  <li>Observe the task execution</li>
</ol>

<h3 id="storm-configuration">Storm Configuration</h3>
-
<p>Before we start the tutorial, please ensure that you already have a Storm cluster (preferably Storm 0.8.2) running. You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p>

-<p>You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code>~/.storm/storm.yaml</code>:</p>
-<div class="highlight"><pre><code class="language-" data-lang="">########### These MUST be filled in for a storm configuration
-nimbus.host: "<enter your nimbus host name here>"
+<p>You also need to install Storm on the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code class="highlighter-rouge">~/.storm/storm.yaml</code>:</p>
+
+<p>```
+########### These MUST be filled in for a storm configuration
+nimbus.host: "&lt;enter your nimbus host name here&gt;"</p>

-## List of custom serializations
-kryo.register:
+<p>## List of custom serializations
+kryo.register:
 - org.apache.samoa.learners.classifiers.trees.AttributeContentEvent: 
org.apache.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer
 - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
-</code></pre></div>
+<code class="highlighter-rouge">
+<!--
Or, if you are using SAMOA with optimized VHT, you should use this following configuration file:
+</code>
########### These MUST be filled in for a storm configuration
-nimbus.host: "<enter your nimbus host name here>"
+nimbus.host: "&lt;enter your nimbus host name here&gt;"</p>

-## List of custom serializations
-kryo.register:
+<h2 id="list-of-custom-serializations-1">List of custom serializations</h2>
+<p>kryo.register:
 - org.apache.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: org.apache.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer
 - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
```
-->
+--></p>

-<p>Alternatively, if you don't have Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p>
+<p>Alternatively, if you don't have a Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p>

<h3 id="samoa-deployables">SAMOA deployables</h3>
-
<p>There are three deployables for executing SAMOA on top of Storm. They are:</p>

<ol>
-<li><code>bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li>
-<li><code>target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code>x.x.x</code> is the version number of SAMOA. 
</li>
-<li><code>bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly. </li>
+  <li><code class="highlighter-rouge">bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li>
+  <li><code class="highlighter-rouge">target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code class="highlighter-rouge">x.x.x</code> is the version number of SAMOA.</li>
+  <li><code class="highlighter-rouge">bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly.</li>
</ol>

-<h3 id="samoa-storm-properties-configuration"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3>
-
+<h3 id="a-namesamoa-storm-properties-samoa-stormproperties-configurationa"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3>
<p>Currently, the properties file contains two configurations:</p>

<ol>
-<li><code>samoa.storm.mode</code> determines whether the task is executed locally (using Storm's <code>LocalCluster</code>) or executed in a Storm cluster. Use <code>local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code>cluster</code> if you want to test SAMOA on your Storm cluster.</li>
-<li><code>samoa.storm.numworker</code> determines the number of worker to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in you Storm cluster. If you are using local mode, this property corresponds to the number of thread used by Storm's LocalCluster to execute your SAMOA task.</li>
+  <li><code class="highlighter-rouge">samoa.storm.mode</code> determines whether the task is executed locally (using Storm's <code class="highlighter-rouge">LocalCluster</code>) or executed in a Storm cluster. 
Use <code class="highlighter-rouge">local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code class="highlighter-rouge">cluster</code> if you want to test SAMOA on your Storm cluster.</li>
+  <li><code class="highlighter-rouge">samoa.storm.numworker</code> determines the number of workers used to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in your Storm cluster. If you are using local mode, this property corresponds to the number of threads used by Storm's LocalCluster to execute your SAMOA task.</li>
</ol>

<p>Here is an example of a complete properties file:</p>
+
+<p>```
+# SAMOA Storm properties file
# This file contains specific configurations for SAMOA deployment in the Storm platform
# Note that you still need to configure Storm client in your machine,
-# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings
+# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings</p>

-# samoa.storm.mode corresponds to the execution mode of the Task in Storm
-# possible values:
+<p># samoa.storm.mode corresponds to the execution mode of the Task in Storm
+# possible values:
# 1. cluster: the Task will be sent into nimbus. The nimbus is configured by Storm configuration file
# 2. 
local: the Task will be sent using local Storm cluster
-samoa.storm.mode=cluster
+samoa.storm.mode=cluster</p>

-# samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster
-# possible values: any integer greater than 0
+<p># samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster
+# possible values: any integer greater than 0<br />
samoa.storm.numworker=7
```
+```</p>
+
<h3 id="samoa-task-execution">SAMOA task execution</h3>
-<p>You can execute a SAMOA task using the aforementioned <code>bin/samoa</code> script with this following format:
-<code>bin/samoa <platform> <jar> "<task>"</code>.</p>
+<p>You can execute a SAMOA task using the aforementioned <code class="highlighter-rouge">bin/samoa</code> script in the following format:
+<code class="highlighter-rouge">bin/samoa <platform> <jar> "<task>"</code>.</p>

-<p><code><platform></code> can be <code>storm</code> or <code>s4</code>. Using <code>storm</code> option means you are deploying SAMOA on a Storm environment. 
In this configuration, the script uses the aforementioned yaml file (<code class="highlighter-rouge">~/.storm/storm.yaml</code>) and <code class="highlighter-rouge">samoa-storm.properties</code> to perform the deployment. Using <code class="highlighter-rouge">s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> -<p><code><jar></code> is the location of the deployed jar file (<code>SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path into the jar file. </p> +<p><code class="highlighter-rouge"><jar></code> is the location of the deployed jar file (<code class="highlighter-rouge">SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path into the jar file.</p> -<p><code>"<task>"</code> is the SAMOA task command line such as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. This command line for SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> +<p><code class="highlighter-rouge">"<task>"</code> is the SAMOA task command line such as <code class="highlighter-rouge">PrequentialEvaluation</code> or <code class="highlighter-rouge">ClusteringTask</code>. 
This command line for a SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p>

<p>The complete command to execute SAMOA is:</p>
-<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
-</code></pre></div>
-<p>The example above uses <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier. </p>
-<h3 id="observing-task-execution">Observing task execution</h3>
+<p><code class="highlighter-rouge">
+bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
+</code>
+The example above uses the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and the <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier.</p>

+<h3 id="observing-task-execution">Observing task execution</h3>
+<p>There are two ways to observe the task execution: using the Storm UI and monitoring the dump file of the SAMOA task. 
Notice that the dump file will be created on the cluster if you are executing your task in <code class="highlighter-rouge">cluster</code> mode.</p>

<h4 id="using-storm-ui">Using Storm UI</h4>
-
<p>Go to the web address of the Storm UI and check whether the SAMOA task executes as intended. Use this UI to kill the associated Storm topology if necessary.</p>

<h4 id="monitoring-the-dump-file">Monitoring the dump file</h4>
-
-<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has <code>-d</code> option which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should set the dump file into a file on a shared filesystem if you want to access it from the machine submitting the task.</p>
+<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has the <code class="highlighter-rouge">-d</code> option, which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should point the dump file to a file on a shared filesystem if you want to access it from the machine submitting the task.</p>

</article>

Modified: incubator/samoa/site/documentation/Getting-Started.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1737551&r1=1737550&r2=1737551&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Getting-Started.html (original)
+++ incubator/samoa/site/documentation/Getting-Started.html Sun Apr 3 08:17:59 2016
@@ -76,26 +76,40 @@
 <p>We start by showing how simple it is to run a first large-scale machine learning task in SAMOA. 
We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.</p> <ul> -<li>1. Download SAMOA </li> + <li> + <ol> + <li>Download SAMOA</li> + </ol> + </li> </ul> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git -<span class="nb">cd </span>incubator-samoa -mvn package <span class="c">#Local mode</span> -</code></pre></div> -<ul> -<li>2. Download the Forest CoverType dataset </li> -</ul> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget <span class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span> + +<p><code class="highlighter-rouge">bash +git clone http://git.apache.org/incubator-samoa.git +cd incubator-samoa +mvn package #Local mode +</code> +* 2. Download the Forest CoverType dataset</p> + +<p><code class="highlighter-rouge">bash +wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip" unzip covtypeNorm.arff.zip -</code></pre></div> +</code></p> + <p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.</p> <ul> -<li>3. 
Run an example: classifying the CoverType dataset with the bagging algorithm</li> + <li> + <ol> + <li>Run an example: classifying the CoverType dataset with the bagging algorithm</li> + </ol> + </li> </ul> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/samoa <span class="nb">local </span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging - -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</span> -</code></pre></div> + +<p><code class="highlighter-rouge">bash +bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging + -s (ArffFileStream -f covtypeNorm.arff) -f 100000" +</code></p> + <p>The output will be a list of the evaluation results, plotted every 100,000 instances.</p> </article> Modified: incubator/samoa/site/documentation/Home.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Home.html (original) +++ incubator/samoa/site/documentation/Home.html Sun Apr 3 08:17:59 2016 @@ -81,58 +81,62 @@ SAMOA is similar to Mahout in spirit, bu <p>Apache SAMOA is simple and fun to use! This documentation is intended to give an introduction on how to use SAMOA in different ways. As a user you can run SAMOA algorithms on several stream processing engines: local mode, Storm, S4, Samza, and Flink.
As a developer you can create new algorithms only once and test them in all of these distributed stream processing engines.</p> <h2 id="getting-started">Getting Started</h2> - <ul> -<li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> + <li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> </ul> <h2 id="users">Users</h2> - -<ul> -<li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> - -<ul> -<li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> -<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> -<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> -<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> -<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> -</ul></li> -<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> - <ul> -<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> -<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> -<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> -<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> -<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> -<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> -<li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> -</ul></li> + <li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> + <ul> + <li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> + <li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache 
Storm</a></li> + <li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> + <li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> + <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> + </ul> + </li> + <li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> + <ul> + <li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> + <li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> + <li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> + <li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> + <li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> + <li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> + <li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> + </ul> + </li> </ul> <h2 id="developers">Developers</h2> - <ul> -<li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> - -<ul> -<li><a href="Processor.html">3.1 Processor</a></li> -<li><a href="Content-Event.html">3.2 Content Event</a></li> -<li><a href="Stream.html">3.3 Stream</a></li> -<li><a href="Task.html">3.4 Task</a></li> -<li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> -<li><a href="Learner.html">3.6 Learner</a></li> -<li><a href="Processing-Item.html">3.7 Processing Item</a></li> -</ul></li> -<li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> + <li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> + <ul> + <li><a href="Processor.html">3.1 Processor</a></li> + <li><a href="Content-Event.html">3.2 Content Event</a></li> + <li><a href="Stream.html">3.3 Stream</a></li> + <li><a href="Task.html">3.4 
Task</a></li> + <li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> + <li><a href="Learner.html">3.6 Learner</a></li> + <li><a href="Processing-Item.html">3.7 Processing Item</a></li> + </ul> + </li> + <li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> </ul> <h3 id="getting-help">Getting help</h3> +<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:dev@samoa.incubator.org">dev@samoa.incubator.org</a></p> -<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:[email protected]">[email protected]</a></p> - -<p>[ <a href="mailto:[email protected]">subscribe</a> | <a href="mailto:[email protected]">unsubscribe</a> | <a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</p> +<table> + <tbody> + <tr> + <td>[ <a href="mailto:dev-subscribe@samoa.incubator.org">subscribe</a></td> + <td><a href="mailto:dev-unsubscribe@samoa.incubator.org">unsubscribe</a></td> + <td><a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</td> + </tr> + </tbody> +</table> </article> Modified: incubator/samoa/site/documentation/Learner.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Learner.html (original) +++ incubator/samoa/site/documentation/Learner.html Sun Apr 3 08:17:59 2016 @@ -74,18 +74,19 @@ <article class="post-content"> <p>Learners are implemented in SAMOA as sub-topologies.</p> -<div class="highlight"><pre><code class="language-" data-lang="">public interface Learner extends Serializable{ - public void init(TopologyBuilder topologyBuilder, Instances dataset); +<p>``` +public interface Learner extends Serializable{</p> - public Processor getInputProcessor(); +<div class="highlighter-rouge"><pre 
class="highlight"><code>public void init(TopologyBuilder topologyBuilder, Instances dataset); - public Stream getResultStream(); -} -</code></pre></div> -<p>When a <code>Task</code> object is initiated via <code>init()</code>, the method <code>init(...)</code> of <code>Learner</code> is called, and the topology is added to the global topology of the task.</p> +public Processor getInputProcessor(); -<p>To create a new learner, it is only needed to add streams, processors and their connections to the topology in <code>init(...)</code>, specify what is the processor that will manage the input stream of the learner in <code>getInputProcessor()</code>, and finally, specify what is going to be the output stream of the learner with <code>getResultStream()</code>.</p> +public Stream getResultStream(); } ``` When a `Task` object is initiated via `init()`, the method `init(...)` of `Learner` is called, and the topology is added to the global topology of the task. +</code></pre> +</div> + +<p>To create a new learner, it is only needed to add streams, processors and their connections to the topology in <code class="highlighter-rouge">init(...)</code>, specify what is the processor that will manage the input stream of the learner in <code class="highlighter-rouge">getInputProcessor()</code>, and finally, specify what is going to be the output stream of the learner with <code class="highlighter-rouge">getResultStream()</code>.</p> </article> Modified: incubator/samoa/site/documentation/Prequential-Evaluation-Task.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Prequential-Evaluation-Task.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Prequential-Evaluation-Task.html (original) +++ incubator/samoa/site/documentation/Prequential-Evaluation-Task.html Sun Apr 3 08:17:59 2016 @@ -73,26 +73,29 @@ </header> <article class="post-content"> - 
<p>In data stream mining, the most used evaluation scheme is the prequential or interleaved-test-then-train evolution. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers doing this. It supports two classification performance evaluators: the basic one which measures the accuracy of the classifier model since the start of the evaluation, and a window based one which measures the accuracy on the current sliding window of recent instances. </p> + <p>In data stream mining, the most commonly used evaluation scheme is the prequential or interleaved-test-then-train evaluation. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers in this way. It supports two classification performance evaluators: the basic one, which measures the accuracy of the classifier model since the start of the evaluation, and a window based one, which measures the accuracy on the current sliding window of recent instances.</p> <p>An example of the Prequential Evaluation task on the SAMOA command line when deploying into Storm:</p> -<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" -</code></pre></div> + +<p><code class="highlighter-rouge"> +bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></p> + <p>Parameters:</p> <ul> -<li><code>-l</code>: classifier to train</li> -<li><code>-s</code>: stream to learn from</li> -<li><code>-e</code>: classification performance
evaluation method</li> -<li><code>-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> -<li><code>-f</code>: number of instances between samples of the learning performance</li> -<li><code>-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> -<li><code>-d</code>: file to append intermediate csv results to</li> + <li><code class="highlighter-rouge">-l</code>: classifier to train</li> + <li><code class="highlighter-rouge">-s</code>: stream to learn from</li> + <li><code class="highlighter-rouge">-e</code>: classification performance evaluation method</li> + <li><code class="highlighter-rouge">-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> + <li><code class="highlighter-rouge">-f</code>: number of instances between samples of the learning performance</li> + <li><code class="highlighter-rouge">-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> + <li><code class="highlighter-rouge">-d</code>: file to append intermediate csv results to</li> </ul> -<p>In terms of SAMOA API, the Prequential Evaluation Task consists of a source <code>Entrance Processor</code>, a <code>Classifier</code>, and an <code>Evaluator Processor</code> as shown below. The <code>Entrance Processor</code> sends instances to the <code>Classifier</code> using the <code>source</code> stream. The classifier sends the classification results to the <code>Evaluator Processor</code> via the <code>result</code> stream. 
The <code>Entrance Processor</code> corresponds to the <code>-s</code> option of Prequential Evaluation, the <code>Classifier</code> corresponds to the <code>-l</code> option, and the <code>Evaluator Processor</code> corresponds to the <code>-e</code> option.</p> +<p>In terms of the SAMOA API, the Prequential Evaluation Task consists of a source <code class="highlighter-rouge">Entrance Processor</code>, a <code class="highlighter-rouge">Classifier</code>, and an <code class="highlighter-rouge">Evaluator Processor</code> as shown below. The <code class="highlighter-rouge">Entrance Processor</code> sends instances to the <code class="highlighter-rouge">Classifier</code> using the <code class="highlighter-rouge">source</code> stream. The classifier sends the classification results to the <code class="highlighter-rouge">Evaluator Processor</code> via the <code class="highlighter-rouge">result</code> stream. The <code class="highlighter-rouge">Entrance Processor</code> corresponds to the <code class="highlighter-rouge">-s</code> option of Prequential Evaluation, the <code class="highlighter-rouge">Classifier</code> corresponds to the <code class="highlighter-rouge">-l</code> option, and the <code class="highlighter-rouge">Evaluator Processor</code> corresponds to the <code class="highlighter-rouge">-e</code> option.</p> -<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task"></p> +<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task" /></p> </article> Modified: incubator/samoa/site/documentation/Processing-Item.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processing-Item.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processing-Item.html (original) +++ incubator/samoa/site/documentation/Processing-Item.html Sun Apr 3 08:17:59 2016 @@ -82,30 +82,33 @@ It is used internally,
and it is not acc There are two types of Processing Items.</p> <ol> -<li>Simple Processing Item (PI)</li> -<li>Entrance Processing Item (EntrancePI)</li> + <li>Simple Processing Item (PI)</li> + <li>Entrance Processing Item (EntrancePI)</li> </ol> -<h4 id="1-simple-processing-item-pi">1. Simple Processing Item (PI)</h4> +<h4 id="simple-processing-item-pi">1. Simple Processing Item (PI)</h4> +<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. The following code snippet shows the creation of a Processing Item.</p> -<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code>TopologyBuilder</code>. Following code snippet shows the creation of a Processing Item.</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); +<p><code class="highlighter-rouge"> +builder.initTopology("MyTopology"); Processor samplerProcessor = new Sampler(); ProcessingItem samplerPI = builder.createPI(samplerProcessor,3); -</code></pre></div> -<p>The <code>createPI()</code> method of <code>TopologyBuilder</code> is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped-in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> - -<h4 id="2-entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> +</code> +The <code class="highlighter-rouge">createPI()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to create a PI. Its first argument is the Processor instance to be wrapped. Its second argument is the parallelism hint.
It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> +<h4 id="entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> <p>Entrance Processing Item is different from a PI in only one way: it accepts an Entrance Processor which can generate its own stream. It is mostly used as the source of a topology. It connects to external sources, pulls data and provides it to the topology in the form of streams. -All physical topology units are created with the help of a <code>TopologyBuilder</code>. +All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. The following code snippet shows the creation of an Entrance Processing Item.</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); + +<p><code class="highlighter-rouge"> +builder.initTopology("MyTopology"); EntranceProcessor sourceProcessor = new Source(); EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor); -</code></pre></div> +</code></p> + </article> <!-- </div> -->
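To make the parallelism hint concrete, the following self-contained Java sketch shows one way a Processing Item created with parallelism 3 could fan events out round-robin across three replicas of its processor. This is an illustrative sketch only: the Processor, CountingProcessor, and ProcessingItem types here are hypothetical stand-ins, not SAMOA's actual TopologyBuilder or ProcessingItem API, and real platforms may use other groupings than round-robin.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not SAMOA API): a parallelism hint fans one
 * processor out into several replicas; events are distributed
 * round-robin as a simple stand-in for shuffle grouping.
 */
public class ParallelismSketch {

    interface Processor {
        void process(int event);
    }

    /** Counts how many events a single replica receives. */
    static final class CountingProcessor implements Processor {
        int processed = 0;

        @Override
        public void process(int event) {
            processed++;
        }
    }

    /** Wraps 'parallelism' replicas of a processor behind one entry point. */
    static final class ProcessingItem {
        final List<CountingProcessor> replicas = new ArrayList<>();
        private int next = 0;

        ProcessingItem(int parallelism) {
            for (int i = 0; i < parallelism; i++) {
                replicas.add(new CountingProcessor());
            }
        }

        void dispatch(int event) {
            replicas.get(next).process(event); // round-robin over replicas
            next = (next + 1) % replicas.size();
        }
    }

    public static void main(String[] args) {
        ProcessingItem pi = new ProcessingItem(3); // parallelism hint = 3
        for (int e = 0; e < 9; e++) {
            pi.dispatch(e);
        }
        for (CountingProcessor p : pi.replicas) {
            System.out.println("replica processed " + p.processed + " events");
            // each replica processed 3 of the 9 events
        }
    }
}
```

With 9 events and parallelism 3, each replica ends up processing 3 events, which is the load-spreading effect the parallelism hint is meant to achieve on different nodes.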

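The interleaved test-then-train loop described in the Prequential Evaluation section above can be sketched in a few lines of plain Java. This is an illustrative sketch, not SAMOA code: the MajorityClassLearner below is a hypothetical stand-in for a real online classifier, and the accuracy computed corresponds to the "basic" evaluator that measures accuracy since the start of the evaluation.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch of prequential (interleaved test-then-train)
 * evaluation. Not SAMOA code: MajorityClassLearner is a hypothetical
 * stand-in for a real online classifier.
 */
public class PrequentialSketch {

    /** Trivial learner: predicts the class seen most often so far. */
    static final class MajorityClassLearner {
        private final int[] counts;

        MajorityClassLearner(int numClasses) {
            counts = new int[numClasses];
        }

        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) best = c;
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Each instance is used first to test the model, then to train it. */
    static double prequentialAccuracy(List<Integer> labels, int numClasses) {
        MajorityClassLearner learner = new MajorityClassLearner(numClasses);
        int correct = 0;
        for (int label : labels) {
            if (learner.predict() == label) correct++; // test first...
            learner.train(label);                      // ...then train
        }
        return (double) correct / labels.size();
    }

    public static void main(String[] args) {
        List<Integer> stream = Arrays.asList(0, 0, 1, 0, 0, 1, 0, 0);
        System.out.println("prequential accuracy = " + prequentialAccuracy(stream, 2));
        // prints: prequential accuracy = 0.75
    }
}
```

A windowed evaluator, as supported by the Prequential Evaluation task, would differ only in computing the accuracy over a sliding window of recent instances instead of over the whole stream.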