Added: incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html 
(added)
+++ incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html Sun 
Feb 22 13:41:20 2015
@@ -0,0 +1,245 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Developing New Tasks in Apache SAMOA</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Developing New Tasks in Apache SAMOA</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>A <em>task</em> is a machine-learning activity, such as a specific 
evaluation of a classifier. For instance, the <em>prequential evaluation</em> 
task uses each instance first for testing and then for training a model built 
with a specific classification algorithm. A task corresponds to a topology in 
SAMOA.</p>
+
+<p>In this tutorial, we will develop a simple Hello World task.</p>
+
+<h3 id="hello-world-task">Hello World Task</h3>
+
+<p>The Hello World task consists of a source processor, a destination 
processor with a parallelism hint setting, and a stream that connects the two. 
The source processor will generate a random integer which will be sent to the 
destination processor. The figure below shows the layout of Hello World 
task.</p>
+
+<p><img src="images/HelloWorldTask.png" alt="Hello World Task"></p>
+
+<p>To develop the task, we create a new class that implements the interface 
<code>com.yahoo.labs.samoa.tasks.Task</code>. For convenience, we also implement 
<code>com.github.javacliparser.Configurable</code>, which allows the task to 
parse command-line options.</p>
+
+<p>The <code>init</code> method builds the topology by instantiating the 
necessary <code>Processors</code>, <code>Streams</code> and connecting the 
source processor with the destination processor.</p>
+
+<h3 id="hello-world-source-processor">Hello World Source Processor</h3>
+
+<p>We need a source processor which is an instance of 
<code>EntranceProcessor</code> to start a task in SAMOA. In this tutorial, the 
source processor is <code>HelloWorldSourceProcessor</code>. </p>
+
+<p>The SAMOA runtime invokes the <code>nextEvent</code> method of 
<code>EntranceProcessor</code> until its <code>hasNext</code> method returns 
false. Each call to <code>nextEvent</code> should return the next 
<code>ContentEvent</code> to be sent to the topology. In this tutorial, 
<code>HelloWorldSourceProcessor</code> sends events of type 
<code>HelloWorldContentEvent</code>.</p>
+
+<p>Here is the relevant code in <code>HelloWorldSourceProcessor</code>:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">    
private Random rnd;
+    private final long maxInst;
+    private long count;
+
+    @Override
+    public boolean hasNext() {
+        return count &lt; maxInst;
+    }
+
+    @Override
+    public ContentEvent nextEvent() {
+        count++;
+        return new HelloWorldContentEvent(rnd.nextInt(), false);
+    }
+</code></pre></div>
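+<p>The pull loop described above can be sketched in isolation. This is a minimal, self-contained illustration and not SAMOA's runtime code: <code>SketchEvent</code>, <code>SketchEntranceProcessor</code>, <code>IntEvent</code>, <code>RandomIntSource</code> and <code>EntranceLoopSketch</code> are hypothetical stand-ins, and only the <code>hasNext</code>/<code>nextEvent</code> contract is taken from the tutorial.</p>

```java
// Hypothetical stand-ins for the SAMOA types; only the
// hasNext/nextEvent contract mirrors the tutorial text.
interface SketchEvent {
    boolean isLastEvent();
}

interface SketchEntranceProcessor {
    boolean hasNext();
    SketchEvent nextEvent();
}

final class IntEvent implements SketchEvent {
    final int value;

    IntEvent(int value) {
        this.value = value;
    }

    @Override
    public boolean isLastEvent() {
        return false;
    }
}

final class RandomIntSource implements SketchEntranceProcessor {
    private final java.util.Random rnd = new java.util.Random(42L);
    private final long maxInst;
    private long count;

    RandomIntSource(long maxInst) {
        this.maxInst = maxInst;
    }

    @Override
    public boolean hasNext() {
        return count < maxInst;
    }

    @Override
    public SketchEvent nextEvent() {
        count++;
        return new IntEvent(rnd.nextInt());
    }
}

final class EntranceLoopSketch {
    // Pulls events until the source is exhausted and returns how many were emitted.
    static long drain(SketchEntranceProcessor source) {
        long emitted = 0;
        while (source.hasNext()) {
            SketchEvent event = source.nextEvent();
            // A real runtime would put the event on the output stream here.
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(drain(new RandomIntSource(100)));
    }
}
```

+<p>Because <code>hasNext</code> is checked before every call to <code>nextEvent</code>, a source created with a limit of zero emits nothing.</p>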
+<p>We also need to create a new type of <code>ContentEvent</code> to hold our 
data. In this tutorial we call it <code>HelloWorldContentEvent</code> and its 
content is simply an integer.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">public class HelloWorldContentEvent implements ContentEvent {
+
+    private static final long serialVersionUID = -2406968925730298156L;
+    private final boolean isLastEvent;
+    private final int helloWorldData;
+
+    public HelloWorldContentEvent(int helloWorldData, boolean isLastEvent) {
+        this.isLastEvent = isLastEvent;
+        this.helloWorldData = helloWorldData;
+    }
+
+    @Override
+    public String getKey() {
+        return null;
+    }
+
+    @Override
+    public void setKey(String str) {
+        // do nothing, it&#39;s a key-less content event
+    }
+
+    @Override
+    public boolean isLastEvent() {
+        return isLastEvent;
+    }
+
+    public int getHelloWorldData() {
+        return helloWorldData;
+    }
+
+    @Override
+    public String toString() {
+        return &quot;HelloWorldContentEvent [helloWorldData=&quot; + 
helloWorldData + &quot;]&quot;;
+    }
+}
+</code></pre></div>
+<h3 id="hello-world-destination-processor">Hello World Destination 
Processor</h3>
+
+<p>The destination processor is straightforward: it prints the data from each 
event it receives.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">public class HelloWorldDestinationProcessor implements 
Processor {
+
+    private static final long serialVersionUID = -6042613438148776446L;
+    private int processorId;
+
+    @Override
+    public boolean process(ContentEvent event) {
+        System.out.println(processorId + &quot;: &quot; + event);
+        return true;
+    }
+
+    @Override
+    public void onCreate(int id) {
+        this.processorId = id;
+    }
+
+    @Override
+    public Processor newProcessor(Processor p) {
+        return new HelloWorldDestinationProcessor();
+    }
+}
+</code></pre></div>
+<h3 id="putting-it-all-together">Putting It All Together</h3>
+
+<p>To put all the components together, we need to go back to class 
<code>HelloWorldTask</code>. First, we need to implement the code for setting 
up the <code>TopologyBuilder</code>. This code is necessary to be able to run 
on multiple platforms.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">    
@Override
+    public void setFactory(ComponentFactory factory) {
+        builder = new TopologyBuilder(factory);
+        logger.debug(&quot;Successfully instantiating TopologyBuilder&quot;);
+        builder.initTopology(evaluationNameOption.getValue());
+        logger.debug(&quot;Successfully initializing SAMOA topology with name {}&quot;, evaluationNameOption.getValue());
+    }
+</code></pre></div>
+<p>After this method is called, we have a functioning builder from which to 
get components for our topology. Next, SAMOA calls the <code>init</code> method 
to start the task.
+First, we instantiate the source <code>EntranceProcessor</code>.
+After adding the entrance processor to the topology, we create a stream 
originating from it, using the <code>createStream</code> method of 
<code>TopologyBuilder</code>.
+Next, we create the destination processor and connect it to the stream using 
shuffle grouping.
+Once we have created all the components, we use the builder to build the 
topology.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">    
@Override
+    public void init() {
+        // create source EntranceProcessor
+        sourceProcessor = new HelloWorldSourceProcessor(instanceLimitOption.getValue());
+        builder.addEntranceProcessor(sourceProcessor);
+
+        // create Stream
+        Stream stream = builder.createStream(sourceProcessor);
+
+        // create destination Processor
+        destProcessor = new HelloWorldDestinationProcessor();
+        builder.addProcessor(destProcessor, 
helloWorldParallelismOption.getValue());
+        builder.connectInputShuffleStream(stream, destProcessor);
+
+        // build the topology
+        helloWorldTopology = builder.build();
+        logger.debug(&quot;Successfully built the topology&quot;);
+    }
+</code></pre></div>
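+<p>To get a feel for what a parallelism hint combined with shuffle grouping means, the sketch below spreads events over several destination instances. It is only an illustration under a stated assumption: it uses a round-robin policy as a stand-in, whereas the real distribution policy is decided by the underlying platform. <code>ShuffleGroupingSketch</code> and <code>distribute</code> are hypothetical names, not part of the SAMOA API.</p>

```java
final class ShuffleGroupingSketch {
    // Distributes eventCount events over `parallelism` instances in
    // round-robin order (an assumed stand-in for shuffle grouping) and
    // returns how many events each instance received.
    static int[] distribute(int eventCount, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        int[] received = new int[parallelism];
        for (int i = 0; i < eventCount; i++) {
            received[i % parallelism]++;
        }
        return received;
    }

    public static void main(String[] args) {
        // Mirrors the "-p 4 -i 100" options used later in the tutorial.
        System.out.println(java.util.Arrays.toString(distribute(100, 4)));
    }
}
```

+<p>With 100 events and a parallelism hint of 4, each instance in this sketch receives 25 events.</p>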
+<h3 id="running-it">Running It</h3>
+
+<p>To run the example in local mode:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa local target/SAMOA-Local-0.0.1-SNAPSHOT.jar 
&quot;com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 100&quot;
+</code></pre></div>
+<p>To run the example in Storm local mode:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">java 
-cp 
$STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
 com.yahoo.labs.samoa.LocalStormDoTask 
&quot;com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 1000&quot;
+</code></pre></div>
+<p>All the code for the HelloWorldTask and its components can be found <a 
href="https://github.com/yahoo/samoa/tree/master/samoa-api/src/main/java/com/yahoo/labs/samoa/examples">here</a>.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Clustering.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Distributed-Stream-Clustering.html 
(added)
+++ incubator/samoa/site/documentation/Distributed-Stream-Clustering.html Sun 
Feb 22 13:41:20 2015
@@ -0,0 +1,120 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Distributed Stream Clustering</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Distributed Stream Clustering</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <h2 id="apache-samoa-clustering-algorithm">Apache SAMOA Clustering 
Algorithm</h2>
+
+<p>The SAMOA Clustering Algorithm is invoked by using the 
<code>ClusteringEvaluation</code> task. The clustering task can be executed 
with default values just by running:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar 
&quot;ClusteringEvaluation&quot;
+</code></pre></div>
+<p>Parameters:</p>
+
+<ul>
+<li><code>-l</code>: clusterer to train</li>
+<li><code>-s</code>: stream to learn from</li>
+<li><code>-i</code>: maximum number of instances to test/train on (-1 = no 
limit)</li>
+<li><code>-f</code>: how many instances between samples of the learning 
performance</li>
+<li><code>-n</code>: evaluation name (default: 
ClusteringEvaluation_TimeStamp)</li>
+<li><code>-d</code>: file to append intermediate csv results to</li>
+</ul>
+
+<p>In terms of the SAMOA API, Clustering Evaluation consists of a 
<code>source</code> processor, a <code>clusterer</code>, and an 
<code>evaluator</code> processor. The <code>source</code> processor sends the 
instances to the clusterer using the <code>source</code> stream. The clusterer 
sends the clustering results to the <code>evaluator</code> processor via the 
<code>result</code> stream. The <code>source</code> processor corresponds to 
the <code>-s</code> option of Clustering Evaluation, and the clusterer 
corresponds to the <code>-l</code> option.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: 
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html?rev=1661475&view=auto
==============================================================================
--- 
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
 (added)
+++ 
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
 Sun Feb 22 13:41:20 2015
@@ -0,0 +1,167 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Distributed Frequent Itemset Mining</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Distributed Frequent Itemset Mining</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <h2 id="1.-introduction">1. Introduction</h2>
+
+<p>SAMOA takes a micro-batching approach to frequent itemset mining (FIM). It 
uses <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a> as the base 
algorithm for distributed, sample-based frequent itemset mining. PARMA 
guarantees that all frequent itemsets are present in the result it returns, 
although the result may also contain some false positives. The difficulty with 
FIM on streams is that streams evolve: itemsets that were frequent last year 
may not be frequent this year. To handle this, SAMOA implements the <a 
href="https://dl.acm.org/citation.cfm?id=1164180">Time Biased Sampling</a> 
approach. This sampling method depends on a parameter <em>lambda</em>, which 
determines the size of the reservoir sample and how strongly the sample is 
biased towards newer itemsets. Because PARMA has its own way of determining 
sample sizes, SAMOA does not let users choose <em>lambda</em>; instead, it 
derives the value from the sample size determined by PARMA, using the 
approximation <code>lambda = 1/sampleSize</code>.</p>
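+<p>As a rough illustration of how a reservoir with capacity <code>1/lambda</code> can favour recent transactions, the sketch below follows the simple biased reservoir scheme as we understand it from the Time Biased Sampling paper: every arriving item enters the reservoir, and it replaces a random resident with probability equal to the fraction of the reservoir already filled. This is an assumption-laden sketch, not SAMOA's <code>TimeBiasedReservoirSampler</code>; the class <code>BiasedReservoirSketch</code> and its methods are hypothetical, and transactions are reduced to integer ids.</p>

```java
final class BiasedReservoirSketch {
    private final int[] reservoir;   // capacity plays the role of sampleSize = 1/lambda
    private int size;                // number of slots currently in use
    private final java.util.Random rnd;

    BiasedReservoirSketch(int sampleSize, long seed) {
        if (sampleSize < 1) {
            throw new IllegalArgumentException("sampleSize must be at least 1");
        }
        this.reservoir = new int[sampleSize];
        this.rnd = new java.util.Random(seed);
    }

    // The approximation stated in the text: lambda = 1/sampleSize.
    double lambda() {
        return 1.0 / reservoir.length;
    }

    // Every arriving item is stored. With probability (size / capacity) it
    // overwrites a random resident; otherwise it occupies a fresh slot.
    void offer(int transactionId) {
        double fractionFull = (double) size / reservoir.length;
        if (rnd.nextDouble() < fractionFull) {
            reservoir[rnd.nextInt(size)] = transactionId;
        } else {
            reservoir[size++] = transactionId;
        }
    }

    int size() {
        return size;
    }

    boolean contains(int transactionId) {
        for (int i = 0; i < size; i++) {
            if (reservoir[i] == transactionId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BiasedReservoirSketch sample = new BiasedReservoirSketch(50, 7L);
        for (int id = 0; id < 1000; id++) {
            sample.offer(id);
        }
        System.out.println("lambda=" + sample.lambda() + " size=" + sample.size());
    }
}
```

+<p>Older items are gradually overwritten, so the sample leans towards recent transactions, and the most recently offered item is always present.</p>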
+
+<h2 id="2.-concepts">2. Concepts</h2>
+
+<p>SAMOA implements FIM for streams with three processors: 
StreamSourceProcessor, SamplerProcessor, and AggregatorProcessor. The role of 
each is explained below.</p>
+
+<ol>
+<li><p>StreamSourceProcessor (the entrance PI) takes the transaction file as 
input and sends the transactions randomly to the SamplerProcessor instances. 
The number of SamplerProcessors to instantiate is taken as an argument from the 
user but is verified by PARMA, which determines this number from the 
<code>epsilon</code> and <code>phi</code> parameters provided by the user. 
StreamSourceProcessor sends an FPM=&#39;yes&#39; command to all 
SamplerProcessor instances after 2M transactions, where M = 
numSamples*sampleSize. After the first FPM=&#39;yes&#39; command, each 
subsequent FPM=&#39;yes&#39; command is sent after a further 
<code>fpmGap</code> transactions; <code>fpmGap</code> is one of the parameters 
of the SAMOA FIM task.</p></li>
+<li><p>Each SamplerProcessor instance builds a time-biased reservoir sample in 
which newer transactions carry more weight. Time-biased sampling is the default 
approach, but users can provide their own sampler by implementing 
<code>samoa.samplers.SamplerInterface</code>. When a SamplerProcessor receives 
an FPM=&#39;yes&#39; command, it starts FIM/FPM on the reservoir, whether or 
not the reservoir is full. When it completes, it sends the resulting itemsets 
to the AggregatorProcessor together with the epoch/batch id. At the end of the 
result, each SamplerProcessor sends the (“epoch_end”, &lt;epochNum&gt;) message 
to the AggregatorProcessor.</p></li>
+<li><p>AggregatorProcessor receives the resulting itemsets from all 
SamplerProcessors. It maintains a separate queue for each batch id and counts 
the SamplerProcessors that have finished sending their results for the 
corresponding batch/epoch. When the <code>epoch_end</code> message count equals 
the number of SamplerProcessor instances, AggregatorProcessor aggregates the 
results and stores them in the file system at the output path specified by the 
user.</p></li>
+</ol>
+
+<p>In this way, epochs never overlap. If <code>fpmGap</code> is small and the 
StreamSourceProcessor dispatches an FPM=&#39;yes&#39; command before the 
slowest SamplerProcessor finishes FIM on the previous epoch, the speed of the 
global FIM is bounded by the local FIM of the slowest SamplerProcessor (or by 
the AggregatorProcessor, if it is slower than the slowest 
SamplerProcessor).</p>
+
+<p><img src="images/SAMOA%20FIM.jpg" alt="SAMOA FIM"></p>
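+<p>The timing of the FPM=&#39;yes&#39; commands described above can be worked through with a small sketch. It assumes the schedule exactly as stated in the text (first command after 2M transactions, then one every <code>fpmGap</code> transactions); <code>FpmScheduleSketch</code> and <code>fpmAfter</code> are hypothetical names, and the sample size of 1000 used in <code>main</code> is an illustrative value, since in SAMOA it is determined by PARMA.</p>

```java
final class FpmScheduleSketch {
    // Returns true if an FPM='yes' command would be sent right after the
    // n-th transaction (1-based), following the schedule in the text:
    // first after 2M transactions (M = numSamples * sampleSize),
    // then after every further fpmGap transactions.
    static boolean fpmAfter(long n, long numSamples, long sampleSize, long fpmGap) {
        if (fpmGap < 1) {
            throw new IllegalArgumentException("fpmGap must be at least 1");
        }
        long first = 2 * numSamples * sampleSize;
        if (n < first) {
            return false;
        }
        return (n - first) % fpmGap == 0;
    }

    public static void main(String[] args) {
        long numSamples = 23;     // as in "-n 23" from the example command
        long sampleSize = 1000;   // illustrative assumption; PARMA chooses this
        long fpmGap = 100000;     // as in "-b 100000" from the example command
        System.out.println(fpmAfter(46000, numSamples, sampleSize, fpmGap));
        System.out.println(fpmAfter(100000, numSamples, sampleSize, fpmGap));
        System.out.println(fpmAfter(146000, numSamples, sampleSize, fpmGap));
    }
}
```

+<p>With these values, M is 23000, so the first command follows transaction 46000 and the second follows transaction 146000.</p>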
+
+<h2 id="3.-how-to-run">3. How to run</h2>
+
+<p>The following is an example of the command used to run the SAMOA FIM task.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar 
&quot;FpmTask -t Myfpmtopology -r 
(com.yahoo.labs.samoa.fpm.processors.FileReaderProcessor -i 
/datasets/freqDataCombined.txt) -m 
(com.yahoo.labs.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t 
20 -n 23 -p 0.08   -b 100000 -s 
com.yahoo.labs.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w 
(com.yahoo.labs.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA) 
&quot;
+</code></pre></div>
+<p>Parameters:
+To run an FIM task, four parameters are required:</p>
+
+<ul>
+<li><code>-t</code>: Topology name (Can be any name)</li>
+<li><code>-r</code>: The reader class</li>
+<li><code>-m</code>: The miner class</li>
+<li><code>-w</code>: The writer class</li>
+</ul>
+
+<p>In the example above, <code>FileReaderProcessor</code> is used as a reader 
class. It takes only one parameter:</p>
+
+<ul>
+<li><code>-i</code>: Path to input file</li>
+</ul>
+
+<p>Similarly, <code>FileWriterProcessor</code> is used as a writer class. It 
takes only one parameter:</p>
+
+<ul>
+<li><code>-o</code>: Path to output file</li>
+</ul>
+
+<p>SAMOA comes with a built-in distributed frequent itemset mining algorithm, 
PARMA, as described above, but users can plug in their own miners by 
implementing the <code>FpmMinerInterface</code>. The built-in PARMA miner can 
be used with the following parameters:</p>
+
+<ul>
+<li><code>-e</code>: epsilon parameter for <a 
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-d</code>: delta parameter for <a 
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-f</code>: minimum frequency (percentage) of a frequent itemset</li>
+<li><code>-t</code>: maximum length of a transaction</li>
+<li><code>-n</code>: number of samples to maintain</li>
+<li><code>-a</code>: number of aggregators to initiate</li>
+<li><code>-p</code>: phi parameter for <a 
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-i</code>: path to input file</li>
+<li><code>-o</code>: path to output file</li>
+<li><code>-b</code>: batch size or fpmGap (Number of transactions after which 
FIM should be performed)</li>
+<li><code>-s</code>: Sampler Class to be used for sampling at each node</li>
+</ul>
+
+<h2 id="note">Note</h2>
+
+<p>This method is currently unavailable in the master branch of SAMOA due to 
licensing restrictions.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html 
(added)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html Sun 
Feb 22 13:41:20 2015
@@ -0,0 +1,200 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Executing Apache SAMOA with Apache S4</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Executing Apache SAMOA with Apache S4</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>In this tutorial page we describe how to execute SAMOA on top of Apache 
S4.</p>
+
+<h2 id="prerequisites">Prerequisites</h2>
+
+<p>The following dependencies are needed to run SAMOA on Apache S4:</p>
+
+<ul>
+<li><a href="http://www.gradle.org/">Gradle</a></li>
+<li><a href="https://incubator.apache.org/s4/">Apache S4</a></li>
+</ul>
+
+<h2 id="gradle">Gradle</h2>
+
+<p>Gradle is a build automation tool and is used to build Apache S4. The 
installation guide can be found <a 
href="http://www.gradle.org/docs/current/userguide/installation.html">here</a>. 
The following instructions are a simplified installation guide.</p>
+
+<ol>
+<li>Download the Gradle binaries from <a 
href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>,
 or from the console type <code>wget 
http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li>
+<li>Unzip the file: <code>unzip gradle-1.6-bin.zip</code></li>
+<li>Set the Gradle environment variable: <code>export 
GRADLE_HOME=/foo/bar/gradle-1.6</code></li>
+<li>Add Gradle to the system path: <code>export 
PATH=$PATH:$GRADLE_HOME/bin</code></li>
+<li>Verify the installation by running <code>gradle</code></li>
+</ol>
+
+<p>Now you are all set to install Apache S4.</p>
+
+<h2 id="apache-s4">Apache S4</h2>
+
+<p>S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable 
platform that allows programmers to easily develop applications for processing 
continuous unbounded streams of data. The installation process is as 
follows:</p>
+
+<ol>
+<li>Download the latest Apache S4 release from <a 
href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache
 S4 0.6.0</a>, or from the command line with <code>wget 
http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code>,
 or clone it from git with 
+<code>git clone 
https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li>
+<li>Unzip the file with <code>unzip apache-s4-0.6.0-incubating-src.zip</code>, or go 
into the cloned directory.</li>
+<li>Set the Apache S4 environment variable <code>export 
S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li>
+<li>Add the S4_HOME to the system PATH. <code>export 
PATH=$PATH:$S4_HOME</code>.</li>
+<li>Once the previous steps are done we can proceed to build and install 
Apache S4.</li>
+<li>You can have a look at the available build tasks by typing <code>gradle 
tasks</code>.</li>
+<li>There are some dependency issues, so you should run the wrapper 
task first by typing <code>gradle wrapper</code>.</li>
+<li>Install the artifacts for Apache S4 by running <code>gradle install</code> 
in the S4_HOME directory.</li>
+<li>Install the S4 tools with <code>gradle s4-tools:installApp</code>.</li>
+</ol>
+
+<p>Done. Now you can configure and run your Apache S4 cluster.</p>
+
+<hr>
+
+<h2 id="building-samoa">Building SAMOA</h2>
+
+<p>Once the S4 dependencies are installed, you can simply clone the repository 
and install SAMOA.</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git 
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
+mvn -Ps4 package 
+</code></pre></div>
+<p>The deployable jars for SAMOA will be in 
<code>target/SAMOA-&lt;variant&gt;-&lt;version&gt;-SNAPSHOT.jar</code>. For 
example, in our case for S4 <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
+
+<hr>
+
+<h2 id="samoa-s4-configuration">SAMOA-S4 Configuration</h2>
+
+<p>This section goes through the <code>bin/samoa-s4.properties</code> file 
and how to configure it.
+For SAMOA to run correctly in a distributed environment, 
some variables need to be defined. Since Apache S4 uses <a 
href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management, we 
need to define where it is running.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># 
Zookeeper Server
+zookeeper.server=localhost
+zookeeper.port=2181
+</code></pre></div>
+<p>Apache S4 also distributes the application via HTTP, so the server 
and port that serve the S4 application must be provided.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># 
Simple HTTP Server providing the packaged S4 jar
+http.server.ip=localhost
+http.server.port=8000
+</code></pre></div>
+<p>Apache S4 uses the concept of logical clusters to define a group of 
machines, which are identified by an ID and start serving on a specific 
port.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># 
Name of the S4 cluster
+cluster.name=cluster
+cluster.port=12000
+</code></pre></div>
+<p>SAMOA can be deployed on a single machine using only one resource or in a 
cluster environment. The following property selects whether to deploy as a 
<code>local</code> application or on a <code>cluster</code>.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># 
Deployment strategy
+samoa.deploy.mode=local
+</code></pre></div>
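+<p>Putting the fragments above together, a complete <code>bin/samoa-s4.properties</code> for a single-machine setup might look like the following sketch. It only combines the values already shown in this section; adjust the hosts and ports to match your own environment.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># Zookeeper Server
+zookeeper.server=localhost
+zookeeper.port=2181
+
+# Simple HTTP Server providing the packaged S4 jar
+http.server.ip=localhost
+http.server.port=8000
+
+# Name of the S4 cluster
+cluster.name=cluster
+cluster.port=12000
+
+# Deployment strategy (local or cluster)
+samoa.deploy.mode=local
+</code></pre></div>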
+<hr>
+
+<h2 id="samoa-s4-deployment">SAMOA S4 Deployment</h2>
+
+<p>In order to deploy SAMOA in a distributed environment you 
<strong>MUST</strong> configure the <code>bin/samoa-s4.properties</code> file 
correctly. If you are running locally, modifying the properties 
file is optional.</p>
+
+<p>The deployment is done by running the SAMOA execution script 
<code>bin/samoa</code> with some additional parameters.
+The execution syntax is as follows:
+<code>bin/samoa &lt;platform&gt; &lt;jar-location&gt; &lt;task &amp; 
options&gt;</code></p>
+
+<p>Example:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar 
&quot;ClusteringEvaluation&quot;
+</code></pre></div>
+<p>The &lt;platform&gt; can be s4 or storm.</p>
+
+<p>The &lt;jar-location&gt; must be the absolute path to the platform specific 
jar file.</p>
+
+<p>The &lt;task &amp; options&gt; should be the name of a known task and the 
options belonging to that task.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html 
(added)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html 
Sun Feb 22 13:41:20 2015
@@ -0,0 +1,322 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Executing Apache SAMOA with Apache Samza</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Executing Apache SAMOA with Apache Samza</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>This tutorial describes how to run SAMOA on Apache Samza.
+The steps included in this tutorial are:</p>
+
+<ol>
+<li><p>Set up and configure a cluster with the required dependencies. This 
applies to single-node (local) execution as well.</p></li>
+<li><p>Build SAMOA deployables</p></li>
+<li><p>Configure SAMOA-Samza</p></li>
+<li><p>Deploy SAMOA-Samza and execute a task</p></li>
+<li><p>Observe the execution and the result</p></li>
+</ol>
+
+<h2 id="setup-cluster">Setup cluster</h2>
+
+<p>The following are needed to run SAMOA on top of Samza:</p>
+
+<ul>
+<li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li>
+<li><a href="http://kafka.apache.org/">Apache Kafka</a></li>
+<li><a 
href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache
 Hadoop YARN and HDFS</a></li>
+</ul>
+
+<h3 id="zookeeper">Zookeeper</h3>
+
+<p>Zookeeper is used by Kafka to coordinate its brokers. Detailed 
instructions for setting up a Zookeeper cluster can be found <a 
href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>. 
</p>
+
+<p>To quickly set up a single-node Zookeeper cluster:</p>
+
+<ol>
+<li><p>Download the binary release from the <a 
href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li>
+<li><p>Untar the archive</p></li>
+</ol>
+<div class="highlight"><pre><code class="language-text" data-lang="text">tar 
-xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
+</code></pre></div>
+<ol start="3">
+<li>Copy the default configuration file</li>
+</ol>
+<div class="highlight"><pre><code class="language-text" data-lang="text">cp 
~/zookeeper-3.4.6/conf/zoo_sample.cfg ~/zookeeper-3.4.6/conf/zoo.cfg
+</code></pre></div>
+<ol start="4">
+<li>Start the single-node cluster</li>
+</ol>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/zookeeper-3.4.6/bin/zkServer.sh start
+</code></pre></div>
+<h3 id="kafka">Kafka</h3>
+
+<p>Kafka is a distributed, partitioned, replicated commit log service which 
Samza uses as its default messaging system. </p>
+
+<ol>
+<li><p>Download a binary release of Kafka <a 
href="http://kafka.apache.org/downloads.html">here</a>. As mentioned on that 
page, the Scala version does not matter. However, 2.10 is recommended because Samza 
has recently moved to Scala 2.10.</p></li>
+<li><p>Untar the archive </p></li>
+</ol>
+<div class="highlight"><pre><code class="language-text" data-lang="text">tar 
-xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
+</code></pre></div>
+<p>If you are running in local mode or a single-node cluster, you can now 
start Kafka with the command:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh 
~/kafka_2.10-0.8.1/config/server.properties
+</code></pre></div>
+<p>In a multi-node cluster, it is typical and convenient to run a Kafka broker 
on each node, although a smaller Kafka cluster, or even a 
single-node Kafka cluster, also works. The number of brokers in the Kafka cluster determines 
the available disk bandwidth and space: the more brokers there are, the more of both 
you get. On each node, set the following properties in 
<code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting the Kafka 
service.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">broker.id=a-unique-number-for-each-node
+zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...]
+</code></pre></div>
+<p>You might also want to change the retention hours or retention bytes of the logs 
to keep them from growing too large.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">log.retention.hours=number-of-hours-to-keep-the-logs
+log.retention.bytes=number-of-bytes-to-keep-in-the-logs
+</code></pre></div>
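+<p>As an illustration, the entries on the first node of a three-node cluster could look like the sketch below. The host names <code>zk1.example.com</code> to <code>zk3.example.com</code> are placeholders, and the retention values (one week and 1 GiB) are arbitrary examples rather than recommendations.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># Hypothetical values for node 1; use a different broker.id on every node
+broker.id=1
+zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
+
+# Example retention limits: 168 hours and 1073741824 bytes
+log.retention.hours=168
+log.retention.bytes=1073741824
+</code></pre></div>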
+<h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3>
+
+<blockquote>
+<p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in 
Samza local mode. </p>
+</blockquote>
+
+<p>To set up a YARN cluster, first download a binary release of Hadoop <a 
href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">here</a> on each 
node in the cluster and untar the archive with 
+<code>tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. We have tested 
SAMOA with Hadoop 2.2.0, but Hadoop 2.3.0 should work too.</p>
+
+<p><strong>HDFS</strong></p>
+
+<p>Set the following properties in 
<code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;dfs.datanode.data.dir&lt;/name&gt;
+    &lt;value&gt;file:///home/username/hadoop-2.2.0/hdfs/datanode&lt;/value&gt;
+    &lt;description&gt;Comma separated list of paths on the local filesystem 
of a DataNode where it should store its blocks.&lt;/description&gt;
+  &lt;/property&gt;
+
+  &lt;property&gt;
+    &lt;name&gt;dfs.namenode.name.dir&lt;/name&gt;
+    &lt;value&gt;file:///home/username/hadoop-2.2.0/hdfs/namenode&lt;/value&gt;
+    &lt;description&gt;Path on the local filesystem where the NameNode stores 
the namespace and transaction logs persistently.&lt;/description&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+</code></pre></div>
+<p>Add this property in <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> 
in all nodes.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;fs.defaultFS&lt;/name&gt;
+    &lt;value&gt;hdfs://localhost:9000/&lt;/value&gt;
+    &lt;description&gt;NameNode URI&lt;/description&gt;
+  &lt;/property&gt;
+
+  &lt;property&gt;
+    &lt;name&gt;fs.hdfs.impl&lt;/name&gt;
+    &lt;value&gt;org.apache.hadoop.hdfs.DistributedFileSystem&lt;/value&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+</code></pre></div>
+<p>For a multi-node cluster, change the hostname (&quot;localhost&quot;) to 
the correct host name of your namenode server.</p>
+
+<p>Format the HDFS directory (do this only the very first time you set up 
HDFS):</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/bin/hdfs namenode -format
+</code></pre></div>
+<p>Start the namenode daemon on one of the nodes:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
+</code></pre></div>
+<p>Start the datanode daemon on all nodes:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
+</code></pre></div>
+<p><strong>YARN</strong></p>
+
+<p>If you are running a multi-node cluster, set the resource manager hostname 
in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> on all nodes as 
follows:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;yarn.resourcemanager.hostname&lt;/name&gt;
+    &lt;value&gt;resourcemanager-url&lt;/value&gt;
+    &lt;description&gt;The hostname of the RM.&lt;/description&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+</code></pre></div>
+<p><strong>Other configurations</strong>
+Now we need to tell Samza where to find the configuration of the YARN cluster. To 
do this, first create a new directory on all nodes:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">mkdir 
~/.samza
+mkdir ~/.samza/conf
+</code></pre></div>
+<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>, and 
<code>yarn-site.xml</code> from <code>~/hadoop-2.2.0/etc/hadoop</code> into the new 
directory:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">ln -s 
~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml
+</code></pre></div>
+<p>Export the environment variable YARN_HOME (in ~/.bashrc) so that Samza knows 
where to find these YARN configuration files.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">export YARN_HOME=$HOME/.samza
+</code></pre></div>
+<p><strong>Start the YARN cluster</strong>
+Start the resource manager on the master node:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
+</code></pre></div>
+<p>Start the node manager on all worker nodes:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
+</code></pre></div>
+<h2 id="build-samoa">Build SAMOA</h2>
+
+<p>Perform the following steps on one of the nodes in the cluster. Here we 
assume git and maven are installed on this node.</p>
+
+<p>Since Samza is not yet released on Maven, we have to clone the Samza 
project, build it, and publish it to the local Maven repository:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">git 
clone -b 0.7.0 https://github.com/apache/incubator-samza.git
+cd incubator-samza
+./gradlew clean build
+./gradlew publishToMavenLocal
+</code></pre></div>
+<p>Here we cloned and installed Samza version 0.7.0, the current released 
version (July 2014). </p>
+
+<p>Now we can clone the repository and install SAMOA.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">git 
clone http://git.apache.org/incubator-samoa.git
+cd incubator-samoa
+mvn -Psamza package
+</code></pre></div>
+<p>The deployable jars for SAMOA will be in 
<code>target/SAMOA-&lt;variant&gt;-&lt;version&gt;-SNAPSHOT.jar</code>. For 
example, in our case for Samza 
<code>target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p>
+
+<h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2>
+
+<p>This section explains the configuration parameters in 
<code>bin/samoa-samza.properties</code> that are required to run SAMOA on top 
of Samza.</p>
+
+<p><strong>Samza execution mode</strong></p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">samoa.samza.mode=[yarn|local]
+</code></pre></div>
+<p>This parameter specifies the execution mode of the task: <code>local</code> 
for local execution and <code>yarn</code> for cluster execution.</p>
+
+<p><strong>Zookeeper</strong></p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">zookeeper.connect=localhost
+zookeeper.port=2181
+</code></pre></div>
+<p>The default setting above applies to local mode execution. For cluster 
mode, change <code>zookeeper.connect</code> to the correct URL of your Zookeeper 
host.</p>
+
+<p><strong>Kafka</strong></p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">kafka.broker.list=localhost:9092
+</code></pre></div>
+<p><code>kafka.broker.list</code> is a comma-separated list of the host:port pairs of 
all the brokers in the Kafka cluster.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">kafka.replication.factor=1
+</code></pre></div>
+<p><code>kafka.replication.factor</code> specifies the number of replicas for 
each stream in Kafka. This number must be less than or equal to the number of 
brokers in Kafka cluster.</p>
+
+<p><strong>YARN</strong></p>
+
+<blockquote>
+<p>The settings below do not apply to local mode execution, so you can leave 
them as they are.</p>
+</blockquote>
+
+<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify 
the memory requirement for the Application Master container and the worker 
containers, respectively. </p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">yarn.am.memory=1024
+yarn.container.memory=1024
+</code></pre></div>
+<p><code>yarn.package.path</code> specifies the path (typically an HDFS path) 
of the package to be distributed to all YARN containers to execute the task.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+</code></pre></div>
+<p><strong>Samza</strong>
+<code>max.pi.per.container</code> specifies the number of PI instances allowed 
in one YARN container. </p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">max.pi.per.container=1
+</code></pre></div>
+<p><code>kryo.register.file</code> specifies the registration file for Kryo 
serializer.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">kryo.register.file=samza-kryo
+</code></pre></div>
+<p><code>checkpoint.commit.ms</code> specifies the frequency for PIs to commit 
their checkpoints (in ms). The default value is 1 minute.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">checkpoint.commit.ms=60000
+</code></pre></div>
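+<p>Taken together, the parameters above give a <code>bin/samoa-samza.properties</code> along the following lines for local execution. This sketch only combines the values shown in this section; the YARN entries are left at those values because they are not used in local mode.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># Execution mode: local or yarn
+samoa.samza.mode=local
+
+# Zookeeper
+zookeeper.connect=localhost
+zookeeper.port=2181
+
+# Kafka
+kafka.broker.list=localhost:9092
+kafka.replication.factor=1
+
+# YARN (ignored in local mode)
+yarn.am.memory=1024
+yarn.container.memory=1024
+yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+
+# Samza
+max.pi.per.container=1
+kryo.register.file=samza-kryo
+checkpoint.commit.ms=60000
+</code></pre></div>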
+<h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2>
+
+<p>Execute a SAMOA task with the following command:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar 
&quot;&lt;task&gt; &amp; &lt;options&gt;&quot; 
+</code></pre></div>
+<h2 id="observe-execution-and-result">Observe execution and result</h2>
+
+<p>In local mode, all log output is printed to stdout. If you execute 
the task on a YARN cluster, the output is written to stdout files in the log folder of YARN&#39;s 
containers 
($HADOOP_HOME/logs/userlogs/application_&lt;application-id&gt;/container_&lt;container-id&gt;).</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html 
(added)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html 
Sun Feb 22 13:41:20 2015
@@ -0,0 +1,203 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Executing Apache SAMOA with Apache Storm</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Executing Apache SAMOA with Apache Storm</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>In this tutorial page we describe how to execute SAMOA on top of Apache 
Storm. Here is an outline of what we want to do:</p>
+
+<ol>
+<li>Ensure that you have the necessary Storm cluster and configuration to execute 
SAMOA</li>
+<li>Ensure that you have all the SAMOA deployables for execution in the 
cluster</li>
+<li>Configure samoa-storm.properties</li>
+<li>Execute a SAMOA classification task</li>
+<li>Observe the task execution</li>
+</ol>
+
+<h3 id="storm-configuration">Storm Configuration</h3>
+
+<p>Before we start the tutorial, please ensure that you already have a Storm 
cluster (preferably Storm 0.8.2) running. You can follow this <a 
href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a>
 to set up a Storm cluster.</p>
+
+<p>You also need to install Storm on the machine where you initiate the 
deployment, and configure Storm with at least the following settings in 
<code>~/.storm/storm.yaml</code>:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">########### These MUST be filled in for a storm configuration
+nimbus.host: &quot;&lt;enter your nimbus host name here&gt;&quot;
+
+## List of custom serializations
+kryo.register:
+    - com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent: 
com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer
+    - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: 
com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
+</code></pre></div>
+<!--
+Or, if you are using SAMOA with optimized VHT, you should use this following 
configuration file:
+```
+########### These MUST be filled in for a storm configuration
+nimbus.host: "<enter your nimbus host name here>"
+
+## List of custom serializations
+kryo.register:
+     - 
com.yahoo.labs.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: 
com.yahoo.labs.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer
+     - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: 
com.yahoo.labs.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer
+```
+-->
+
+<p>Alternatively, if you don&#39;t have a Storm cluster running, you can execute 
SAMOA with Storm in local mode as explained in the section <a 
href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p>
+
+<h3 id="samoa-deployables">SAMOA deployables</h3>
+
+<p>There are three deployables for executing SAMOA on top of Storm. They 
are:</p>
+
+<ol>
+<li><code>bin/samoa</code> is the main script to execute SAMOA. You do not 
need to change anything in this script.</li>
+<li><code>target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar 
file. <code>x.x.x</code> is the version number of SAMOA. </li>
+<li><code>bin/samoa-storm.properties</code> contains deployment 
configurations. You need to set the parameters in this properties file 
correctly. </li>
+</ol>
+
+<h3 id="-samoa-storm.properties-configuration"><a 
name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3>
+
+<p>Currently, the properties file contains two configurations:</p>
+
+<ol>
+<li><code>samoa.storm.mode</code> determines whether the task is executed 
locally (using Storm&#39;s <code>LocalCluster</code>) or executed in a Storm 
cluster. Use <code>local</code> if you want to test SAMOA and you do not have a 
Storm cluster for deployment. Use <code>cluster</code> if you want to test 
SAMOA on your Storm cluster.</li>
+<li><code>samoa.storm.numworker</code> determines the number of workers that 
execute the SAMOA tasks in the Storm cluster. This field must be an integer 
less than or equal to the number of available slots in your Storm cluster. If 
you are using local mode, this property corresponds to the number of threads 
used by Storm&#39;s LocalCluster to execute your SAMOA task.</li>
+</ol>
+
+<p>Here is an example of a complete properties file:</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># 
SAMOA Storm properties file
+# This file contains specific configurations for SAMOA deployment in the Storm 
platform
+# Note that you still need to configure Storm client in your machine, 
+# including setting up Storm configuration file (~/.storm/storm.yaml) with 
correct settings
+
+# samoa.storm.mode corresponds to the execution mode of the Task in Storm 
+# possible values:
+#   1. cluster: the Task will be sent into nimbus. The nimbus is configured by 
Storm configuration file
+#   2. local: the Task will be sent using local Storm cluster
+samoa.storm.mode=cluster
+
+# samoa.storm.numworker corresponds to the number of worker processes 
allocated in Storm cluster
+# possible values: any integer greater than 0  
+samoa.storm.numworker=7
+</code></pre></div>
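+<p>For a quick test without a Storm cluster, a local-mode variant of the same 
file might look like the following. This is a minimal sketch that uses only the 
two properties described above; the value <code>2</code> is an arbitrary 
illustrative choice, not a recommended setting.</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text"># Run the Task in Storm&#39;s LocalCluster instead of submitting it to nimbus
+samoa.storm.mode=local
+
+# In local mode this is the number of threads used by LocalCluster
+# (2 is an assumed example value)
+samoa.storm.numworker=2
+</code></pre></div>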
+<h3 id="samoa-task-execution">SAMOA task execution</h3>
+
+<p>You can execute a SAMOA task using the aforementioned 
<code>bin/samoa</code> script with the following format:
+<code>bin/samoa &lt;platform&gt; &lt;jar&gt; 
&quot;&lt;task&gt;&quot;</code>.</p>
+
+<p><code>&lt;platform&gt;</code> can be <code>storm</code> or <code>s4</code>. 
Using the <code>storm</code> option means you are deploying SAMOA on a Storm 
environment. In this configuration, the script uses the aforementioned yaml 
file (<code>~/.storm/storm.yaml</code>) and <code>samoa-storm.properties</code> 
to perform the deployment. Using the <code>s4</code> option means you are 
deploying SAMOA on an Apache S4 environment. See <a 
href="Executing-SAMOA-with-Apache-S4">Executing SAMOA with Apache S4</a> to 
learn more about deploying SAMOA on Apache S4.</p>
+
+<p><code>&lt;jar&gt;</code> is the location of the deployed jar file 
(<code>SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location 
can be a relative or an absolute path to the jar file. </p>
+
+<p><code>&quot;&lt;task&gt;&quot;</code> is the SAMOA task command line, such 
as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. The 
task command line follows the format used by <a 
href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive 
Online Analysis (MOA)</a>.</p>
+
+<p>The complete command to execute SAMOA is:</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar 
&quot;PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l 
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s 
(com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 
10)&quot;
+</code></pre></div>
+<p>The example above uses the <a href="Prequential-Evaluation-Task">Prequential 
Evaluation task</a> and the <a href="Vertical-Hoeffding-Tree-Classifier">Vertical 
Hoeffding Tree</a> classifier. </p>
+
+<h3 id="observing-task-execution">Observing task execution</h3>
+
+<p>There are two ways to observe the task execution: using the Storm UI and 
monitoring the dump file of the SAMOA task. Note that the dump file will be 
created on the cluster if you are executing your task in <code>cluster</code> 
mode.</p>
+
+<h4 id="using-storm-ui">Using Storm UI</h4>
+
+<p>Go to the web address of Storm UI and check whether the SAMOA task executes 
as intended. Use this UI to kill the associated Storm topology if necessary.</p>
+
+<h4 id="monitoring-the-dump-file">Monitoring the dump file</h4>
+
+<p>Several tasks have an option to specify a dump file, which is a file that 
holds the task output. In our example, the <a 
href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has the 
<code>-d</code> option, which specifies the path to the dump file. Since Storm 
decides where Storm tasks are allocated, you should point the dump file to a 
location on a shared filesystem if you want to access it from the machine 
submitting the task.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Getting-Started.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Getting-Started.html (added)
+++ incubator/samoa/site/documentation/Getting-Started.html Sun Feb 22 13:41:20 
2015
@@ -0,0 +1,127 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Getting Started</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Getting Started</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>We start by showing how simple it is to run a first large-scale machine 
learning task in SAMOA. We will evaluate a bagging ensemble method using 
decision trees on the Forest Covertype dataset.</p>
+
+<ul>
+<li>1. Download SAMOA </li>
+</ul>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git 
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
+mvn package      <span class="c">#Local mode</span>
+</code></pre></div>
+<ul>
+<li>2. Download the Forest CoverType dataset </li>
+</ul>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget 
<span 
class="s2">&quot;http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip&quot;</span>
+unzip covtypeNorm.arff.zip 
+</code></pre></div>
+<p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter 
cells obtained from the US Forest Service (USFS) Region 2 Resource Information 
System (RIS) data. It contains 581,012 instances and 54 attributes, and it has 
been used in several articles on data stream classification.</p>
+
+<ul>
+<li>3.  Run an example: classifying the CoverType dataset with the bagging 
algorithm</li>
+</ul>
+<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">bin/samoa <span class="nb">local 
</span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span 
class="s2">&quot;PrequentialEvaluation -l classifiers.ensemble.Bagging </span>
+<span class="s2">    -s (ArffFileStream -f covtypeNorm.arff) -f 
100000&quot;</span>
+</code></pre></div>
+<p>The output will be a list of the evaluation results, reported every 100,000 
instances.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Home.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Home.html (added)
+++ incubator/samoa/site/documentation/Home.html Sun Feb 22 13:41:20 2015
@@ -0,0 +1,169 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Apache SAMOA Documentation</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Apache SAMOA Documentation</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>Apache SAMOA is a distributed realtime machine learning system, similar 
to Mahout, but specifically designed for stream mining. Apache SAMOA is simple 
and fun to use!</p>
+
+<p>This documentation is intended to give an introduction on how to use Apache 
SAMOA in different ways. As a user you can run Apache SAMOA algorithms on 
several Stream Processing Engines: local mode, Apache Storm, S4 and Samza. As a 
developer you can create new algorithms only once and test them in all of these 
Stream Processing Engines.</p>
+
+<h2 id="getting-started">Getting Started</h2>
+
+<ul>
+<li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting 
Started!</a></li>
+</ul>
+
+<h2 id="users">Users</h2>
+
+<ul>
+<li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and 
Executing SAMOA</a>
+
+<ul>
+<li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li>
+<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with 
Apache Storm</a></li>
+<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with 
Apache S4</a></li>
+<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with 
Apache Samza</a></li>
+</ul></li>
+<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in 
SAMOA</a>
+
+<ul>
+<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation 
Task</a></li>
+<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding 
Tree Classifier</a></li>
+<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules 
Regressor</a></li>
+<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li>
+<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream 
Clustering</a></li>
+<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed 
Stream Frequent Itemset Mining</a></li>
+<li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li>
+</ul></li>
+</ul>
+
+<h2 id="developers">Developers</h2>
+
+<ul>
+<li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a>
+
+<ul>
+<li><a href="Processor.html">3.1 Processor</a></li>
+<li><a href="Content-Event.html">3.2 Content Event</a></li>
+<li><a href="Stream.html">3.3 Stream</a></li>
+<li><a href="Task.html">3.4 Task</a></li>
+<li><a href="Topology-Builder.html">3.5 Topology Builder</a></li>
+<li><a href="Learner.html">3.6 Learner</a></li>
+<li><a href="Processing-Item.html">3.7 Processing Item</a></li>
+</ul></li>
+<li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in 
SAMOA</a></li>
+</ul>
+
+<h3 id="getting-help">Getting help</h3>
+
+<h4 id="apache-samoa-users">Apache SAMOA Users</h4>
+
+<p>SAMOA users should send messages and subscribe to <a 
href="mailto:[email protected]">[email protected]</a>.</p>
+
+<p>You can subscribe to this list by sending an email to <a 
href="mailto:[email protected]">[email protected]</a>.
 Likewise, you can cancel a subscription by sending an email to <a 
href="mailto:[email protected]">[email protected]</a>.</p>
+
+<h4 id="apache-samoa-developers">Apache SAMOA Developers</h4>
+
+<p>SAMOA developers should send messages and subscribe to <a 
href="mailto:[email protected]">[email protected]</a>.</p>
+
+<p>You can subscribe to this list by sending an email to <a 
href="mailto:[email protected]">[email protected]</a>.
 Likewise, you can cancel a subscription by sending an email to <a 
href="mailto:[email protected]">[email protected]</a>.</p>
+
+<p><strong>NOTE:</strong> The Google Groups account <a 
href="mailto:[email protected]">[email protected]</a> is 
now officially deprecated in favor of the Apache-hosted user/dev mailing 
lists.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

Added: incubator/samoa/site/documentation/Learner.html
URL: 
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1661475&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Learner.html (added)
+++ incubator/samoa/site/documentation/Learner.html Sun Feb 22 13:41:20 2015
@@ -0,0 +1,116 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Learner</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script 
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a 
href="/documentation/Home.html">Documentation</a></li>
+                       <li><a 
href="/documentation/Team.html">Contributors</a></li>
+                       <li><a href="/documentation/Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Learner</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>Learners are implemented in SAMOA as sub-topologies.</p>
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">public interface Learner extends Serializable{
+
+    public void init(TopologyBuilder topologyBuilder, Instances dataset);
+
+    public Processor getInputProcessor();
+
+    public Stream getResultStream();
+}
+</code></pre></div>
+<p>When a <code>Task</code> object is initialized via <code>init()</code>, the 
method <code>init(...)</code> of <code>Learner</code> is called, and the 
learner&#39;s topology is added to the global topology of the task.</p>
+
+<p>To create a new learner, you only need to add streams, processors and 
their connections to the topology in <code>init(...)</code>, specify which 
processor manages the input stream of the learner in 
<code>getInputProcessor()</code>, and finally specify the output stream of the 
learner in <code>getResultStream()</code>.</p>
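+
+<p>The three steps above can be sketched roughly as follows. This is a 
self-contained illustration, not SAMOA code: the nested <code>Processor</code>, 
<code>Stream</code>, <code>Instances</code> and <code>TopologyBuilder</code> 
types are hypothetical stand-ins declared only so the example compiles on its 
own, and the methods of the real <code>TopologyBuilder</code> may differ.</p>
+<div class="highlight"><pre><code class="language-java" data-lang="java">import java.io.Serializable;
+
+public class LearnerSketch {
+
+    // Hypothetical stand-ins for the SAMOA types (assumptions, not the real classes).
+    interface Processor extends Serializable { }
+    interface Stream extends Serializable { }
+    interface Instances { }
+
+    // Hypothetical builder that only counts processors and hands out streams.
+    static class TopologyBuilder {
+        int processors = 0;
+        void addProcessor(Processor processor) { processors++; }
+        Stream createStream(Processor source) { return new Stream() { }; }
+    }
+
+    // Mirrors the Learner interface shown above.
+    interface Learner extends Serializable {
+        void init(TopologyBuilder topologyBuilder, Instances dataset);
+        Processor getInputProcessor();
+        Stream getResultStream();
+    }
+
+    // A learner whose sub-topology is a single processor with one output stream.
+    static class SingleProcessorLearner implements Learner {
+        private Processor modelProcessor;
+        private Stream resultStream;
+
+        public void init(TopologyBuilder topologyBuilder, Instances dataset) {
+            // Step 1: add processors, streams and their connections to the topology.
+            modelProcessor = new Processor() { };
+            topologyBuilder.addProcessor(modelProcessor);
+            resultStream = topologyBuilder.createStream(modelProcessor);
+        }
+
+        // Step 2: the processor that receives the input stream of the learner.
+        public Processor getInputProcessor() { return modelProcessor; }
+
+        // Step 3: the stream that carries the output of the learner.
+        public Stream getResultStream() { return resultStream; }
+    }
+
+    public static void main(String[] args) {
+        TopologyBuilder builder = new TopologyBuilder();
+        Learner learner = new SingleProcessorLearner();
+        learner.init(builder, null);
+        // Prints the number of processors registered by init(...).
+        System.out.println(builder.processors);
+    }
+}
+</code></pre></div>
+<p>A real learner would typically register several processors and connect 
them with intermediate streams inside <code>init(...)</code>; the single 
processor here only keeps the sketch short.</p>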
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software 
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache 
feather logo  are trademarks of The Apache Software Foundation. All other marks 
mentioned may be trademarks or registered trademarks of their respective 
owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

