Author: gdfm
Date: Sun Jan 31 12:31:17 2016
New Revision: 1727802
URL: http://svn.apache.org/viewvc?rev=1727802&view=rev
Log:
SAMOA-47: Avro documentation
Modified:
incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
incubator/samoa/site/documentation/Building-SAMOA.html
incubator/samoa/site/documentation/Bylaws.html
incubator/samoa/site/documentation/Content-Event.html
incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
incubator/samoa/site/documentation/Getting-Started.html
incubator/samoa/site/documentation/Home.html
incubator/samoa/site/documentation/Learner.html
incubator/samoa/site/documentation/Prequential-Evaluation-Task.html
incubator/samoa/site/documentation/Processing-Item.html
incubator/samoa/site/documentation/Processor.html
incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
incubator/samoa/site/documentation/Stream.html
incubator/samoa/site/documentation/Task.html
incubator/samoa/site/documentation/Topology-Builder.html
incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html
Modified: incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
(original)
+++ incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html Sun
Jan 31 12:31:17 2016
@@ -89,7 +89,7 @@
<p>For each incoming instance from <em>Source PI</em>, <em>Model Aggregator
PI</em> applies the current rule set to compute the prediction. The instance is
also forwarded from <em>Model Aggregator PI</em> to the <em>Learner PI(s)</em>
to train those rules that cover this instance. If an instance is not covered by
any rule in the set, the default rule will be used for prediction and will also
be trained with this instance. When the default rule expands and creates a new
rule, the new rule will be sent from <em>Model Aggregator PI</em> to one of the
<em>Learner PIs</em>. When the <em>Learner PIs</em> expand or remove a rule, an
update message is also sent back to the <em>Model Aggregator PI</em>.</p>
<p>The number of <em>Learner PIs</em> can be set with the <code>-p</code>
option:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">PrequentialEvaluationTask -l
(com.yahoo.labs.samoa.learners.classifiers.rules.VerticalAMRulesRegressor -p 4)
+<div class="highlight"><pre><code class="language-"
data-lang="">PrequentialEvaluationTask -l
(com.yahoo.labs.samoa.learners.classifiers.rules.VerticalAMRulesRegressor -p 4)
</code></pre></div>
<h3 id="horizontal-adaptive-model-rules-regressor">Horizontal Adaptive Model
Rules Regressor</h3>
@@ -103,7 +103,7 @@
<p>Newly created rules are sent from <em>Default Rule Learner PI</em> to all
<em>Model Aggregator PIs</em> and one of the <em>Learner PIs</em>. Update
messages are also sent from <em>Learner PIs</em> to all <em>Model Aggregator
PIs</em> when a rule is expanded or removed.</p>
<p>The number of <em>Learner PIs</em> can be set with the <code>-p</code>
option and the number of <em>Model Aggregator PIs</em> can be set with the
<code>-r</code> option:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">PrequentialEvaluationTask -l
(com.yahoo.labs.samoa.learners.classifiers.rules.HorizontalAMRulesRegressor -r
4 -p 2)
+<div class="highlight"><pre><code class="language-"
data-lang="">PrequentialEvaluationTask -l
(com.yahoo.labs.samoa.learners.classifiers.rules.HorizontalAMRulesRegressor -r
4 -p 2)
</code></pre></div>
</article>
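The prediction path described in the Adaptive Model Rules page above (apply the covering rules, fall back to the default rule when none covers the instance) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not SAMOA classes, each rule predicts a running mean of the targets it has seen, and the predictions of all covering rules are averaged. The real regressor may combine and train rules differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified stand-in for the rule set kept by a Model Aggregator PI. */
final class RuleSetSketch {

  /** A rule covers an instance when its condition holds. */
  static final class Rule {
    final Predicate<double[]> condition;
    double sum;   // sum of the targets this rule was trained on
    long count;   // number of training instances seen

    Rule(Predicate<double[]> condition) {
      this.condition = condition;
    }

    boolean covers(double[] x) {
      return condition.test(x);
    }

    /** Assumption: a rule predicts the mean of the targets it has seen. */
    double predict() {
      return count == 0 ? 0.0 : sum / count;
    }

    void train(double target) {
      sum += target;
      count++;
    }
  }

  final List<Rule> rules = new ArrayList<>();
  final Rule defaultRule = new Rule(x -> true);

  /** Average the covering rules; use the default rule if nothing covers x. */
  double predict(double[] x) {
    double total = 0.0;
    int covering = 0;
    for (Rule r : rules) {
      if (r.covers(x)) {
        total += r.predict();
        covering++;
      }
    }
    return covering > 0 ? total / covering : defaultRule.predict();
  }

  /** Train every covering rule; uncovered instances train the default rule. */
  void train(double[] x, double target) {
    boolean covered = false;
    for (Rule r : rules) {
      if (r.covers(x)) {
        r.train(target);
        covered = true;
      }
    }
    if (!covered) {
      defaultRule.train(target);
    }
  }
}
```

In the distributed setting the training step would run on the Learner PIs rather than in the same object; the sketch keeps both in one class only to stay self-contained.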
Modified: incubator/samoa/site/documentation/Building-SAMOA.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Building-SAMOA.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Building-SAMOA.html (original)
+++ incubator/samoa/site/documentation/Building-SAMOA.html Sun Jan 31 12:31:17
2016
@@ -100,7 +100,7 @@ mvn -Pstorm package
<p>Once the dependencies are installed, you can simply clone the repository
and install SAMOA.</p>
<div class="highlight"><pre><code class="language-bash" data-lang="bash">git
clone http://git.apache.org/incubator-samoa.git
<span class="nb">cd </span>incubator-samoa
-mvn -P<variant> package <span class="c"># where variant is
"storm" or "s4"</span>
+mvn -P<variant> package <span class="c"># where variant is "storm" or
"s4"</span>
mvn -Pstorm,s4 package <span class="c"># e.g., to get both versions</span>
</code></pre></div>
Modified: incubator/samoa/site/documentation/Bylaws.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Bylaws.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Bylaws.html (original)
+++ incubator/samoa/site/documentation/Bylaws.html Sun Jan 31 12:31:17 2016
@@ -85,17 +85,17 @@
<p>Apache projects define a set of roles with associated rights and
responsibilities. These roles govern which tasks an individual may perform
within the project. The roles are defined in the following sections.</p>
-<h3 id="users:">Users:</h3>
+<h3 id="users">Users:</h3>
<p>The most important participants in the project are people who use our
software. The majority of our developers start out as users and guide their
development efforts from the user's perspective.</p>
<p>Users contribute to Apache projects by providing feedback to developers in
the form of bug reports and feature suggestions. In addition, users participate
in the Apache community by helping other users on mailing lists and user
support forums.</p>
-<h3 id="contributors:">Contributors:</h3>
+<h3 id="contributors">Contributors:</h3>
<p>All of the volunteers who are contributing time, code, documentation, or
resources to the SAMOA project. A contributor that makes sustained, welcome
contributions to the project may be invited to become a Committer, though the
exact timing of such invitations depends on many factors.</p>
-<h3 id="committers:">Committers:</h3>
+<h3 id="committers">Committers:</h3>
<p>The project's Committers are responsible for the project's
technical management. Committers have access to all project source
repositories. Committers may cast binding votes on any technical discussion
regarding SAMOA.</p>
@@ -105,7 +105,7 @@
<p>A Committer who makes a sustained contribution to the project may be
invited to become a member of the PMC. The form of contribution is not limited
to code. It can also include other activities such as code review, helping out
users on the mailing lists, documentation, and testing.</p>
-<h3 id="project-management-committee-(pmc):">Project Management Committee
(PMC):</h3>
+<h3 id="project-management-committee-pmc">Project Management Committee
(PMC):</h3>
<p>The PMC is responsible to the board and the ASF for the management and
oversight of the Apache SAMOA codebase. The responsibilities of the PMC
include:</p>
Modified: incubator/samoa/site/documentation/Content-Event.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Content-Event.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Content-Event.html (original)
+++ incubator/samoa/site/documentation/Content-Event.html Sun Jan 31 12:31:17
2016
@@ -75,10 +75,10 @@
<article class="post-content">
<p>A message or an event is called Content Event in SAMOA. As the name
suggests, it is an event which contains content which needs to be processed by
the processors.</p>
-<h3 id="1.-implementation">1. Implementation</h3>
+<h3 id="1-implementation">1. Implementation</h3>
<p>ContentEvent has been implemented as an interface in SAMOA. Users need to
implement the <code>ContentEvent</code> interface to create their custom message
classes. As can be seen in the following code, the key is a required part of
a message.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">package com.yahoo.labs.samoa.core;
+<div class="highlight"><pre><code class="language-" data-lang="">package
com.yahoo.labs.samoa.core;
public interface ContentEvent extends java.io.Serializable {
@@ -89,26 +89,26 @@ public interface ContentEvent extends ja
public boolean isLastEvent();
}
</code></pre></div>
-<h3 id="2.-methods">2. Methods</h3>
+<h3 id="2-methods">2. Methods</h3>
<p>Following is a brief description of methods.</p>
-<h5 id="2.1-string-getkey()">2.1 <code>String getKey()</code></h5>
+<h5 id="2-1-string-getkey">2.1 <code>String getKey()</code></h5>
<p>Each message is identified by a key in SAMOA. All user-defined message
classes should have a key state variable. Each instance of the custom message
should be assigned a key. This method should return the key of the respective
message.</p>
-<h5 id="2.2-void-setkey(string-str)">2.2 <code>void setKey(String
str)</code></h5>
+<h5 id="2-2-void-setkey-string-str">2.2 <code>void setKey(String
str)</code></h5>
<p>This method is used to assign a key to the message.</p>
-<h5 id="2.3-boolean-islastevent()">2.3 <code>boolean isLastEvent()</code></h5>
+<h5 id="2-3-boolean-islastevent">2.3 <code>boolean isLastEvent()</code></h5>
<p>This method lets SAMOA know that this message is the last message.</p>
-<h3 id="3.-example">3. Example</h3>
+<h3 id="3-example">3. Example</h3>
<p>Following is an example of a <code>Message</code> class which implements
the <code>ContentEvent</code> interface. As <code>ContentEvent</code> is an
interface, it cannot hold variables. A user-defined message class should have
its own data variables and their getter methods. In the following example, a
<code>value</code> variable of type <code>Object</code> is added to the class.
Using the general type <code>Object</code> is beneficial in the sense that any
object can be passed to it and later cast back to its original
type. The following example also adds a <code>streamId</code> variable which
stores the <code>id</code> of the stream the message belongs to. This is not a
requirement but can be beneficial in certain applications.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">import com.yahoo.labs.samoa.core.ContentEvent;
+<div class="highlight"><pre><code class="language-" data-lang="">import
com.yahoo.labs.samoa.core.ContentEvent;
/**
* A general key-value message class which adds a stream id in the class
variables
@@ -188,6 +188,7 @@ public class Message implements ContentE
}
}
+
</code></pre></div>
</article>
Modified: incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
(original)
+++ incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html Sun
Jan 31 12:31:17 2016
@@ -94,7 +94,7 @@
<p>The SAMOA runtime invokes the <code>nextEvent</code> method of
<code>EntranceProcessor</code> until its <code>hasNext</code> method returns
false. Each call to <code>nextEvent</code> should return the next
<code>ContentEvent</code> to be sent to the topology. In this tutorial,
<code>HelloWorldSourceProcessor</code> sends events of type
<code>HelloWorldContentEvent</code>.</p>
<p>Here is the relevant code in <code>HelloWorldSourceProcessor</code>:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">
private Random rnd;
+<div class="highlight"><pre><code class="language-" data-lang=""> private
Random rnd;
private final long maxInst;
private long count;
@@ -110,7 +110,7 @@
}
</code></pre></div>
<p>We also need to create a new type of <code>ContentEvent</code> to hold our
data. In this tutorial we call it <code>HelloWorldContentEvent</code> and its
content is simply an integer.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">public class HelloWorldContentEvent implements ContentEvent {
+<div class="highlight"><pre><code class="language-" data-lang="">public class
HelloWorldContentEvent implements ContentEvent {
private static final long serialVersionUID = -2406968925730298156L;
private final boolean isLastEvent;
@@ -128,7 +128,7 @@
@Override
public void setKey(String str) {
- // do nothing, it's key-less content event
+ // do nothing, it's key-less content event
}
@Override
@@ -142,21 +142,21 @@
@Override
public String toString() {
- return "HelloWorldContentEvent [helloWorldData=" +
helloWorldData + "]";
+ return "HelloWorldContentEvent [helloWorldData=" + helloWorldData +
"]";
}
}
</code></pre></div>
<h3 id="hello-world-destination-processor">Hello World Destination
Processor</h3>
<p>The destination processor for SAMOA is pretty straightforward and it will
print the data from the event.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">public class HelloWorldDestinationProcessor implements
Processor {
+<div class="highlight"><pre><code class="language-" data-lang="">public class
HelloWorldDestinationProcessor implements Processor {
private static final long serialVersionUID = -6042613438148776446L;
private int processorId;
@Override
public boolean process(ContentEvent event) {
- System.out.println(processorId + ": " + event);
+ System.out.println(processorId + ": " + event);
return true;
}
@@ -174,12 +174,12 @@
<h3 id="putting-it-all-together">Putting It All Together</h3>
<p>To put all the components together, we need to go back to class
<code>HelloWorldTask</code>. First, we need to implement the code for setting
up the <code>TopologyBuilder</code>. This code is necessary to be able to run
on multiple platforms.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">
@Override
+<div class="highlight"><pre><code class="language-" data-lang=""> @Override
public void setFactory(ComponentFactory factory) {
builder = new TopologyBuilder(factory);
- logger.debug("Sucessfully instantiating TopologyBuilder");
+ logger.debug("Sucessfully instantiating TopologyBuilder");
builder.initTopology(evaluationNameOption.getValue());
- logger.debug("Sucessfully initializing SAMOA topology with name
{}", evaluationNameOption.getValue());
+ logger.debug("Sucessfully initializing SAMOA topology with name {}",
evaluationNameOption.getValue());
}
</code></pre></div>
<p>After this method is called we have a functioning builder to get components
for our topology. Next, the <code>init</code> method is called by SAMOA to
start the task.
@@ -187,7 +187,7 @@ First we instantiate the source <code>En
After adding the entrance processor to the topology, we create a stream
originating from it. We use the create stream method of
<code>TopologyBuilder</code>.
Next we create the destination processor and connect it to the stream by using
shuffle grouping.
Once we have created all the components, we use the builder to build the
topology.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">
@Override
+<div class="highlight"><pre><code class="language-" data-lang=""> @Override
public void init() {
// create source EntranceProcesor
sourceProcessor = new
HelloWorldSourceProcessor(instanceLimitOption.getValue());
@@ -203,16 +203,16 @@ Once we have created all the components,
// build the topology
helloWorldTopology = builder.build();
- logger.debug("Successfully built the topology");
+ logger.debug("Successfully built the topology");
}
</code></pre></div>
<h3 id="running-it">Running It</h3>
<p>To run the example in local mode:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa local target/SAMOA-Local-0.0.1-SNAPSHOT.jar
"com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 100"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
local target/SAMOA-Local-0.0.1-SNAPSHOT.jar
"com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 100"
</code></pre></div>
<p>To run the example in Storm local mode:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">java
-cp
$STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
com.yahoo.labs.samoa.LocalStormDoTask
"com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 1000"
+<div class="highlight"><pre><code class="language-" data-lang="">java -cp
$STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
com.yahoo.labs.samoa.LocalStormDoTask
"com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 1000"
</code></pre></div>
<p>All the code for the HelloWorldTask and its components can be found <a
href="https://github.com/yahoo/samoa/tree/master/samoa-api/src/main/java/com/yahoo/labs/samoa/examples">here</a>.</p>
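The entrance-processor contract described in the Developing-New-Tasks page above (the runtime keeps calling <code>nextEvent</code> until <code>hasNext</code> returns false) can be illustrated with a small stand-alone sketch. The interfaces below are hypothetical, reduced stand-ins defined only for this example; they are not the real <code>EntranceProcessor</code> or <code>ContentEvent</code> types, and flagging the final event as the last one is an assumption made here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class EntranceLoopSketch {

  /** Hypothetical minimal event; the real ContentEvent interface has more methods. */
  interface Event {
    boolean isLastEvent();
  }

  /** Hypothetical source exposing only the two methods discussed in the text. */
  interface Source {
    boolean hasNext();

    Event nextEvent();
  }

  /** Event carrying a single integer, similar in spirit to the tutorial's event. */
  static final class IntEvent implements Event {
    final int data;
    final boolean last;

    IntEvent(int data, boolean last) {
      this.data = data;
      this.last = last;
    }

    @Override
    public boolean isLastEvent() {
      return last;
    }
  }

  /** Emits maxInst random integers, then reports that it is exhausted. */
  static final class RandomSource implements Source {
    private final Random rnd = new Random(42);
    private final long maxInst;
    private long count;

    RandomSource(long maxInst) {
      this.maxInst = maxInst;
    }

    @Override
    public boolean hasNext() {
      return count < maxInst;
    }

    @Override
    public Event nextEvent() {
      count++;
      // Assumption: the final event is flagged as the last one.
      return new IntEvent(rnd.nextInt(), count == maxInst);
    }
  }

  /** What a runtime conceptually does: pull events until the source is exhausted. */
  static List<Event> drain(Source source) {
    List<Event> sent = new ArrayList<>();
    while (source.hasNext()) {
      sent.add(source.nextEvent());
    }
    return sent;
  }
}
```

A real platform adapter would forward each event to the output stream instead of collecting it in a list.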
Modified: incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Clustering.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
(original)
+++ incubator/samoa/site/documentation/Distributed-Stream-Clustering.html Sun
Jan 31 12:31:17 2016
@@ -76,7 +76,7 @@
<h2 id="apache-samoa-clustering-algorithm">Apache SAMOA Clustering
Algorithm</h2>
<p>The SAMOA Clustering Algorithm is invoked by using the
<code>ClusteringEvaluation</code> task. The clustering task can be executed
with default values just by running:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
"ClusteringEvaluation"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
</code></pre></div>
<p>Parameters:</p>
Modified:
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
---
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
(original)
+++
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
Sun Jan 31 12:31:17 2016
@@ -73,11 +73,11 @@
</header>
<article class="post-content">
- <h2 id="1.-introduction">1. Introduction</h2>
+ <h2 id="1-introduction">1. Introduction</h2>
<p>SAMOA takes a micro-batching approach to frequent itemset mining (FIM). It
uses <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a> as a base
algorithm for distributed sample-based frequent itemset mining. PARMA provides
the guarantee that all the frequent itemsets are present in the result that
it returns, although it also returns some false positives. The problem with FIM
in streams is that the stream has an evolving nature. The itemsets that were
frequent last year may not be frequent this year. To handle this, SAMOA
implements the <a href="https://dl.acm.org/citation.cfm?id=1164180">Time Biased
Sampling</a> approach. This sampling method depends on a parameter
<em>lambda</em> which determines the size of the reservoir sample and also
how strongly the sample is biased towards newer itemsets. As PARMA
has its own way of determining sample sizes, SAMOA does not allow users to
choose <em>lambda</em> and instead derives its value from the sample size
determined by PARMA
using the approximation <code>lambda = 1/sampleSize</code>.</p>
-<h2 id="2.-concepts">2. Concepts</h2>
+<h2 id="2-concepts">2. Concepts</h2>
<p>SAMOA implements FIM for streams in three processors:
StreamSourceProcessor, SamplerProcessor and AggregatorProcessor. The tasks of
each of these are explained below.</p>
@@ -91,10 +91,10 @@
<p><img src="images/SAMOA%20FIM.jpg" alt="SAMOA FIM"></p>
-<h2 id="3.-how-to-run">3. How to run</h2>
+<h2 id="3-how-to-run">3. How to run</h2>
<p>Following is an example of the command used to run the SAMOA FIM task.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
"FpmTask -t Myfpmtopology -r
(com.yahoo.labs.samoa.fpm.processors.FileReaderProcessor -i
/datasets/freqDataCombined.txt) -m
(com.yahoo.labs.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t
20 -n 23 -p 0.08 -b 100000 -s
com.yahoo.labs.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w
(com.yahoo.labs.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA)
"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "FpmTask -t Myfpmtopology -r
(com.yahoo.labs.samoa.fpm.processors.FileReaderProcessor -i
/datasets/freqDataCombined.txt) -m
(com.yahoo.labs.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t
20 -n 23 -p 0.08 -b 100000 -s
com.yahoo.labs.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w
(com.yahoo.labs.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA) "
</code></pre></div>
<p>Parameters:
To run an FIM task, four parameters are required</p>
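To make the relationship <code>lambda = 1/sampleSize</code> from the introduction of this page more concrete, here is a minimal sketch of one well-known biased reservoir scheme: each arriving item replaces a random resident with probability equal to the current fill fraction and is appended otherwise. The class name and its methods are hypothetical, and this is not necessarily the logic used by SAMOA's <code>TimeBiasedReservoirSampler</code>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a time-biased reservoir with capacity sampleSize. */
final class BiasedReservoirSketch<T> {
  private final int capacity;
  private final List<T> reservoir;
  private final Random rnd;

  BiasedReservoirSketch(int sampleSize, long seed) {
    if (sampleSize <= 0) {
      throw new IllegalArgumentException("sampleSize must be positive");
    }
    this.capacity = sampleSize;
    this.reservoir = new ArrayList<>(sampleSize);
    this.rnd = new Random(seed);
  }

  /** Bias rate implied by the reservoir size: lambda = 1 / sampleSize. */
  double lambda() {
    return 1.0 / capacity;
  }

  /** Every arriving item enters the reservoir; older items are evicted at random. */
  void add(T item) {
    double fill = (double) reservoir.size() / capacity;
    if (rnd.nextDouble() < fill) {
      // Replace a uniformly chosen resident; always taken once the reservoir is full.
      reservoir.set(rnd.nextInt(reservoir.size()), item);
    } else {
      // Reservoir not full and the coin flip said "grow".
      reservoir.add(item);
    }
  }

  /** Returns a copy of the current sample. */
  List<T> sample() {
    return new ArrayList<>(reservoir);
  }
}
```

Because every new item is always admitted and evictions are uniform, older items survive with exponentially decreasing probability, which is what biases the sample towards recent itemsets.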
Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
(original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html Sun
Jan 31 12:31:17 2016
@@ -134,22 +134,22 @@ mvn -Ps4 package
<p>This section will go through the <code>bin/samoa-s4.properties</code> file
and how to configure it.
In order for SAMOA to run correctly in a distributed environment there are
some variables that need to be defined. Since Apache S4 uses <a
href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management we
need to define where it is running.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">#
Zookeeper Server
+<div class="highlight"><pre><code class="language-" data-lang=""># Zookeeper
Server
zookeeper.server=localhost
zookeeper.port=2181
</code></pre></div>
<p>Apache S4 also distributes the application via HTTP, therefore the server
and port which contains the S4 application must be provided.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">#
Simple HTTP Server providing the packaged S4 jar
+<div class="highlight"><pre><code class="language-" data-lang=""># Simple HTTP
Server providing the packaged S4 jar
http.server.ip=localhost
http.server.port=8000
</code></pre></div>
<p>Apache S4 uses the concept of logical clusters to define a group of
machines, which are identified by an ID and start serving on a specific
port.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">#
Name of the S4 cluster
+<div class="highlight"><pre><code class="language-" data-lang=""># Name of the
S4 cluster
cluster.name=cluster
cluster.port=12000
</code></pre></div>
<p>SAMOA can be deployed on a single machine using only one resource or in a
cluster environment. The following property can be defined to deploy as a
<code>local</code> application or on a <code>cluster</code>.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">#
Deployment strategy
+<div class="highlight"><pre><code class="language-" data-lang=""># Deployment
strategy
samoa.deploy.mode=local
</code></pre></div>
<hr>
@@ -163,7 +163,7 @@ The execution syntax is as follows:
<code>bin/samoa <platform> <jar-location> <task &
options></code></p>
<p>Example:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar
"ClusteringEvaluation"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa S4
target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
</code></pre></div>
<p>The <platform> can be s4 or storm.</p>
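As a quick sanity check of the settings walked through in the S4 configuration page above, the properties can be read with the standard <code>java.util.Properties</code> class. The sketch below is only an illustration: the helper class and its methods are hypothetical and are not part of SAMOA, and only the property keys are taken from the page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for inspecting the contents of a samoa-s4.properties file. */
final class S4PropertiesSketch {

  /** Parses properties text; lines starting with '#' are treated as comments. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Returns the value for key, failing loudly if the key is missing. */
  static String require(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("missing property: " + key);
    }
    return value.trim();
  }

  /** Builds the host:port string for the ZooKeeper server. */
  static String zookeeperAddress(Properties props) {
    return require(props, "zookeeper.server") + ":" + require(props, "zookeeper.port");
  }

  /** True when the deployment mode is "local" rather than "cluster". */
  static boolean isLocal(Properties props) {
    return "local".equals(require(props, "samoa.deploy.mode"));
  }
}
```

Failing on a missing key, rather than silently substituting a default, is a design choice of this sketch; it avoids guessing what defaults the launcher script applies.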
Modified:
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
(original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
Sun Jan 31 12:31:17 2016
@@ -104,17 +104,17 @@ The steps included in this tutorial are:
<li><p>Download the binary release from the <a
href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li>
<li><p>Untar the archive</p></li>
</ol>
-<div class="highlight"><pre><code class="language-text" data-lang="text">tar
-xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
+<div class="highlight"><pre><code class="language-" data-lang="">tar -xf
$DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/
</code></pre></div>
<ol>
<li>Copy the default configuration file</li>
</ol>
-<div class="highlight"><pre><code class="language-text" data-lang="text">cp
zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
+<div class="highlight"><pre><code class="language-" data-lang="">cp
zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
</code></pre></div>
<ol>
<li>Start the single-node cluster</li>
</ol>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/zookeeper-3.4.6/bin/zkServer.sh start
+<div class="highlight"><pre><code class="language-"
data-lang="">~/zookeeper-3.4.6/bin/zkServer.sh start
</code></pre></div>
<h3 id="kafka">Kafka</h3>
@@ -124,17 +124,17 @@ The steps included in this tutorial are:
<li><p>Download a binary release of Kafka <a
href="http://kafka.apache.org/downloads.html">here</a>. As mentioned in the
page, the Scala version does not matter. However, 2.10 is recommended as Samza
has recently been moved to Scala 2.10.</p></li>
<li><p>Untar the archive </p></li>
</ol>
-<div class="highlight"><pre><code class="language-text" data-lang="text">tar
-xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
+<div class="highlight"><pre><code class="language-" data-lang="">tar -xzf
$DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/
</code></pre></div>
<p>If you are running in local mode or a single-node cluster, you can now
start Kafka with the command:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh
kafka_2.10-0.8.1/config/server.properties
+<div class="highlight"><pre><code class="language-"
data-lang="">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh
kafka_2.10-0.8.1/config/server.properties
</code></pre></div>
<p>In a multi-node cluster, it is typical and convenient to have a Kafka broker
on each node (although you can also run a smaller Kafka cluster, or even a
single-node Kafka cluster). The number of brokers in the Kafka cluster affects
the available disk bandwidth and space (the more brokers there are, the more of
both you get). On each node, you need to set the following properties in
<code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting the
Kafka service.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">broker.id=a-unique-number-for-each-node
+<div class="highlight"><pre><code class="language-"
data-lang="">broker.id=a-unique-number-for-each-node
zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...]
</code></pre></div>
<p>You might want to change the retention hours or retention bytes of the logs
to keep the logs from growing too large.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">log.retention.hours=number-of-hours-to-keep-the-logs
+<div class="highlight"><pre><code class="language-"
data-lang="">log.retention.hours=number-of-hours-to-keep-the-logs
log.retention.bytes=number-of-bytes-to-keep-in-the-logs
</code></pre></div>
<h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3>
@@ -149,7 +149,7 @@ log.retention.bytes=number-of-bytes-to-k
<p><strong>HDFS</strong></p>
<p>Set the following properties in
<code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text"><configuration>
+<div class="highlight"><pre><code class="language-"
data-lang=""><configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value>
@@ -164,7 +164,7 @@ log.retention.bytes=number-of-bytes-to-k
</configuration>
</code></pre></div>
<p>Add this property in <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code>
in all nodes.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text"><configuration>
+<div class="highlight"><pre><code class="language-"
data-lang=""><configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000/</value>
@@ -180,18 +180,18 @@ log.retention.bytes=number-of-bytes-to-k
<p>For a multi-node cluster, change the hostname ("localhost") to
the correct host name of your namenode server.</p>
<p>Format HDFS directory (only perform this if you are running it for the very
first time)</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/hadoop-2.2.0/bin/hdfs namenode -format
+<div class="highlight"><pre><code class="language-"
data-lang="">~/hadoop-2.2.0/bin/hdfs namenode -format
</code></pre></div>
<p>Start the namenode daemon on one of the nodes</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
+<div class="highlight"><pre><code class="language-"
data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
</code></pre></div>
<p>Start datanode daemon on all nodes</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
+<div class="highlight"><pre><code class="language-"
data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
</code></pre></div>
<p><strong>YARN</strong></p>
<p>If you are running in a multi-node cluster, set the resource manager hostname
in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> on all nodes as
follows:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text"><configuration>
+<div class="highlight"><pre><code class="language-"
data-lang=""><configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager-url</value>
@@ -201,30 +201,30 @@ log.retention.bytes=number-of-bytes-to-k
</code></pre></div>
<p><strong>Other configurations</strong>
Now we need to tell Samza where to find the configuration of the YARN cluster.
To do this, first create a new directory on all nodes:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">mkdir
~/.samza
+<div class="highlight"><pre><code class="language-" data-lang="">mkdir ~/.samza
mkdir ~/.samza/conf
</code></pre></div>
<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>,
<code>yarn-site.xml</code> in <code>~/hadoop-2.2.0/etc/hadoop</code> to the new
directory </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ln -s
~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
+<div class="highlight"><pre><code class="language-" data-lang="">ln -s
~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml
ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml
</code></pre></div>
<p>Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows
where to find these YARN configuration files.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">export YARN_HOME=$HOME/.samza
+<div class="highlight"><pre><code class="language-" data-lang="">export
YARN_HOME=$HOME/.samza
</code></pre></div>
<p><strong>Start the YARN cluster</strong>
Start resource manager on master node</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
+<div class="highlight"><pre><code class="language-"
data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
</code></pre></div>
<p>Start node manager on all worker nodes</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
+<div class="highlight"><pre><code class="language-"
data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
</code></pre></div>
<h2 id="build-samoa">Build SAMOA</h2>
<p>Perform the following steps on one of the nodes in the cluster. Here we
assume git and maven are installed on this node.</p>
<p>Since Samza is not yet released on Maven, we have to clone the Samza
project, build it, and publish it to the local Maven repository:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">git
clone -b 0.7.0 https://github.com/apache/incubator-samza.git
+<div class="highlight"><pre><code class="language-" data-lang="">git clone -b
0.7.0 https://github.com/apache/incubator-samza.git
cd incubator-samza
./gradlew clean build
./gradlew publishToMavenLocal
@@ -232,7 +232,7 @@ cd incubator-samza
<p>Here we cloned and installed Samza version 0.7.0, the current released
version (July 2014). </p>
<p>Now we can clone the repository and install SAMOA.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">git
clone http://git.apache.org/incubator-samoa.git
+<div class="highlight"><pre><code class="language-" data-lang="">git clone
http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn -Psamza package
</code></pre></div>
@@ -243,21 +243,21 @@ mvn -Psamza package
<p>This section explains the configuration parameters in
<code>bin/samoa-samza.properties</code> that are required to run SAMOA on top
of Samza.</p>
<p><strong>Samza execution mode</strong></p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">samoa.samza.mode=[yarn|local]
+<div class="highlight"><pre><code class="language-"
data-lang="">samoa.samza.mode=[yarn|local]
</code></pre></div>
<p>This parameter specifies the mode in which the task is executed:
<code>local</code> for local execution and <code>yarn</code> for cluster
execution.</p>
<p><strong>Zookeeper</strong></p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">zookeeper.connect=localhost
+<div class="highlight"><pre><code class="language-"
data-lang="">zookeeper.connect=localhost
zookeeper.port=2181
</code></pre></div>
<p>The default setting above applies to local mode execution. For cluster
mode, change <code>zookeeper.connect</code> to the correct URL of your
zookeeper host.</p>
<p><strong>Kafka</strong></p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">kafka.broker.list=localhost:9092
+<div class="highlight"><pre><code class="language-"
data-lang="">kafka.broker.list=localhost:9092
</code></pre></div>
<p><code>kafka.broker.list</code> is a comma-separated list of host:port pairs
for all the brokers in the Kafka cluster.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">kafka.replication.factor=1
+<div class="highlight"><pre><code class="language-"
data-lang="">kafka.replication.factor=1
</code></pre></div>
<p><code>kafka.replication.factor</code> specifies the number of replicas for
each stream in Kafka. This number must be less than or equal to the number of
brokers in Kafka cluster.</p>
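<p>The constraint between the two Kafka settings above can be checked before
deployment. The following is a minimal sketch, not part of SAMOA: the class
<code>KafkaSettingsCheck</code> and its methods are hypothetical names, and it
relies only on the standard <code>java.util.Properties</code> parser.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of SAMOA: sanity-checks two Kafka settings.
class KafkaSettingsCheck {

    // Counts the non-empty host:port entries in a comma-separated broker list.
    static int brokerCount(String brokerList) {
        int count = 0;
        for (String entry : brokerList.split(",")) {
            if (!entry.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // True when kafka.replication.factor does not exceed the number of brokers.
    static boolean replicationFactorIsValid(Properties props) {
        int brokers = brokerCount(props.getProperty("kafka.broker.list", ""));
        int factor = Integer.parseInt(
                props.getProperty("kafka.replication.factor", "1").trim());
        return factor <= brokers;
    }

    // Parses text written in the usual .properties syntax.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "kafka.broker.list=localhost:9092\nkafka.replication.factor=1\n");
        System.out.println("valid: " + replicationFactorIsValid(props));
    }
}
```

<p>With a single broker, a replication factor of 1 passes the check and a
factor of 2 fails it.</p>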
@@ -268,26 +268,26 @@ zookeeper.port=2181
</blockquote>
<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify
the memory requirement for the Application Master container and the worker
containers, respectively. </p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">yarn.am.memory=1024
+<div class="highlight"><pre><code class="language-"
data-lang="">yarn.am.memory=1024
yarn.container.memory=1024
</code></pre></div>
<p><code>yarn.package.path</code> specifies the path (typically a HDFS path)
of the package to be distributed to all YARN containers to execute the task.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+<div class="highlight"><pre><code class="language-"
data-lang="">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
</code></pre></div>
<p><strong>Samza</strong>
<code>max.pi.per.container</code> specifies the number of PI instances allowed
in one YARN container. </p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">max.pi.per.container=1
+<div class="highlight"><pre><code class="language-"
data-lang="">max.pi.per.container=1
</code></pre></div>
<p><code>kryo.register.file</code> specifies the registration file for Kryo
serializer.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">kryo.register.file=samza-kryo
+<div class="highlight"><pre><code class="language-"
data-lang="">kryo.register.file=samza-kryo
</code></pre></div>
<p><code>checkpoint.commit.ms</code> specifies the frequency for PIs to commit
their checkpoints (in ms). The default value is 1 minute.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">checkpoint.commit.ms=60000
+<div class="highlight"><pre><code class="language-"
data-lang="">checkpoint.commit.ms=60000
</code></pre></div>
<h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2>
<p>Execute a SAMOA task with the following command:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar
"<task> & <options>"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> &
<options>"
</code></pre></div>
<h2 id="observe-execution-and-result">Observe execution and result</h2>
Modified:
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
(original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
Sun Jan 31 12:31:17 2016
@@ -88,8 +88,8 @@
<p>Before we start the tutorial, please ensure that you already have a Storm
cluster (preferably Storm 0.8.2) running. You can follow this <a
href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a>
to set up a Storm cluster.</p>
<p>You also need to install Storm on the machine where you initiate the
deployment, and configure Storm (at least) with this configuration in
<code>~/.storm/storm.yaml</code>:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">########### These MUST be filled in for a storm configuration
-nimbus.host: "<enter your nimbus host name here>"
+<div class="highlight"><pre><code class="language-" data-lang="">###########
These MUST be filled in for a storm configuration
+nimbus.host: "<enter your nimbus host name here>"
## List of custom serializations
kryo.register:
@@ -121,7 +121,7 @@ kryo.register:
<li><code>bin/samoa-storm.properties</code> contains deployment
configurations. You need to set the parameters in this properties file
correctly. </li>
</ol>
-<h3 id="-samoa-storm.properties-configuration"><a
name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3>
+<h3 id="samoa-storm-properties-configuration"><a
name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3>
<p>Currently, the properties file contains two configurations:</p>
@@ -131,7 +131,7 @@ kryo.register:
</ol>
<p>Here is an example of a complete properties file:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">#
SAMOA Storm properties file
+<div class="highlight"><pre><code class="language-" data-lang=""># SAMOA Storm
properties file
# This file contains specific configurations for SAMOA deployment in the Storm
platform
# Note that you still need to configure Storm client in your machine,
# including setting up Storm configuration file (~/.storm/storm.yaml) with
correct settings
@@ -158,7 +158,7 @@ samoa.storm.numworker=7
<p><code>"<task>"</code> is the SAMOA task command line such
as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. This
command line for SAMOA task follows the format of <a
href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive
Online Analysis (MOA)</a>.</p>
<p>The complete command to execute SAMOA is:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
"PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
(com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u
10)"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d
/tmp/dump.csv -i 1000000 -f 100000 -l
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
(com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u
10)"
</code></pre></div>
<p>The example above uses <a href="Prequential-Evaluation-Task">Prequential
Evaluation task</a> and <a href="Vertical-Hoeffding-Tree-Classifier">Vertical
Hoeffding Tree</a> classifier. </p>
Modified: incubator/samoa/site/documentation/Getting-Started.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Getting-Started.html (original)
+++ incubator/samoa/site/documentation/Getting-Started.html Sun Jan 31 12:31:17
2016
@@ -85,7 +85,7 @@ mvn package <span class="c">#Local
<ul>
<li>2. Download the Forest CoverType dataset </li>
</ul>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget
<span
class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget
<span
class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span>
unzip covtypeNorm.arff.zip
</code></pre></div>
<p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter
cells obtained from the US Forest Service (USFS) Region 2 Resource Information
System (RIS) data. It contains 581,012 instances and 54 attributes, and it has
been used in several articles on data stream classification.</p>
@@ -93,8 +93,8 @@ unzip covtypeNorm.arff.zip
<ul>
<li>3. Run an example: classifying the CoverType dataset with the bagging
algorithm</li>
</ul>
-<div class="highlight"><pre><code class="language-bash"
data-lang="bash">bin/samoa <span class="nb">local
</span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span
class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging </span>
-<span class="s2"> -s (ArffFileStream -f covtypeNorm.arff) -f
100000"</span>
+<div class="highlight"><pre><code class="language-bash"
data-lang="bash">bin/samoa <span class="nb">local
</span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span
class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging
+ -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</span>
</code></pre></div>
<p>The output is a list of evaluation results, reported every 100,000
instances.</p>
Modified: incubator/samoa/site/documentation/Home.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Home.html (original)
+++ incubator/samoa/site/documentation/Home.html Sun Jan 31 12:31:17 2016
@@ -96,6 +96,7 @@ SAMOA is similar to Mahout in spirit, bu
<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with
Apache Storm</a></li>
<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with
Apache S4</a></li>
<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with
Apache Samza</a></li>
+<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA
with Apache Avro Files</a></li>
</ul></li>
<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in
SAMOA</a>
Modified: incubator/samoa/site/documentation/Learner.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Learner.html (original)
+++ incubator/samoa/site/documentation/Learner.html Sun Jan 31 12:31:17 2016
@@ -74,7 +74,7 @@
<article class="post-content">
<p>Learners are implemented in SAMOA as sub-topologies.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">public interface Learner extends Serializable{
+<div class="highlight"><pre><code class="language-" data-lang="">public
interface Learner extends Serializable{
public void init(TopologyBuilder topologyBuilder, Instances dataset);
Modified: incubator/samoa/site/documentation/Prequential-Evaluation-Task.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Prequential-Evaluation-Task.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Prequential-Evaluation-Task.html
(original)
+++ incubator/samoa/site/documentation/Prequential-Evaluation-Task.html Sun Jan
31 12:31:17 2016
@@ -76,7 +76,7 @@
<p>In data stream mining, the most widely used evaluation scheme is
prequential, or interleaved test-then-train, evaluation. The idea is simple:
we use each instance first to test the model, and then to train the model. The
Prequential Evaluation task evaluates the performance of online classifiers
doing this. It supports two classification performance evaluators: the basic
one which measures the accuracy of the classifier model since the start of the
evaluation, and a window based one which measures the accuracy on the current
sliding window of recent instances. </p>
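<p>The test-then-train idea can be sketched in a few lines. The code below is
an illustration only and does not use SAMOA classes:
<code>PrequentialSketch</code> and <code>MajorityClassLearner</code> are
hypothetical stand-ins, and the accuracy it reports corresponds to the basic
evaluator (accuracy since the start of the stream), not the window-based
one.</p>

```java
// Hypothetical stand-ins, not SAMOA classes: a tiny majority-class learner
// evaluated with the test-then-train scheme.
class PrequentialSketch {

    // Predicts whichever of the two labels (0 or 1) it has seen more often.
    static class MajorityClassLearner {
        private final int[] counts = new int[2];

        int predict() {
            return counts[1] > counts[0] ? 1 : 0;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    // Accuracy since the start of the stream: each label is first used to
    // test the current model, and only then to train it.
    static double prequentialAccuracy(int[] labels) {
        MajorityClassLearner learner = new MajorityClassLearner();
        int correct = 0;
        for (int label : labels) {
            if (learner.predict() == label) {
                correct++;
            }
            learner.train(label);
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // prints 0.5: the 2nd and 4th instances are predicted correctly
        System.out.println(prequentialAccuracy(new int[] {1, 1, 0, 1}));
    }
}
```

<p>Because every instance is tested before it is used for training, no
separate hold-out set is needed.</p>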
<p>Example of a Prequential Evaluation task on the SAMOA command line when
deploying to Storm:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
"PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l
(classifiers.trees.VerticalHoeffdingTree -p 4) -s
(generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d
/tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree
-p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
</code></pre></div>
<p>Parameters:</p>
Modified: incubator/samoa/site/documentation/Processing-Item.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processing-Item.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Processing-Item.html (original)
+++ incubator/samoa/site/documentation/Processing-Item.html Sun Jan 31 12:31:17
2016
@@ -86,23 +86,23 @@ There are two types of Processing Items.
<li>Entrance Processing Item (EntrancePI)</li>
</ol>
-<h4 id="1.-simple-processing-item-(pi)">1. Simple Processing Item (PI)</h4>
+<h4 id="1-simple-processing-item-pi">1. Simple Processing Item (PI)</h4>
<p>Once a Processor is wrapped in a PI, it becomes an executable component of
the topology. All physical topology units are created with the help of a
<code>TopologyBuilder</code>. Following code snippet shows the creation of a
Processing Item.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.initTopology("MyTopology");
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.initTopology("MyTopology");
Processor samplerProcessor = new Sampler();
ProcessingItem samplerPI = builder.createPI(samplerProcessor,3);
</code></pre></div>
<p>The <code>createPI()</code> method of <code>TopologyBuilder</code> is used
to create a PI. Its first argument is the instance of a Processor which needs
to be wrapped-in. Its second argument is the parallelism hint. It tells the
underlying platforms how many parallel instances of this PI should be created
on different nodes.</p>
-<h4 id="2.-entrance-processing-item-(entrancepi)">2. Entrance Processing Item
(EntrancePI)</h4>
+<h4 id="2-entrance-processing-item-entrancepi">2. Entrance Processing Item
(EntrancePI)</h4>
<p>Entrance Processing Item is different from a PI in only one way: it accepts
an Entrance Processor which can generate its own stream.
It is mostly used as the source of a topology.
It connects to external sources, pulls data and provides it to the topology in
the form of streams.
All physical topology units are created with the help of a
<code>TopologyBuilder</code>.
The following code snippet shows the creation of an Entrance Processing
Item.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.initTopology("MyTopology");
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.initTopology("MyTopology");
EntranceProcessor sourceProcessor = new Source();
EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor);
</code></pre></div>
Modified: incubator/samoa/site/documentation/Processor.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processor.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Processor.html (original)
+++ incubator/samoa/site/documentation/Processor.html Sun Jan 31 12:31:17 2016
@@ -80,7 +80,7 @@
<p>There are two ways to add a processor to the topology.</p>
-<h4 id="1.-processor">1. Processor</h4>
+<h4 id="1-processor">1. Processor</h4>
<p>All physical topology units are created with the help of a
<code>TopologyBuilder</code>. Following code snippet shows how to add a
Processor to the topology.
<code>
@@ -89,7 +89,7 @@ builder.addProcessor(processor, paralell
</code>
<code>addProcessor()</code> method of <code>TopologyBuilder</code> is used to
add the processor. Its first argument is the instance of a Processor which
needs to be added. Its second argument is the parallelism hint. It tells the
underlying platforms how many parallel instances of this processor should be
created on different nodes.</p>
-<h4 id="2.-entrance-processor">2. Entrance Processor</h4>
+<h4 id="2-entrance-processor">2. Entrance Processor</h4>
<p>Some processors generate their own streams, and they are used as the
source of a topology. They connect to external sources, pull data and provide
it to the topology in the form of streams.
All physical topology units are created with the help of a
<code>TopologyBuilder</code>. The following code snippet shows how to add an
entrance processor to the topology and create a stream from it.
@@ -100,7 +100,7 @@ Stream source = builder.createStream(ent
</code></p>
<h3 id="preview-of-processor">Preview of Processor</h3>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">package samoa.core;
+<div class="highlight"><pre><code class="language-" data-lang="">package
samoa.core;
public interface Processor extends java.io.Serializable{
boolean process(ContentEvent event);
void onCreate(int id);
@@ -109,20 +109,20 @@ public interface Processor extends java.
</code></pre></div>
<h3 id="methods">Methods</h3>
-<h4 id="1.-boolean-process(contentevent-event)">1. <code>boolean
process(ContentEvent event)</code></h4>
+<h4 id="1-boolean-process-contentevent-event">1. <code>boolean
process(ContentEvent event)</code></h4>
<p>Users should implement the three methods shown above.
<code>process(ContentEvent event)</code> is the method in which all processing
logic should be implemented. <code>ContentEvent</code> is a type (interface)
which contains the event. This method will be called each time a new event is
received. It should return <code>true</code> if the event has been correctly
processed, <code>false</code> otherwise.</p>
-<h4 id="2.-void-oncreate(int-id)">2. <code>void onCreate(int id)</code></h4>
+<h4 id="2-void-oncreate-int-id">2. <code>void onCreate(int id)</code></h4>
<p>is the method in which all initialization code should be written. Multiple
copies/instances of the Processor are created based on the parallelism hint
specified by the user. SAMOA assigns each instance a unique id which is passed
as the parameter <code>id</code> to the <code>onCreate(int id)</code> method of each
instance.</p>
-<h4 id="3.-processor-newprocessor(processor-p)">3. <code>Processor
newProcessor(Processor p)</code></h4>
+<h4 id="3-processor-newprocessor-processor-p">3. <code>Processor
newProcessor(Processor p)</code></h4>
<p>is very simple to implement. This method is just a technical overhead that
has no logical use except that it helps SAMOA in some of its internals. Users
should just return a new copy of the instance of this class which implements
this Processor interface. </p>
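<p>The three methods above can be illustrated with a small implementation.
This is a sketch under assumptions: the nested <code>Processor</code> and
<code>ContentEvent</code> interfaces are local stand-ins that mirror the
signatures listed above so the example compiles on its own, and
<code>CountingProcessor</code> is a hypothetical class, not part of SAMOA.</p>

```java
import java.io.Serializable;

class ProcessorSketch {

    // Local stand-ins mirroring the signatures described above.
    interface ContentEvent {}

    interface Processor extends Serializable {
        boolean process(ContentEvent event);
        void onCreate(int id);
        Processor newProcessor(Processor p);
    }

    // Hypothetical processor that counts the events it receives.
    static class CountingProcessor implements Processor {
        private int id = -1;
        private long seen = 0;

        @Override
        public boolean process(ContentEvent event) {
            if (event == null) {
                return false; // nothing was processed
            }
            seen++;
            return true;
        }

        @Override
        public void onCreate(int id) {
            this.id = id;  // unique id assigned to this instance
            this.seen = 0; // initialization code goes here
        }

        @Override
        public Processor newProcessor(Processor p) {
            return new CountingProcessor(); // fresh copy of this class
        }

        int id() { return id; }
        long seen() { return seen; }
    }

    public static void main(String[] args) {
        CountingProcessor first = new CountingProcessor();
        first.onCreate(0);
        first.process(new ContentEvent() {});
        System.out.println("instance " + first.id() + " saw " + first.seen() + " event(s)");
    }
}
```

<p>Per-instance state is set up in <code>onCreate</code> rather than in the
constructor, since copies are created according to the parallelism hint.</p>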
<h3 id="preview-of-entranceprocessor">Preview of EntranceProcessor</h3>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">package com.yahoo.labs.samoa.core;
+<div class="highlight"><pre><code class="language-" data-lang="">package
com.yahoo.labs.samoa.core;
public interface EntranceProcessor extends Processor {
public boolean isFinished();
@@ -132,15 +132,15 @@ public interface EntranceProcessor exten
</code></pre></div>
<h3 id="methods">Methods</h3>
-<h4 id="1.-boolean-isfinished()">1. <code>boolean isFinished()</code></h4>
+<h4 id="1-boolean-isfinished">1. <code>boolean isFinished()</code></h4>
<p>returns whether to expect more events coming from the entrance processor.
If the source is a live stream, this method should always return
<code>false</code>. If the source is a file, the method should return
<code>true</code> once the file has been fully processed.</p>
-<h4 id="2.-boolean-hasnext()">2. <code>boolean hasNext()</code></h4>
+<h4 id="2-boolean-hasnext">2. <code>boolean hasNext()</code></h4>
<p>returns whether the next event is ready for consumption. If the method
returns <code>true</code> a subsequent call to <code>nextEvent</code> should
yield the next event to be processed. If the method returns <code>false</code>
the engine can use this information to avoid continuously polling the entrance
processor.</p>
-<h4 id="3.-contentevent-nextevent()">3. <code>ContentEvent
nextEvent()</code></h4>
+<h4 id="3-contentevent-nextevent">3. <code>ContentEvent nextEvent()</code></h4>
<p>is the main method for the entrance processor as it returns the next event
to be processed by the topology. It should be called only if
<code>isFinished()</code> returned <code>false</code> and
<code>hasNext()</code> returned <code>true</code>.</p>
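<p>The calling contract for these three methods can be sketched as a polling
loop. This is an illustration, not SAMOA engine code: <code>EventSource</code>
is a local stand-in exposing only the three methods discussed above, and
<code>ListSource</code> and <code>drain</code> are hypothetical names.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class EntrancePollingSketch {

    // Local stand-in exposing only the three methods discussed above.
    interface EventSource<E> {
        boolean isFinished();
        boolean hasNext();
        E nextEvent();
    }

    // Hypothetical finite source backed by an in-memory list.
    static class ListSource implements EventSource<String> {
        private final Iterator<String> it;

        ListSource(List<String> events) {
            this.it = events.iterator();
        }

        @Override public boolean isFinished() { return !it.hasNext(); }
        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public String nextEvent() { return it.next(); }
    }

    // Drains the source, calling nextEvent() only when isFinished() is false
    // and hasNext() is true.
    static <E> List<E> drain(EventSource<E> source) {
        List<E> out = new ArrayList<>();
        while (!source.isFinished()) {
            if (source.hasNext()) {
                out.add(source.nextEvent());
            } else {
                Thread.yield(); // nothing ready yet; let other threads run
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(drain(new ListSource(Arrays.asList("a", "b", "c"))));
    }
}
```

<p>For a live stream, <code>isFinished()</code> would never return
<code>true</code>, so a loop like this one runs until the task is stopped.</p>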
Modified:
incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
---
incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
(original)
+++
incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
Sun Jan 31 12:31:17 2016
@@ -82,6 +82,7 @@
<li><a href="Executing-SAMOA-with-Apache-Storm.html">Executing SAMOA with
Apache Storm</a></li>
<li><a href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache
S4</a></li>
<li><a href="Executing-SAMOA-with-Apache-Samza.html">Executing SAMOA with
Apache Samza</a></li>
+<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">Executing SAMOA with
Apache Avro Files</a></li>
</ul>
</article>
Modified: incubator/samoa/site/documentation/Stream.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Stream.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Stream.html (original)
+++ incubator/samoa/site/documentation/Stream.html Sun Jan 31 12:31:17 2016
@@ -75,15 +75,15 @@
<article class="post-content">
<p>A stream is a physical unit of SAMOA topology which connects different
Processors with each other. Stream is also created by a
<code>TopologyBuilder</code> just like a Processor. A stream can have a single
source but many destinations. A Processor which is the source of a stream, owns
the stream.</p>
-<h3 id="1.-creating-a-stream">1. Creating a Stream</h3>
+<h3 id="1-creating-a-stream">1. Creating a Stream</h3>
<p>The following code snippet shows how a Stream is created:</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.initTopology("MyTopology");
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.initTopology("MyTopology");
Processor sourceProcessor = new Sampler();
builder.addProcessor(sourceProcessor, 3);
Stream sourceDataStream = builder.createStream(sourceProcessor);
</code></pre></div>
-<h3 id="2.-connecting-a-stream">2. Connecting a Stream</h3>
+<h3 id="2-connecting-a-stream">2. Connecting a Stream</h3>
<p>As described above, a Stream can have many destinations. In the following
figure, a single stream from sourceProcessor is connected to three different
destination Processors each having three instances.</p>
@@ -91,28 +91,28 @@ Stream sourceDataStream = builder.create
<p>SAMOA supports three different ways of distribution of messages to multiple
instances of a Processor.</p>
-<h4 id="2.1-shuffle">2.1 Shuffle</h4>
+<h4 id="2-1-shuffle">2.1 Shuffle</h4>
<p>In this way of message distribution, messages/events are distributed
randomly among various instances of a Processor.
Following figure shows how the messages are distributed.
<img src="images/SAMOA%20Explain%20Shuffling.png" alt="SAMOA Explain
Shuffling">
The following code snippet shows how to connect a stream to a destination using
random shuffling.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.connectInputShuffleStream(sourceDataStream,
destinationProcessor);
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.connectInputShuffleStream(sourceDataStream,
destinationProcessor);
</code></pre></div>
-<h4 id="2.2-key">2.2 Key</h4>
+<h4 id="2-2-key">2.2 Key</h4>
<p>In this way of message distribution, messages with the same key are sent to
the same instance of a Processor.
Following figure illustrates key-based distribution.
<img src="images/SAMOA%20Explain%20Key%20Shuffling.png" alt="SAMOA Explain Key
Shuffling">
Following code snippet shows how to connect a stream to a destination using
key-based distribution.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.connectInputKeyStream(sourceDataStream,
destinationProcessor);
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.connectInputKeyStream(sourceDataStream,
destinationProcessor);
</code></pre></div>
-<h4 id="2.3-all">2.3 All</h4>
+<h4 id="2-3-all">2.3 All</h4>
<p>In this way of message distribution, all messages of a stream are sent to
all instances of a destination Processor. Following figure illustrates this
distribution process.
<img src="images/SAMOA%20Explain%20All%20Shuffling.png" alt="SAMOA Explain All
Shuffling">
Following code snippet shows how to connect a stream to a destination using
All-based distribution.</p>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">builder.connectInputAllStream(sourceDataStream,
destinationProcessor);
+<div class="highlight"><pre><code class="language-"
data-lang="">builder.connectInputAllStream(sourceDataStream,
destinationProcessor);
</code></pre></div>
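<p>The three distribution schemes differ only in how destination instances are
chosen for each message. The sketch below illustrates that choice; it is not
how SAMOA or the underlying platforms implement it. The class
<code>GroupingSketch</code> is hypothetical, and hash-modulo routing for the
key scheme is an assumption made for illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class GroupingSketch {

    // Shuffle: any one of the n instances, chosen at random.
    static int shuffle(Random rng, int n) {
        return rng.nextInt(n);
    }

    // Key: the same key always maps to the same instance
    // (hash-modulo routing is an assumption of this sketch).
    static int byKey(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // All: every instance receives the message.
    static List<Integer> all(int n) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            targets.add(i);
        }
        return targets;
    }

    public static void main(String[] args) {
        int n = 3; // three instances of the destination Processor
        System.out.println("shuffle -> " + shuffle(new Random(), n));
        System.out.println("key     -> " + byKey("user-42", n));
        System.out.println("all     -> " + all(n));
    }
}
```

<p>Key-based distribution is the one to use when a destination instance keeps
state per key, since all messages for that key arrive at the same place.</p>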
</article>
Modified: incubator/samoa/site/documentation/Task.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Task.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Task.html (original)
+++ incubator/samoa/site/documentation/Task.html Sun Jan 31 12:31:17 2016
@@ -75,8 +75,8 @@
<article class="post-content">
<p>A Task is similar to a job in Hadoop: it is an execution entity. A
topology must be defined inside a task. SAMOA can only execute classes that
implement the <code>Task</code> interface.</p>
-<h3 id="1.-implementation">1. Implementation</h3>
-<div class="highlight"><pre><code class="language-text"
data-lang="text">package com.yahoo.labs.samoa.tasks;
+<h3 id="1-implementation">1. Implementation</h3>
+<div class="highlight"><pre><code class="language-" data-lang="">package
com.yahoo.labs.samoa.tasks;
import com.yahoo.labs.samoa.topology.ComponentFactory;
import com.yahoo.labs.samoa.topology.Topology;
@@ -110,17 +110,17 @@ public interface Task {
public void setFactory(ComponentFactory factory) ;
}
</code></pre></div>
-<h3 id="2.-methods">2. Methods</h3>
+<h3 id="2-methods">2. Methods</h3>
-<h5 id="2.1-void-init()">2.1 <code>void init()</code></h5>
+<h5 id="2-1-void-init">2.1 <code>void init()</code></h5>
<p>This method should build the desired topology by creating Processors and
Streams and connecting them to each other.</p>
-<h5 id="2.2-topology-gettopology()">2.2 <code>Topology
getTopology()</code></h5>
+<h5 id="2-2-topology-gettopology">2.2 <code>Topology getTopology()</code></h5>
<p>This method should return the topology built by <code>init</code> to the
engine for execution.</p>
-<h5 id="2.3-void-setfactory(componentfactory-factory)">2.3 <code>void
setFactory(ComponentFactory factory)</code></h5>
+<h5 id="2-3-void-setfactory-componentfactory-factory">2.3 <code>void
setFactory(ComponentFactory factory)</code></h5>
<p>Utility method to accept a <code>ComponentFactory</code> to use in building
the topology.</p>
Modified: incubator/samoa/site/documentation/Topology-Builder.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Topology-Builder.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Topology-Builder.html (original)
+++ incubator/samoa/site/documentation/Topology-Builder.html Sun Jan 31
12:31:17 2016
@@ -74,8 +74,8 @@
<article class="post-content">
<p><code>TopologyBuilder</code> is a builder class which builds physical
units of the topology and assemble them together. Each topology has a name.
Following code snippet shows all the steps of creating a topology with one
<code>EntrancePI</code>, two PIs and a few streams.</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">TopologyBuilder builder = new TopologyBuilder(factory) // ComponentFactory factory
-builder.initTopology("Parma Topology"); //initiates an empty topology with a name
+<div class="highlight"><pre><code class="language-" data-lang="">TopologyBuilder builder = new TopologyBuilder(factory) // ComponentFactory factory
+builder.initTopology("Parma Topology"); //initiates an empty topology with a name
//********************************Topology building***********************************
StreamSource sourceProcessor = new StreamSource(inputPath,d,sampleSize,fpmGap,epsilon,phi,numSamples);
builder.addEntranceProcessor(sourceProcessor);
Modified: incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html?rev=1727802&r1=1727801&r2=1727802&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html (original)
+++ incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html Sun Jan 31 12:31:17 2016
@@ -75,7 +75,7 @@
<article class="post-content">
<p>Vertical Hoeffding Tree (VHT) classifier is a distributed classifier
that utilizes vertical parallelism on top of the Very Fast Decision Tree (VFDT)
or Hoeffding Tree classifier.</p>
-<h3 id="very-fast-decision-tree-(vfdt)-classifier">Very Fast Decision Tree (VFDT) classifier</h3>
+<h3 id="very-fast-decision-tree-vfdt-classifier">Very Fast Decision Tree (VFDT) classifier</h3>
<p><a href="http://doi.acm.org/10.1145/347090.347107">Hoeffding Tree or
VFDT</a> is the standard decision tree algorithm for data stream
classification. VFDT uses the Hoeffding bound to decide the minimum number of
arriving instances to achieve certain level of confidence in splitting the
node. This confidence level determines how close the statistics between the
attribute chosen by VFDT and the attribute chosen by decision tree for batch
learning.</p>
@@ -87,7 +87,7 @@
<p>For more explanation about available parallelism types for decision tree
induction, you can read chapter 4 of <a
href="../SAMOA-Developers-Guide-0-0-1.pdf">Distributed Decision Tree Learning
for Mining Big Data Streams</a>, the Developer's Guide of SAMOA. </p>
-<h3 id="vertical-hoeffding-tree-(vht)-classifier">Vertical Hoeffding Tree (VHT) classifier</h3>
+<h3 id="vertical-hoeffding-tree-vht-classifier">Vertical Hoeffding Tree (VHT) classifier</h3>
<p>VHT is implemented using the SAMOA API. The diagram below shows the
implementation:
<img src="images/VHT.png" alt="Vertical Hoeffding Tree"></p>