Author: nkourtellis
Date: Sun Sep 25 20:39:59 2016
New Revision: 1762231
URL: http://svn.apache.org/viewvc?rev=1762231&view=rev
Log:
updated for new release version
Added:
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
Modified:
incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
incubator/samoa/site/documentation/Bagging-and-Boosting.html
incubator/samoa/site/documentation/Building-SAMOA.html
incubator/samoa/site/documentation/Content-Event.html
incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html
incubator/samoa/site/documentation/Getting-Started.html
incubator/samoa/site/documentation/Home.html
incubator/samoa/site/documentation/Learner.html
incubator/samoa/site/documentation/Prequential-Evaluation-Task.html
incubator/samoa/site/documentation/Processing-Item.html
incubator/samoa/site/documentation/Processor.html
incubator/samoa/site/documentation/SAMOA-for-MOA-users.html
incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
incubator/samoa/site/documentation/Stream.html
incubator/samoa/site/documentation/Task.html
incubator/samoa/site/documentation/Team.html
incubator/samoa/site/documentation/Topology-Builder.html
incubator/samoa/site/index.html
Modified: incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html
(original)
+++ incubator/samoa/site/documentation/Adaptive-Model-Rules-Regressor.html Sun
Sep 25 20:39:59 2016
@@ -74,41 +74,37 @@
<article class="post-content">
<h3 id="adaptive-model-rules-regressor">Adaptive Model Rules Regressor</h3>
-<p><a
href="http://www.ecmlpkdd2013.org/wp-content/uploads/2013/07/251.pdf">Adaptive
Model Rules (AMRules)</a> is an innovative algorithm for learning regression
rules with streaming data. In AMRules, the rule model consists of a set of
normal rules and a default rule (a rule with no features). Hoeffding bound is
used to define a confidence interval to decide whether to expand a rule. If the
ratio of the 2 largest standard deviation reduction (SDR) measure among all
potential features of a rule is is within this interval, the feature with the
largest SDR will be added to the rule to expand it. If the default rule is
expanded, it will become a normal rule and will be added to the model’s rule
set. A new default rule is initialized to replace the expanded one. A rule in
the set might also be removed if the Page-Hinckley test indicates that its
cumulative error exceed a threshold.</p>
+
+<p><a
href="http://www.ecmlpkdd2013.org/wp-content/uploads/2013/07/251.pdf">Adaptive
Model Rules (AMRules)</a> is an innovative algorithm for learning regression
rules from streaming data. In AMRules, the rule model consists of a set of
normal rules and a default rule (a rule with no features). The Hoeffding bound
is used to define a confidence interval for deciding whether to expand a rule.
If the ratio of the two largest standard deviation reduction (SDR) measures
among all potential features of a rule is within this interval, the feature
with the largest SDR is added to the rule to expand it. If the default rule is
expanded, it becomes a normal rule and is added to the model's
rule set. A new default rule is initialized to replace the expanded one. A rule
in the set might also be removed if the Page-Hinckley test indicates that its
cumulative error exceeds a threshold.</p>
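<p>The expansion test described above can be sketched as follows. This is a minimal illustration, not SAMOA's implementation: the class and method names are hypothetical, and it assumes the common formulation in which the ratio is second-best SDR over best SDR (so its range is 1) and a rule expands once the ratio plus the Hoeffding bound falls below 1.</p>

```java
// Hypothetical sketch of a Hoeffding-bound expansion test; not SAMOA code.
final class HoeffdingExpansionSketch {

    /** Hoeffding bound for a variable with the given range, after n observations. */
    static double hoeffdingBound(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /**
     * Assumed decision rule: expand when secondBest/best + epsilon < 1,
     * i.e. the best feature is better than the runner-up with
     * probability at least 1 - delta.
     */
    static boolean shouldExpand(double bestSdr, double secondBestSdr,
                                double delta, long n) {
        if (bestSdr <= 0.0 || n <= 0) {
            return false; // nothing useful to split on yet
        }
        double ratio = secondBestSdr / bestSdr;
        double epsilon = hoeffdingBound(1.0, delta, n);
        return ratio + epsilon < 1.0;
    }

    public static void main(String[] args) {
        // A clear winner versus a near tie, both after 1000 instances.
        System.out.println(shouldExpand(1.0, 0.50, 1e-7, 1000));
        System.out.println(shouldExpand(1.0, 0.95, 1e-7, 1000));
    }
}
```

<p>Because the bound shrinks as more instances are seen, a near tie between two features simply delays the expansion until enough evidence accumulates.</p>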
<h3 id="vertical-adaptive-model-rules-regressor">Vertical Adaptive Model Rules
Regressor</h3>
+
<p>Vertical Adaptive Model Rules Regressor (VAMR) is the vertical parallel
implementation of AMRules in SAMOA. The diagram below shows the components of
the implementation.
-<img src="images/vamr.png" alt="Vertical AMRules" /></p>
+<img src="images/vamr.png" alt="Vertical AMRules"></p>
<p>The <em>Source PI</em> and <em>Evaluator PI</em> are components of the <a
href="Prequential-Evaluation-Task.html">Prequential Evaluation task</a>. The
<em>Source PI</em> produces the incoming instances while <em>Evaluator PI</em>
reads prediction results from VAMR and reports their accuracy and
throughput.</p>
-<p>The core of VAMR implementation consists of one <em>Model Aggregator
PI</em> and multiple <em>Learner PIs</em>. Each <em>Learner PI</em> is
responsible for training a subset of rules. The <em>Model Aggregator PI</em>
manages the rule model (rule set and default rule) to compute the prediction
results for incoming instances. It is also responsible for the training the
default rule and creation of new rules.</p>
+<p>The core of the VAMR implementation consists of one <em>Model Aggregator
PI</em> and multiple <em>Learner PIs</em>. Each <em>Learner PI</em> is
responsible for training a subset of rules. The <em>Model Aggregator PI</em>
manages the rule model (rule set and default rule) to compute the prediction
results for incoming instances. It is also responsible for training the
default rule and creating new rules.</p>
<p>For each incoming instance from <em>Source PI</em>, <em>Model Aggregator
PI</em> applies the current rule set to compute the prediction. The instance is
also forwarded from <em>Model Aggregator PI</em> to the <em>Learner PI(s)</em>
to train those rules that cover this instance. If an instance is not covered by
any rule in the set, the default rule will be used for prediction and will also
be trained with this instance. When the default rule expands and creates a new
rule, the new rule will be sent from <em>Model Aggregator PI</em> to one of the
<em>Learner PIs</em>. When the <em>Learner PIs</em> expand or remove a rule, an
update message is also sent back to the <em>Model Aggregator PI</em>.</p>
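<p>The routing decision in the paragraph above can be sketched roughly as follows. All types here are hypothetical stand-ins rather than SAMOA classes, and the sketch makes two simplifying assumptions: each rule carries a constant prediction, and the first covering rule decides the prediction.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration; these are not SAMOA classes.
final class RuleRoutingSketch {

    /** A rule with a coverage test and a constant prediction (a simplification). */
    static final class Rule {
        final Predicate<double[]> covers;
        final double prediction;

        Rule(Predicate<double[]> covers, double prediction) {
            this.covers = covers;
            this.prediction = prediction;
        }
    }

    /** Outcome of routing one instance: the prediction and who trains on it. */
    static final class Routed {
        final double prediction;
        final boolean trainedByDefaultRule;

        Routed(double prediction, boolean trainedByDefaultRule) {
            this.prediction = prediction;
            this.trainedByDefaultRule = trainedByDefaultRule;
        }
    }

    static Routed route(List<Rule> ruleSet, double defaultPrediction, double[] instance) {
        for (Rule rule : ruleSet) {
            if (rule.covers.test(instance)) {
                // Covered: the aggregator predicts and the instance would be
                // forwarded to the learner that owns this rule for training.
                return new Routed(rule.prediction, false);
            }
        }
        // Not covered: the default rule predicts and is trained on the instance.
        return new Routed(defaultPrediction, true);
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule(x -> x[0] > 1.0, 10.0));
        Routed covered = route(rules, 0.0, new double[] {2.0});
        Routed uncovered = route(rules, 0.0, new double[] {0.5});
        System.out.println(covered.prediction + " " + covered.trainedByDefaultRule);
        System.out.println(uncovered.prediction + " " + uncovered.trainedByDefaultRule);
    }
}
```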
-<p>The number of <em>Learner PIs</em> can be set with the <code
class="highlighter-rouge">-p</code> option:</p>
-
-<p><code class="highlighter-rouge">
-PrequentialEvaluationTask -l
(org.apache.samoa.learners.classifiers.rules.VerticalAMRulesRegressor -p 4)
-</code></p>
-
+<p>The number of <em>Learner PIs</em> can be set with the <code>-p</code>
option:</p>
+<div class="highlight"><pre><code class="language-"
data-lang="">PrequentialEvaluationTask -l
(org.apache.samoa.learners.classifiers.rules.VerticalAMRulesRegressor -p 4)
+</code></pre></div>
<h3 id="horizontal-adaptive-model-rules-regressor">Horizontal Adaptive Model
Rules Regressor</h3>
+
<p>Horizontal Adaptive Model Rules Regressor (HAMR) is an extended
implementation of VAMR. The components of a <a
href="Prequential-Evaluation-Task.html">Prequential Evaluation task</a> with
HAMR are shown in the diagram below.
-<img src="images/hamr.png" alt="Horizontal AMRules" /></p>
+<img src="images/hamr.png" alt="Horizontal AMRules"></p>
-<p>In HAMR, the <em>Model Aggregator PI</em> is replicated, each processes
only a partition of the incoming stream from <em>Source PI</em>. The default
rule is moved from the <em>Model Aggregator PI</em> to a special <em>Learner
PI</em>, called <em>Default Rule Learner PI</em>. This new PI is reposible for
both the training and predicting steps for default rule.</p>
+<p>In HAMR, the <em>Model Aggregator PI</em> is replicated, and each replica
processes only a partition of the incoming stream from <em>Source PI</em>. The
default rule is moved from the <em>Model Aggregator PI</em> to a special
<em>Learner PI</em>, called <em>Default Rule Learner PI</em>. This new PI is
responsible for both the training and predicting steps for the default rule.</p>
-<p>For each incoming instance from <em>Source PI</em>, <em>Model Aggregator
PIs</em> apply the current rule set to compute the prediction. If the instance
is covered by a rule in the set, its prediction is computed by the <em>Model
Aggregator PI</em> and, then, it is forwarded to the <em>Learner PI(s)</em> for
training. Otherwise, the instance is forwarded to <em>Default Rule Learner
PI</em> for both prediction and training.</p>
+<p>For each incoming instance from <em>Source PI</em>, <em>Model Aggregator
PIs</em> apply the current rule set to compute the prediction. If the instance
is covered by a rule in the set, its prediction is computed by the <em>Model
Aggregator PI</em> and, then, it is forwarded to the <em>Learner PI(s)</em> for
training. Otherwise, the instance is forwarded to <em>Default Rule Learner
PI</em> for both prediction and training. </p>
<p>Newly created rules are sent from <em>Default Rule Learner PI</em> to all
<em>Model Aggregator PIs</em> and one of the <em>Learner PIs</em>. Update
messages are also sent from <em>Learner PIs</em> to all <em>Model Aggregator
PIs</em> when a rule is expanded or removed.</p>
-<p>The number of <em>Learner PIs</em> can be set with the <code
class="highlighter-rouge">-p</code> option and the number of <em>Model
Aggregator PIs</em> can be set with the <code
class="highlighter-rouge">-r</code> option:</p>
-
-<p><code class="highlighter-rouge">
-PrequentialEvaluationTask -l
(org.apache.samoa.learners.classifiers.rules.HorizontalAMRulesRegressor -r 4 -p
2)
-</code></p>
-
-
+<p>The number of <em>Learner PIs</em> can be set with the <code>-p</code>
option and the number of <em>Model Aggregator PIs</em> can be set with the
<code>-r</code> option:</p>
+<div class="highlight"><pre><code class="language-"
data-lang="">PrequentialEvaluationTask -l
(org.apache.samoa.learners.classifiers.rules.HorizontalAMRulesRegressor -r 4 -p
2)
+</code></pre></div>
</article>
<!-- </div> -->
Modified: incubator/samoa/site/documentation/Bagging-and-Boosting.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Bagging-and-Boosting.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Bagging-and-Boosting.html (original)
+++ incubator/samoa/site/documentation/Bagging-and-Boosting.html Sun Sep 25
20:39:59 2016
@@ -79,28 +79,34 @@
It is possible to use the classifiers available in <a
href="http://moa.cms.waikato.ac.nz">MOA</a> by using the <a
href="https://github.com/samoa-moa/samoa-moa">SAMOA-MOA</a> adapter.</p>
<h3 id="bagging">Bagging</h3>
-<p>You can use Bagging as a SAMOA learner, specifying the number of learners
to use with parameter <code class="highlighter-rouge">-s</code> and the base
learner to use with parameter <code class="highlighter-rouge">-l</code></p>
-<p><code class="highlighter-rouge">(classifiers.ensemble.Bagging -s 10 -l
(classifiers.trees.VerticalHoeffdingTree))</code></p>
+<p>You can use Bagging as a SAMOA learner, specifying the number of learners
to use with parameter <code>-s</code> and the base learner to use with
parameter <code>-l</code>.</p>
+
+<p><code>(classifiers.ensemble.Bagging -s 10 -l
(classifiers.trees.VerticalHoeffdingTree))</code></p>
<h6 id="only-with-samoa-moa-adapter">Only with SAMOA-MOA adapter</h6>
-<p><code class="highlighter-rouge">(classifiers.ensemble.Bagging -s 10 -l
(classifiers.SingleClassifier -l (MOAClassifierAdapter -l
moa.classifiers.trees.HoeffdingTree)))</code></p>
+
+<p><code>(classifiers.ensemble.Bagging -s 10 -l (classifiers.SingleClassifier
-l (MOAClassifierAdapter -l moa.classifiers.trees.HoeffdingTree)))</code></p>
<h3 id="adaptive-bagging">Adaptive Bagging</h3>
+
<p>If data is evolving, it is better to use an adaptive version of bagging,
where each base learner has a change detector that monitors its accuracy. When
the accuracy of a base learner decreases, a new base learner is built to
replace it.</p>
-<p><code class="highlighter-rouge">(classifiers.ensemble.AdaptiveBagging -s 10
-l (classifiers.trees.VerticalHoeffdingTree))</code></p>
+<p><code>(classifiers.ensemble.AdaptiveBagging -s 10 -l
(classifiers.trees.VerticalHoeffdingTree))</code></p>
+
+<h6 id="only-with-samoa-moa-adapter">Only with SAMOA-MOA adapter</h6>
-<h6 id="only-with-samoa-moa-adapter-1">Only with SAMOA-MOA adapter</h6>
-<p><code class="highlighter-rouge">(classifiers.ensemble.AdaptiveBagging -s 10
-l (classifiers.SingleClassifier -l
(org.apache.samoa.learners.classifiers.MOAClassifierAdapter -l
moa.classifiers.trees.HoeffdingTree)))</code></p>
+<p><code>(classifiers.ensemble.AdaptiveBagging -s 10 -l
(classifiers.SingleClassifier -l
(org.apache.samoa.learners.classifiers.MOAClassifierAdapter -l
moa.classifiers.trees.HoeffdingTree)))</code></p>
<h3 id="boosting">Boosting</h3>
+
<p>Boosting is a well-known ensemble method that performs very well in the
non-streaming setting. SAMOA implements the version of Oza and Russell
(<em>Nikunj C. Oza, Stuart J. Russell: Experimental comparisons of online and
batch versions of bagging and boosting. KDD 2001:359-364</em>).</p>
-<p><code class="highlighter-rouge">(classifiers.ensemble.Boosting -s 10 -l
(classifiers.trees.VerticalHoeffdingTree))</code></p>
+<p><code>(classifiers.ensemble.Boosting -s 10 -l
(classifiers.trees.VerticalHoeffdingTree))</code></p>
+
+<h6 id="only-with-samoa-moa-adapter">Only with SAMOA-MOA adapter</h6>
-<h6 id="only-with-samoa-moa-adapter-2">Only with SAMOA-MOA adapter</h6>
-<p><code class="highlighter-rouge">(classifiers.ensemble.Boosting -s 10 -l
(classifiers.SingleClassifier -l (MOAClassifierAdapter -l
moa.classifiers.trees.HoeffdingTree)))</code></p>
+<p><code>(classifiers.ensemble.Boosting -s 10 -l (classifiers.SingleClassifier
-l (MOAClassifierAdapter -l moa.classifiers.trees.HoeffdingTree)))</code></p>
</article>
Modified: incubator/samoa/site/documentation/Building-SAMOA.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Building-SAMOA.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Building-SAMOA.html (original)
+++ incubator/samoa/site/documentation/Building-SAMOA.html Sun Sep 25 20:39:59
2016
@@ -74,27 +74,23 @@
<article class="post-content">
<p>Building SAMOA to run in local mode, on your own computer without a
cluster, is as simple as cloning the repository and installing it.</p>
-
-<p><code class="highlighter-rouge">bash
-git clone http://git.apache.org/incubator-samoa.git
-cd incubator-samoa
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
mvn package
-</code>
-The deployable jar for SAMOA will be in <code
class="highlighter-rouge">target/SAMOA-Local-0.3.0-SNAPSHOT.jar</code>.</p>
+</code></pre></div>
+<p>The deployable jar for SAMOA will be in
<code>target/SAMOA-Local-0.3.0-SNAPSHOT.jar</code>.</p>
<h3 id="storm">Storm</h3>
-<p>Simply clone the repository and install SAMOA.</p>
-<p><code class="highlighter-rouge">bash
-git clone http://git.apache.org/incubator-samoa.git
-cd incubator-samoa
+<p>Simply clone the repository and install SAMOA.</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
mvn -Pstorm package
-</code></p>
-
-<p>The deployable jar for SAMOA will be in <code
class="highlighter-rouge">target/SAMOA-Storm-0.3.0-SNAPSHOT.jar</code>.</p>
+</code></pre></div>
+<p>The deployable jar for SAMOA will be in
<code>target/SAMOA-Storm-0.3.0-SNAPSHOT.jar</code>.</p>
<ul>
- <li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA
with Apache Storm</a></li>
+<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with
Apache Storm</a></li>
</ul>
<h3 id="s4">S4</h3>
@@ -102,19 +98,16 @@ mvn -Pstorm package
<p>If you want to compile SAMOA for Apache S4, you will need to install the S4
dependencies manually as explained in <a
href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache
S4</a>.</p>
<p>Once the dependencies are installed, you can simply clone the repository
and install SAMOA.</p>
-
-<p>```bash
-git clone http://git.apache.org/incubator-samoa.git
-cd incubator-samoa
-mvn -P<variant> package # where variant is "storm" or "s4"</variant></p>
-
-<p>mvn -Pstorm,s4 package # e.g., to get both versions
-```</p>
-
-<p>The deployable jars for SAMOA will be in <code
class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>.
For example, for S4 <code
class="highlighter-rouge">target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
+mvn -P<variant> package <span class="c"># where variant is "storm" or
"s4"</span>
+
+mvn -Pstorm,s4 package <span class="c"># e.g., to get both versions</span>
+</code></pre></div>
+<p>The deployable jars for SAMOA will be in
<code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For
example, for S4 <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
<ul>
- <li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with
Apache S4</a></li>
+<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with
Apache S4</a></li>
</ul>
</article>
Modified: incubator/samoa/site/documentation/Content-Event.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Content-Event.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Content-Event.html (original)
+++ incubator/samoa/site/documentation/Content-Event.html Sun Sep 25 20:39:59
2016
@@ -75,120 +75,121 @@
<article class="post-content">
<p>A message or an event is called Content Event in SAMOA. As the name
suggests, it is an event which contains content which needs to be processed by
the processors.</p>
-<h3 id="implementation">1. Implementation</h3>
-<p>ContentEvent has been implemented as an interface in SAMOA. Users need to
implement <code class="highlighter-rouge">ContentEvent</code> interface to
create their custom message classes. As it can be seen in the following code,
key is the necessary part of a message.</p>
+<h3 id="1-implementation">1. Implementation</h3>
-<p>```
-package org.apache.samoa.core;</p>
+<p>ContentEvent is implemented as an interface in SAMOA. Users need to
implement the <code>ContentEvent</code> interface to create their custom message
classes. As the following code shows, a key is a required part of
a message.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">package
org.apache.samoa.core;
-<p>public interface ContentEvent extends java.io.Serializable {</p>
+public interface ContentEvent extends java.io.Serializable {
-<div class="highlighter-rouge"><pre class="highlight"><code>public String
getKey();
+ public String getKey();
-public void setKey(String str);
+ public void setKey(String str);
-public boolean isLastEvent(); } ``` ###2. Methods Following is a brief
description of methods.
-</code></pre>
-</div>
+ public boolean isLastEvent();
+}
+</code></pre></div>
+<h3 id="2-methods">2. Methods</h3>
+
+<p>The following is a brief description of the methods.</p>
+
+<h5 id="2-1-string-getkey">2.1 <code>String getKey()</code></h5>
-<h5 id="string-getkey">2.1 <code class="highlighter-rouge">String
getKey()</code></h5>
<p>Each message is identified by a key in SAMOA. All user-defined message
classes should have a key state variable. Each instance of the custom message
should be assigned a key. This method should return the key of the respective
message.</p>
-<h5 id="void-setkeystring-str">2.2 <code class="highlighter-rouge">void
setKey(String str)</code></h5>
+<h5 id="2-2-void-setkey-string-str">2.2 <code>void setKey(String
str)</code></h5>
+
<p>This method is used to assign a key to the message.</p>
-<h5 id="boolean-islastevent">2.3 <code class="highlighter-rouge">boolean
isLastEvent()</code></h5>
+<h5 id="2-3-boolean-islastevent">2.3 <code>boolean isLastEvent()</code></h5>
+
<p>This method lets SAMOA know that this message is the last message.</p>
-<h3 id="example">3. Example</h3>
-<p>Following is the example of a <code
class="highlighter-rouge">Message</code> class which implements <code
class="highlighter-rouge">ContentEvent</code> interface. As <code
class="highlighter-rouge">ContentEvent</code> is an interface, it can not hold
variables. A user-defined message class should have its own data variables and
its getter methods. In the following example, <code
class="highlighter-rouge">value</code> variable of type <code
class="highlighter-rouge">Object</code> is added to the class. Using a generic
type <code class="highlighter-rouge">Object</code> is beneficial in the sense
that any object can be passed to it and later it can be casted back to the
original type. The following example also adds a <code
class="highlighter-rouge">streamId</code> variable which stores the <code
class="highlighter-rouge">id</code> of the stream the message belongs to. This
is not a requirement but can be beneficial in certain applications.</p>
+<h3 id="3-example">3. Example</h3>
-<p>```
-import org.apache.samoa.core.ContentEvent;</p>
+<p>The following is an example of a <code>Message</code> class which implements
the <code>ContentEvent</code> interface. As <code>ContentEvent</code> is an
interface, it cannot hold variables. A user-defined message class should have
its own data variables and their getter methods. In the following example, a
<code>value</code> variable of type <code>Object</code> is added to the class.
Using the generic type <code>Object</code> is beneficial in the sense that any
object can be passed to it and later be cast back to the original
type. The following example also adds a <code>streamId</code> variable which
stores the <code>id</code> of the stream the message belongs to. This is not a
requirement but can be beneficial in certain applications.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">import
org.apache.samoa.core.ContentEvent;
-<p>/**
+/**
* A general key-value message class which adds a stream id in the class
variables
* Stream id information helps in determining which stream the message
belongs to.
*/
-public class Message implements ContentEvent {</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>/**
- * To tell if the message is the last message of the stream. This may be
required in some applications where
- * a stream can cease to exist
- */
-private boolean last=false;
-/**
- * Id of the stream to which the message belongs
- */
-private String streamId;
-/**
- * The key of the message. Can be any sting value. Duplicates are allowed.
- */
-private String key;
-/**
- * The value of the message. Can be any object. Casting may be necessary to
the desired type.
- */
-private Object value;
+public class Message implements ContentEvent {
-public Message()
-{}
-
-/**
- * @param key
- * @param value
- * @param isLastEvent
- * @param streamId
- */
-public Message(String key, Object value, boolean isLastEvent, String streamId)
-{
- this.key=key;
- this.value = value;
- this.last = isLastEvent;
- this.streamId=streamId;
-}
+ /**
+ * To tell if the message is the last message of the stream. This may be
required in some applications where
+ * a stream can cease to exist
+ */
+ private boolean last=false;
+ /**
+ * Id of the stream to which the message belongs
+ */
+ private String streamId;
+ /**
+ * The key of the message. Can be any string value. Duplicates are allowed.
+ */
+ private String key;
+ /**
+ * The value of the message. Can be any object. Casting may be necessary
to the desired type.
+ */
+ private Object value;
+
+ public Message()
+ {}
+
+ /**
+ * @param key
+ * @param value
+ * @param isLastEvent
+ * @param streamId
+ */
+ public Message(String key, Object value, boolean isLastEvent, String
streamId)
+ {
+ this.key=key;
+ this.value = value;
+ this.last = isLastEvent;
+ this.streamId=streamId;
+ }
+
+ @Override
+ public String getKey() {
+ return key;
+ }
+
+ @Override
+ public void setKey(String str) {
+ this.key = str;
+ }
+
+ @Override
+ public boolean isLastEvent() {
+ return last;
+ }
+
+ /**
+ * @return value of the message
+ */
+ public String getValue()
+ {
+ return value.toString();
+ }
+
+ /**
+ * @return id of the stream to which the message belongs
+ */
+ public String getStreamId() {
+ return streamId;
+ }
+ /**
+ * @param streamId
+ */
+ public void setStreamId(String streamId) {
+ this.streamId = streamId;
+ }
-@Override
-public String getKey() {
- return key;
}
-@Override
-public void setKey(String str) {
- this.key = str;
-}
-
-@Override
-public boolean isLastEvent() {
- return last;
-}
-
-/**
- * @return value of the message
- */
-public String getValue()
-{
- return value.toString();
-}
-
-/**
- * @return id of the stream to which the message belongs
- */
-public String getStreamId() {
- return streamId;
-}
-/**
- * @param streamId
- */
-public void setStreamId(String streamId) {
- this.streamId = streamId;
-}
-</code></pre>
-</div>
-
-<p>}</p>
-
-<p>```</p>
-
+</code></pre></div>
</article>
<!-- </div> -->
Modified: incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html
(original)
+++ incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html Sun
Sep 25 20:39:59 2016
@@ -73,165 +73,147 @@
</header>
<article class="post-content">
- <p>A <em>task</em> is a machine learning related activity such as a
specific evaluation for a classifier. For instance the <em>prequential
evaluation</em> task is a task that uses each instance first for testing and
then for training a model built using a specific classification algorithm. A
task corresponds to a topology in SAMOA.</p>
+ <p>A <em>task</em> is a machine learning related activity such as a
specific evaluation for a classifier. For instance the <em>prequential
evaluation</em> task is a task that uses each instance first for testing and
then for training a model built using a specific classification algorithm. A
task corresponds to a topology in SAMOA. </p>
<p>In this tutorial, we will develop a simple Hello World task.</p>
<h3 id="hello-world-task">Hello World Task</h3>
+
<p>The Hello World task consists of a source processor, a destination
processor with a parallelism hint setting, and a stream that connects the two.
The source processor will generate a random integer which will be sent to the
destination processor. The figure below shows the layout of Hello World
task.</p>
-<p><img src="images/HelloWorldTask.png" alt="Hello World Task" /></p>
+<p><img src="images/HelloWorldTask.png" alt="Hello World Task"></p>
-<p>To develop the task, we create a new class that implements the interface
<code class="highlighter-rouge">org.apache.samoa.tasks.Task</code>. For
convenience we also implement <code
class="highlighter-rouge">com.github.javacliparser.Configurable</code> which
allows to parse command-line options.</p>
+<p>To develop the task, we create a new class that implements the interface
<code>org.apache.samoa.tasks.Task</code>. For convenience we also implement
<code>com.github.javacliparser.Configurable</code>, which allows parsing
command-line options.</p>
-<p>The <code class="highlighter-rouge">init</code> method builds the topology
by instantiating the necessary <code
class="highlighter-rouge">Processors</code>, <code
class="highlighter-rouge">Streams</code> and connecting the source processor
with the destination processor.</p>
+<p>The <code>init</code> method builds the topology by instantiating the
necessary <code>Processors</code>, <code>Streams</code> and connecting the
source processor with the destination processor.</p>
<h3 id="hello-world-source-processor">Hello World Source Processor</h3>
-<p>We need a source processor which is an instance of <code
class="highlighter-rouge">EntranceProcessor</code> to start a task in SAMOA. In
this tutorial, the source processor is <code
class="highlighter-rouge">HelloWorldSourceProcessor</code>.</p>
-<p>The SAMOA runtime invokes the <code
class="highlighter-rouge">nextEvent</code> method of <code
class="highlighter-rouge">EntranceProcessor</code> until its <code
class="highlighter-rouge">hasNext</code> method returns false. Each call to
<code class="highlighter-rouge">nextEvent</code> should return the next <code
class="highlighter-rouge">ContentEvent</code> to be sent to the topology. In
this tutorial, <code class="highlighter-rouge">HelloWorldSourceProcessor</code>
sends events of type <code
class="highlighter-rouge">HelloWorldContentEvent</code>.</p>
+<p>We need a source processor which is an instance of
<code>EntranceProcessor</code> to start a task in SAMOA. In this tutorial, the
source processor is <code>HelloWorldSourceProcessor</code>. </p>
-<p>Here is the relevant code in <code
class="highlighter-rouge">HelloWorldSourceProcessor</code>:</p>
+<p>The SAMOA runtime invokes the <code>nextEvent</code> method of
<code>EntranceProcessor</code> until its <code>hasNext</code> method returns
false. Each call to <code>nextEvent</code> should return the next
<code>ContentEvent</code> to be sent to the topology. In this tutorial,
<code>HelloWorldSourceProcessor</code> sends events of type
<code>HelloWorldContentEvent</code>.</p>
-<p>```
- private Random rnd;
+<p>Here is the relevant code in <code>HelloWorldSourceProcessor</code>:</p>
+<div class="highlight"><pre><code class="language-" data-lang=""> private
Random rnd;
private final long maxInst;
- private long count;</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>@Override
-public boolean hasNext() {
- return count < maxInst;
-}
+ private long count;
-@Override
-public ContentEvent nextEvent() {
- count++;
- return new HelloWorldContentEvent(rnd.nextInt(), false);
-} ```
-</code></pre>
-</div>
-
-<p>We also need to create a new type of <code
class="highlighter-rouge">ContentEvent</code> to hold our data. In this
tutorial we call it <code
class="highlighter-rouge">HelloWorldContentEvent</code> and its content is
simply an integer.</p>
-
-<p>```
-public class HelloWorldContentEvent implements ContentEvent {</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>private static
final long serialVersionUID = -2406968925730298156L;
-private final boolean isLastEvent;
-private final int helloWorldData;
-
-public HelloWorldContentEvent(int helloWorldData, boolean isLastEvent) {
- this.isLastEvent = isLastEvent;
- this.helloWorldData = helloWorldData;
-}
+ @Override
+ public boolean hasNext() {
+ return count < maxInst;
+ }
-@Override
-public String getKey() {
- return null;
-}
+ @Override
+ public ContentEvent nextEvent() {
+ count++;
+ return new HelloWorldContentEvent(rnd.nextInt(), false);
+ }
+</code></pre></div>
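<p>To make the <code>hasNext</code>/<code>nextEvent</code> contract concrete, here is a minimal, self-contained sketch of how a runtime might drive such a source. The <code>EventSource</code> interface, the <code>BoundedRandomSource</code> class and the <code>drain</code> helper are hypothetical stand-ins for illustration; they are not SAMOA's <code>EntranceProcessor</code> API or its actual runtime loop, and the events are plain integers instead of <code>ContentEvent</code> objects.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-ins for illustration; these are not SAMOA classes.
final class EntranceLoopSketch {

    /** Simplified view of the two methods the runtime relies on. */
    interface EventSource {
        boolean hasNext();
        int nextEvent();
    }

    /** Emits random integers until maxInst events have been produced. */
    static final class BoundedRandomSource implements EventSource {
        private final Random rnd = new Random();
        private final long maxInst;
        private long count;

        BoundedRandomSource(long maxInst) {
            this.maxInst = maxInst;
        }

        @Override
        public boolean hasNext() {
            return count < maxInst;
        }

        @Override
        public int nextEvent() {
            count++;
            return rnd.nextInt();
        }
    }

    /** Keeps asking for events until the source reports it is exhausted. */
    static List<Integer> drain(EventSource source) {
        List<Integer> emitted = new ArrayList<>();
        while (source.hasNext()) {
            emitted.add(source.nextEvent());
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> events = drain(new BoundedRandomSource(5));
        System.out.println("events emitted: " + events.size());
    }
}
```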
+<p>We also need to create a new type of <code>ContentEvent</code> to hold our
data. In this tutorial we call it <code>HelloWorldContentEvent</code> and its
content is simply an integer.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">public class
HelloWorldContentEvent implements ContentEvent {
+
+ private static final long serialVersionUID = -2406968925730298156L;
+ private final boolean isLastEvent;
+ private final int helloWorldData;
+
+ public HelloWorldContentEvent(int helloWorldData, boolean isLastEvent) {
+ this.isLastEvent = isLastEvent;
+ this.helloWorldData = helloWorldData;
+ }
-@Override
-public void setKey(String str) {
- // do nothing, it's key-less content event
-}
+ @Override
+ public String getKey() {
+ return null;
+ }
-@Override
-public boolean isLastEvent() {
- return isLastEvent;
-}
+ @Override
+ public void setKey(String str) {
+ // do nothing, it's key-less content event
+ }
-public int getHelloWorldData() {
- return helloWorldData;
-}
+ @Override
+ public boolean isLastEvent() {
+ return isLastEvent;
+ }
-@Override
-public String toString() {
- return "HelloWorldContentEvent [helloWorldData=" + helloWorldData + "]";
-} } ```
-</code></pre>
-</div>
+ public int getHelloWorldData() {
+ return helloWorldData;
+ }
+ @Override
+ public String toString() {
+ return "HelloWorldContentEvent [helloWorldData=" + helloWorldData +
"]";
+ }
+}
+</code></pre></div>
<h3 id="hello-world-destination-processor">Hello World Destination
Processor</h3>
+
<p>The destination processor for SAMOA is pretty straightforward and it will
print the data from the event.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">public class
HelloWorldDestinationProcessor implements Processor {
-<p>```
-public class HelloWorldDestinationProcessor implements Processor {</p>
+ private static final long serialVersionUID = -6042613438148776446L;
+ private int processorId;
-<div class="highlighter-rouge"><pre class="highlight"><code>private static
final long serialVersionUID = -6042613438148776446L;
-private int processorId;
+ @Override
+ public boolean process(ContentEvent event) {
+ System.out.println(processorId + ": " + event);
+ return true;
+ }
-@Override
-public boolean process(ContentEvent event) {
- System.out.println(processorId + ": " + event);
- return true;
-}
+ @Override
+ public void onCreate(int id) {
+ this.processorId = id;
+ }
-@Override
-public void onCreate(int id) {
- this.processorId = id;
+ @Override
+ public Processor newProcessor(Processor p) {
+ return new HelloWorldDestinationProcessor();
+ }
}
-
-@Override
-public Processor newProcessor(Processor p) {
- return new HelloWorldDestinationProcessor();
-} } ```
-</code></pre>
-</div>
-
+</code></pre></div>
<h3 id="putting-it-all-together">Putting It All Together</h3>
-<p>To put all the components together, we need to go back to class <code
class="highlighter-rouge">HelloWorldTask</code>. First, we need to implement
the code for setting up the <code
class="highlighter-rouge">TopologyBuilder</code>. This code is necessary to be
able to run on multiple platforms.</p>
-<p><code class="highlighter-rouge">
- @Override
+<p>To put all the components together, we need to go back to class
<code>HelloWorldTask</code>. First, we need to implement the code for setting
up the <code>TopologyBuilder</code>. This code is necessary to be able to run
on multiple platforms.</p>
+<div class="highlight"><pre><code class="language-" data-lang=""> @Override
public void setFactory(ComponentFactory factory) {
builder = new TopologyBuilder(factory);
logger.debug("Successfully instantiating TopologyBuilder");
builder.initTopology(evaluationNameOption.getValue());
logger.debug("Successfully initializing SAMOA topology with name {}",
evaluationNameOption.getValue());
}
-</code></p>
-
-<p>After this method is called we have a functioning builder to get components
for our topology. Next, the <code class="highlighter-rouge">init</code> method
is called by SAMOA to start the task.
-First we instantiate the source <code
class="highlighter-rouge">EntranceProcessor</code>.
-After adding the entrance processor to the topology, we create a stream
originating from it. We use the create stream method of <code
class="highlighter-rouge">TopologyBuilder</code>.
+</code></pre></div>
+<p>After this method is called we have a functioning builder to get components
for our topology. Next, the <code>init</code> method is called by SAMOA to
start the task.
+First we instantiate the source <code>EntranceProcessor</code>.
+After adding the entrance processor to the topology, we create a stream
originating from it. We use the <code>createStream</code> method of
<code>TopologyBuilder</code>.
Next we create the destination processor and connect it to the stream by using
shuffle grouping.
Once we have created all the components, we use the builder to build the
topology.</p>
-
-<p>```
- @Override
+<div class="highlight"><pre><code class="language-" data-lang=""> @Override
public void init() {
// create source EntranceProcessor
sourceProcessor = new
HelloWorldSourceProcessor(instanceLimitOption.getValue());
- builder.addEntranceProcessor(sourceProcessor);</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code> // create
Stream
- Stream stream = builder.createStream(sourceProcessor);
+ builder.addEntranceProcessor(sourceProcessor);
- // create destination Processor
- destProcessor = new HelloWorldDestinationProcessor();
- builder.addProcessor(destProcessor,
helloWorldParallelismOption.getValue());
- builder.connectInputShuffleStream(stream, destProcessor);
-
- // build the topology
- helloWorldTopology = builder.build();
- logger.debug("Successfully built the topology");
-} ```
-</code></pre>
-</div>
+ // create Stream
+ Stream stream = builder.createStream(sourceProcessor);
+ // create destination Processor
+ destProcessor = new HelloWorldDestinationProcessor();
+ builder.addProcessor(destProcessor,
helloWorldParallelismOption.getValue());
+ builder.connectInputShuffleStream(stream, destProcessor);
+
+ // build the topology
+ helloWorldTopology = builder.build();
+ logger.debug("Successfully built the topology");
+ }
+</code></pre></div>
<h3 id="running-it">Running It</h3>
-<p>To run the example in local mode:</p>
-
-<p><code class="highlighter-rouge">
-bin/samoa local target/SAMOA-Local-0.0.1-SNAPSHOT.jar
"org.apache.samoa.examples.HelloWorldTask -p 4 -i 100"
-</code></p>
+<p>To run the example in local mode:</p>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
local target/SAMOA-Local-0.0.1-SNAPSHOT.jar
"org.apache.samoa.examples.HelloWorldTask -p 4 -i 100"
+</code></pre></div>
<p>To run the example in Storm local mode:</p>
-
-<p><code class="highlighter-rouge">
-java -cp
$STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
org.apache.samoa.LocalStormDoTask "org.apache.samoa.examples.HelloWorldTask -p
4 -i 1000"
-</code></p>
-
+<div class="highlight"><pre><code class="language-" data-lang="">java -cp
$STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar
org.apache.samoa.LocalStormDoTask "org.apache.samoa.examples.HelloWorldTask -p
4 -i 1000"
+</code></pre></div>
<p>All the code for the HelloWorldTask and its components can be found <a
href="https://github.com/yahoo/samoa/tree/master/samoa-api/src/main/java/org/apache/samoa/examples">here</a>.</p>
</article>
Modified: incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Clustering.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Distributed-Stream-Clustering.html
(original)
+++ incubator/samoa/site/documentation/Distributed-Stream-Clustering.html Sun
Sep 25 20:39:59 2016
@@ -74,24 +74,22 @@
<article class="post-content">
<h2 id="apache-samoa-clustering-algorithm">Apache SAMOA Clustering
Algorithm</h2>
-<p>The SAMOA Clustering Algorithm is invoked by using the <code
class="highlighter-rouge">ClusteringEvaluation</code> task. The clustering task
can be executed with default values just by running:</p>
-
-<p><code class="highlighter-rouge">
-bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
-</code></p>
+<p>The SAMOA Clustering Algorithm is invoked by using the
<code>ClusteringEvaluation</code> task. The clustering task can be executed
with default values just by running:</p>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
+</code></pre></div>
<p>Parameters:</p>
<ul>
- <li><code class="highlighter-rouge">-l</code>: clusterer to train</li>
- <li><code class="highlighter-rouge">-s</code>: stream to learn from</li>
- <li><code class="highlighter-rouge">-i</code>: maximum number of instances
to test/train on (-1 = no limit)</li>
- <li><code class="highlighter-rouge">-f</code>: how many instances between
samples of the learning performance</li>
- <li><code class="highlighter-rouge">-n</code>: evaluation name (default:
ClusteringEvaluation_TimeStamp)</li>
- <li><code class="highlighter-rouge">-d</code>: file to append intermediate
csv results to</li>
+<li><code>-l</code>: clusterer to train</li>
+<li><code>-s</code>: stream to learn from</li>
+<li><code>-i</code>: maximum number of instances to test/train on (-1 = no
limit)</li>
+<li><code>-f</code>: how many instances between samples of the learning
performance</li>
+<li><code>-n</code>: evaluation name (default:
ClusteringEvaluation_TimeStamp)</li>
+<li><code>-d</code>: file to append intermediate csv results to</li>
</ul>
-<p>In terms of the SAMOA API, Clustering Evaluation consists of a <code
class="highlighter-rouge">source</code> processor, a <code
class="highlighter-rouge">clusterer</code>, and a <code
class="highlighter-rouge">evaluator</code> processor. <code
class="highlighter-rouge">Source</code> processor sends the instances to the
classifier using <code class="highlighter-rouge">source</code> stream. The
clusterer sends the clustering results to the <code
class="highlighter-rouge">evaluator</code> processor via the <code
class="highlighter-rouge">result</code> stream. The <code
class="highlighter-rouge">source Processor</code> corresponds to the <code
class="highlighter-rouge">-s</code> option of Clustering Evaluation, and the
clusterer corresponds to the <code class="highlighter-rouge">-l</code>
option.</p>
+<p>In terms of the SAMOA API, Clustering Evaluation consists of a
<code>source</code> processor, a <code>clusterer</code>, and an
<code>evaluator</code> processor. The <code>source</code> processor sends the
instances to the clusterer using the <code>source</code> stream. The clusterer
sends the clustering results to the <code>evaluator</code> processor via the
<code>result</code> stream. The <code>source</code> processor corresponds to
the <code>-s</code> option of Clustering Evaluation, and the clusterer
corresponds to the <code>-l</code> option.</p>
</article>
Modified:
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
---
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
(original)
+++
incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html
Sun Sep 25 20:39:59 2016
@@ -73,73 +73,69 @@
</header>
<article class="post-content">
- <h2 id="introduction">1. Introduction</h2>
-<p>SAMOA takes a micro-batching approach to frequent itemset mining (FIM). It
uses <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a> as a base
algorithm for distributed sample-based frequent itemset mining. PARMA provides
the guaranty that all the frequent itemsets would be present in the result that
it returns.It also returns some false positives. The problem with FIM in
streams is that the stream has an evolving nature. The itemsets that were
frequent last year may not be frequent this year. To handle this, SAMOA
implements <a href="https://dl.acm.org/citation.cfm?id=1164180">Time Biased
Sampling</a> approach. This sampling method depends on a parameter
<em>lambda</em> which determines the size of the reservoir sample. This also
tells us how much biased the sample would be towards newer itemsets. As PARMA
has its own way of determining sample sizes, SAMOA does not allow users to
choose <em>lambda</em> and determines its value using the sample size
determined by PARMA
using the approximation <code class="highlighter-rouge">lambda =
1/sampleSize</code>.
-## 2. Concepts
-SAMOA implements FIM for streams in three processors i.e.
StreamSourceProcessor, SamplerProcessor and AggregatorProcessor. The tasks of
each of these are explained below.</p>
+ <h2 id="1-introduction">1. Introduction</h2>
+
+<p>SAMOA takes a micro-batching approach to frequent itemset mining (FIM). It
uses <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a> as a base
algorithm for distributed sample-based frequent itemset mining. PARMA
guarantees that all the frequent itemsets are present in the result that it
returns, although the result may also contain some false positives. The problem
with FIM in streams is that streams evolve: the itemsets that were
frequent last year may not be frequent this year. To handle this, SAMOA
implements the <a href="https://dl.acm.org/citation.cfm?id=1164180">Time Biased
Sampling</a> approach. This sampling method depends on a parameter
<em>lambda</em>, which determines the size of the reservoir sample and how
strongly the sample is biased towards newer itemsets. As PARMA
has its own way of determining sample sizes, SAMOA does not allow users to
choose <em>lambda</em>; instead, it derives the value from the sample size
determined by PARMA
using the approximation <code>lambda = 1/sampleSize</code>.</p>
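<p>As a rough illustration of the time-biased sampling idea, the sketch below keeps a reservoir of at most <code>sampleSize</code> items (so <code>lambda</code> is approximately <code>1/sampleSize</code>) and lets newer items displace older ones at random. The class name <code>BiasedReservoirSketch</code> and its methods are hypothetical and are not SAMOA's <code>TimeBiasedReservoirSampler</code>; it is only meant to show the replacement rule.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a time-biased reservoir; not SAMOA's sampler. */
class BiasedReservoirSketch<T> {
    private final int capacity;      // sampleSize, i.e. roughly 1 / lambda
    private final List<T> reservoir;
    private final Random rnd;

    BiasedReservoirSketch(int sampleSize, long seed) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        this.capacity = sampleSize;
        this.reservoir = new ArrayList<>(sampleSize);
        this.rnd = new Random(seed);
    }

    /** The bias rate implied by the approximation lambda = 1/sampleSize. */
    double lambda() {
        return 1.0 / capacity;
    }

    /** Every arriving item enters the reservoir, so recent items dominate. */
    void add(T item) {
        double fill = (double) reservoir.size() / capacity;
        if (rnd.nextDouble() < fill) {
            // With probability equal to the fill fraction, overwrite a random old item.
            reservoir.set(rnd.nextInt(reservoir.size()), item);
        } else {
            // Otherwise grow the reservoir; this branch is never taken once it is full.
            reservoir.add(item);
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }
}
```

<p>Because an item is always admitted and older items are overwritten at random, the chance that an item is still in the sample decays with its age, which is the bias towards newer itemsets described above.</p>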
+
+<h2 id="2-concepts">2. Concepts</h2>
+
+<p>SAMOA implements FIM for streams in three processors:
StreamSourceProcessor, SamplerProcessor and AggregatorProcessor. The tasks of
each of these are explained below.</p>
<ol>
- <li>
- <p>StreamSourceP takes as input the input transaction file.
StreamSourceProcessor (Entrance PI) starts sending the transactions randomly to
SamplerProcessor instances. The number of SamplerProcessors to instantiate is
taken as an argument from the user but is verified by PARMA. PARMA determines
this number based on the <code class="highlighter-rouge">epsilon</code> and
<code class="highlighter-rouge">phi</code> parameters provided by the user.
StreamSourceProcessor sends an FPM=âyesâ command to all the instances of
SamplerProcessor after 2M transactions where M=numSamples*sampleSize. After
first FPM=âyesâ command, all later FPM=âyesâ commands are sent after
<code class="highlighter-rouge">fpmGap</code> transactions which is one of the
parameter SAMOA FIM task takes as input.</p>
- </li>
- <li>
- <p>All the instances of SamplerProcessor start building a Time Biased
Reservoir Sample in which newer transactions have more weight. Time biased
sampling is the default approach but user can provide his own sampler by
implementing <code
class="highlighter-rouge">samoa.samplers.SamplerInterface</code>. When a
SamplerProcessor receives FPM=âyesâ command, it starts FIM/FPM on the
reservoir irrespective of whether the reservoir is full or not. When it
completes, it sends the result item-sets to the AggregatorProcessor with the
epoch/batch id. At the end of the result, each SamplerProcessor sends the
(âepoch_endâ,<epochnum>) message to the AggregatorProcessor.</epochnum></p>
- </li>
- <li>
- <p>AggregatorProcessor receives the result item-sets from all
SamplerProcessors. It maintains different queues for different batch ids and
also maintains a count of the number of SamplerProcessors which have finished
sending their results for a corresponding batch/epoch. Whenever the <code
class="highlighter-rouge">epoch_end</code> message count becomes equal to the
number of instances of SampleProcessor, AggregatorProcessor aggregates the
results and stores it in the file system using the output path specified by the
user.</p>
- </li>
+<li><p>StreamSourceProcessor (the entrance PI) takes the transaction file as
input and starts sending the transactions randomly to
SamplerProcessor instances. The number of SamplerProcessors to instantiate is
taken as an argument from the user but is verified by PARMA. PARMA determines
this number based on the <code>epsilon</code> and <code>phi</code> parameters
provided by the user. StreamSourceProcessor sends an FPM='yes' command
to all the instances of SamplerProcessor after 2M transactions, where
M=numSamples*sampleSize. After the first FPM='yes' command, all later
FPM='yes' commands are sent after <code>fpmGap</code> transactions,
which is one of the parameters the SAMOA FIM task takes as input.</p></li>
+<li><p>All the instances of SamplerProcessor start building a Time Biased
Reservoir Sample in which newer transactions have more weight. Time biased
sampling is the default approach, but users can provide their own sampler by
implementing <code>samoa.samplers.SamplerInterface</code>. When a
SamplerProcessor receives an FPM='yes' command, it starts FIM/FPM on the
reservoir irrespective of whether the reservoir is full or not. When it
completes, it sends the resulting itemsets to the AggregatorProcessor with the
epoch/batch id. At the end of the result, each SamplerProcessor sends the
("epoch_end",<epochNum>) message to the AggregatorProcessor.</p></li>
+<li><p>AggregatorProcessor receives the resulting itemsets from all
SamplerProcessors. It maintains different queues for different batch ids and
also maintains a count of the number of SamplerProcessors which have finished
sending their results for a corresponding batch/epoch. Whenever the
<code>epoch_end</code> message count becomes equal to the number of instances
of SamplerProcessor, AggregatorProcessor aggregates the results and stores them in
the file system using the output path specified by the user.</p></li>
</ol>
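<p>The bookkeeping described for the AggregatorProcessor can be sketched as follows. This is a simplified illustration under assumptions: the class <code>EpochAggregatorSketch</code> and its method names are hypothetical, itemsets are represented as plain strings, and "aggregation" is reduced to collecting the results of one epoch, whereas the real PARMA aggregation combines the itemsets from the different samples.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of per-epoch bookkeeping; not SAMOA's AggregatorProcessor. */
class EpochAggregatorSketch {
    private final int numSamplers;
    // One queue of results per epoch/batch id.
    private final Map<Integer, List<String>> resultsByEpoch = new HashMap<>();
    // Number of epoch_end messages seen so far per epoch.
    private final Map<Integer, Integer> endCountByEpoch = new HashMap<>();

    EpochAggregatorSketch(int numSamplers) {
        if (numSamplers <= 0) {
            throw new IllegalArgumentException("numSamplers must be positive");
        }
        this.numSamplers = numSamplers;
    }

    /** Called for each itemset a sampler reports for the given epoch. */
    void onItemset(int epoch, String itemset) {
        resultsByEpoch.computeIfAbsent(epoch, e -> new ArrayList<>()).add(itemset);
    }

    /**
     * Called for each epoch_end message. Returns the collected results once
     * every sampler has finished the epoch, and an empty Optional before that.
     */
    Optional<List<String>> onEpochEnd(int epoch) {
        int seen = endCountByEpoch.merge(epoch, 1, Integer::sum);
        if (seen < numSamplers) {
            return Optional.empty();
        }
        endCountByEpoch.remove(epoch);
        List<String> done = resultsByEpoch.remove(epoch);
        return Optional.of(done == null ? new ArrayList<>() : done);
    }
}
```

<p>Keeping a separate queue and counter per epoch id means results from a faster sampler that has already moved on to the next epoch are not mixed into the epoch that is still being completed.</p>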
-<p>In this way, epochs never overlap.If <code
class="highlighter-rouge">fpmGap</code> is small and the StreamSourceProcessor
dispatches an FPM=âyesâ command before the slowest SamplerProcessor
finishes FIM on the last epoch, the speed of the global FIM will be equal to
the local FIM of the slowest SamplerProcessor. (or AggregatorProcessor if it is
slower than the slowest SamplerProcessor)</p>
+<p>In this way, epochs never overlap. If <code>fpmGap</code> is small and the
StreamSourceProcessor dispatches an FPM='yes' command before the
slowest SamplerProcessor finishes FIM on the last epoch, the speed of the
global FIM will be equal to the speed of the local FIM of the slowest
SamplerProcessor (or of the AggregatorProcessor, if it is slower than the
slowest SamplerProcessor).</p>
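<p>The timing of the FPM='yes' commands can be written down as a small predicate. The sketch below assumes the first command is sent after exactly 2M transactions (with M=numSamples*sampleSize) and every later one after a further <code>fpmGap</code> transactions; the class and method names are hypothetical and do not come from the SAMOA code base.</p>

```java
/** Hypothetical sketch of the FPM='yes' schedule described in the text. */
class FpmScheduleSketch {

    /**
     * Returns true if an FPM='yes' command is due once the given number of
     * transactions has been sent.
     */
    static boolean shouldSendFpm(long sent, long numSamples, long sampleSize, long fpmGap) {
        if (fpmGap <= 0) {
            throw new IllegalArgumentException("fpmGap must be positive");
        }
        long first = 2 * numSamples * sampleSize; // 2M transactions
        if (sent < first) {
            return false; // reservoirs are still warming up
        }
        // After the first command, one command every fpmGap transactions.
        return (sent - first) % fpmGap == 0;
    }
}
```

<p>With <code>numSamples = 2</code>, <code>sampleSize = 5</code> and <code>fpmGap = 7</code>, the predicate is true after 20, 27, 34, ... transactions.</p>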
-<p><img src="images/SAMOA FIM.jpg" alt="SAMOA FIM" /></p>
+<p><img src="images/SAMOA%20FIM.jpg" alt="SAMOA FIM"></p>
-<h2 id="how-to-run">3. How to run</h2>
-<p>Following is an example of the command used to run the SAMOA FIM task.</p>
-
-<p><code class="highlighter-rouge">
-bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "FpmTask -t
Myfpmtopology -r (org.apache.samoa.fpm.processors.FileReaderProcessor -i
/datasets/freqDataCombined.txt) -m
(org.apache.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t 20 -n
23 -p 0.08 -b 100000 -s
org.apache.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w
(org.apache.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA) "
-</code></p>
+<h2 id="3-how-to-run">3. How to run</h2>
+<p>Following is an example of the command used to run the SAMOA FIM task.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "FpmTask -t Myfpmtopology -r
(org.apache.samoa.fpm.processors.FileReaderProcessor -i
/datasets/freqDataCombined.txt) -m
(org.apache.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t 20 -n
23 -p 0.08 -b 100000 -s
org.apache.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w
(org.apache.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA) "
+</code></pre></div>
<p>Parameters:
To run an FIM task, four parameters are required:</p>
<ul>
- <li><code class="highlighter-rouge">-t</code>: Topology name (Can be any
name)</li>
- <li><code class="highlighter-rouge">-r</code>: The reader class</li>
- <li><code class="highlighter-rouge">-m</code>: The miner class</li>
- <li><code class="highlighter-rouge">-w</code>: The writer class</li>
+<li><code>-t</code>: Topology name (Can be any name)</li>
+<li><code>-r</code>: The reader class</li>
+<li><code>-m</code>: The miner class</li>
+<li><code>-w</code>: The writer class</li>
</ul>
-<p>In the example above, <code
class="highlighter-rouge">FileReaderProcessor</code> is used as a reader class.
It takes only one parameter:</p>
+<p>In the example above, <code>FileReaderProcessor</code> is used as a reader
class. It takes only one parameter:</p>
<ul>
- <li><code class="highlighter-rouge">-i</code>: Path to input file</li>
+<li><code>-i</code>: Path to input file</li>
</ul>
-<p>Similarly, <code class="highlighter-rouge">FileWriterProcessor</code> is
used as a writer class. It takes only one parameter:</p>
+<p>Similarly, <code>FileWriterProcessor</code> is used as a writer class. It
takes only one parameter:</p>
<ul>
- <li><code class="highlighter-rouge">-o</code>: Path to output file</li>
+<li><code>-o</code>: Path to output file</li>
</ul>
-<p>SAMOA comes with a built-in distributed frequent mining algorithm PARMA as
described above but users can plug-in their custom miners by implementing the
<code class="highlighter-rouge">FpmMinerInterface</code>. The built-in PARMA
miner can be used with the following parameters:</p>
+<p>SAMOA comes with a built-in distributed frequent itemset mining algorithm,
PARMA, as described above, but users can plug in their custom miners by
implementing the <code>FpmMinerInterface</code>. The built-in PARMA miner can
be used with the following parameters:</p>
<ul>
- <li><code class="highlighter-rouge">-e</code>: epsilon parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
- <li><code class="highlighter-rouge">-d</code>: delta parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
- <li><code class="highlighter-rouge">-f</code>: minimum frequency
(percentage) of a frequent itemset</li>
- <li><code class="highlighter-rouge">-t</code>: maximum length of a
transaction</li>
- <li><code class="highlighter-rouge">-n</code>: number of samples to
maintain</li>
- <li><code class="highlighter-rouge">-a</code>: number of aggregators to
initiate</li>
- <li><code class="highlighter-rouge">-p</code>: phi parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
- <li><code class="highlighter-rouge">-i</code>: path to input file</li>
- <li><code class="highlighter-rouge">-o</code>: path to output file</li>
- <li><code class="highlighter-rouge">-b</code>: batch size or fpmGap (Number
of transactions after which FIM should be performed)</li>
- <li><code class="highlighter-rouge">-s</code>: Sampler Class to be used for
sampling at each node</li>
+<li><code>-e</code>: epsilon parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-d</code>: delta parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-f</code>: minimum frequency (percentage) of a frequent itemset</li>
+<li><code>-t</code>: maximum length of a transaction</li>
+<li><code>-n</code>: number of samples to maintain</li>
+<li><code>-a</code>: number of aggregators to initiate</li>
+<li><code>-p</code>: phi parameter for <a
href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li>
+<li><code>-i</code>: path to input file</li>
+<li><code>-o</code>: path to output file</li>
+<li><code>-b</code>: batch size or fpmGap (Number of transactions after which
FIM should be performed)</li>
+<li><code>-s</code>: Sampler Class to be used for sampling at each node</li>
</ul>
<h2 id="note">Note</h2>
+
<p>This method is currently unavailable in the master branch of SAMOA due to
licensing restrictions.</p>
</article>
Added:
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html?rev=1762231&view=auto
==============================================================================
---
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
(added)
+++
incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
Sun Sep 25 20:39:59 2016
@@ -0,0 +1,184 @@
+<!DOCTYPE html>
+<html>
+
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <meta name="description" content="">
+ <meta name="author" content="">
+ <link rel="icon" href="/assets/favicon.ico">
+
+ <title>Executing Apache SAMOA with Apache Avro Files</title>
+
+ <!-- Bootstrap core CSS -->
+ <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+ <!-- Bootstrap theme -->
+ <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+ <!-- Custom styles for this template -->
+ <link href="/assets/css/theme.css" rel="stylesheet">
+
+ <link href="/css/main.css" rel="stylesheet">
+
+ <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+ <!--[if lt IE 9]><script
src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+ <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+ <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media
queries -->
+ <!--[if lt IE 9]>
+ <script
src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+ <script
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+ <![endif]-->
+ </head>
+
+
+
+ <body>
+ <div class="container">
+ <!-- Fixed navbar -->
+ <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+ <div class="container">
+ <div class="navbar-header">
+ <button type="button" class="navbar-toggle collapsed"
data-toggle="collapse" data-target="#navbar" aria-expanded="false"
aria-controls="navbar">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+ </div>
+ <div id="navbar" class="navbar-collapse collapse">
+ <ul class="nav navbar-nav">
+ <li><a href="/index.html">Home</a></li>
+ <li><a href="Home.html">Documentation</a></li>
+ <li><a href="api/current/index.html">API</a></li>
+ <li><a href="Team.html">Contributors</a></li>
+ <li><a href="Bylaws.html">Bylaws</a></li>
+ </ul>
+ </div><!--/.nav-collapse -->
+ </div>
+ </nav>
+
+
+
+
+
+ <!-- Documentation -->
+<!-- <div class="container"> -->
+
+ <header class="post-header">
+ <h1 class="post-title">Executing Apache SAMOA with Apache Avro Files</h1>
+ <p class="post-meta"></p>
+ </header>
+
+ <article class="post-content">
+ <p>In this tutorial page we describe how to execute SAMOA with data files
in the Apache Avro file format. Here is an outline of this tutorial:</p>
+
+<ol>
+<li>Overview of Apache Avro</li>
+<li>Avro Input Format for SAMOA</li>
+<li>SAMOA task execution with Avro</li>
+<li>Sample Avro Data for SAMOA</li>
+</ol>
+
+<h3 id="overview-of-apache-avro">Overview of Apache Avro</h3>
+
+<p>Users of Apache SAMOA can now use Binary/JSON encoded Avro data as an
alternative to the default ARFF file format as the data source. Avro is a remote
procedure call and data serialization framework developed within Apache's
Hadoop project. It uses JSON for defining data types and protocols, and
serializes data in a compact binary format. Avro specifies two serialization
encodings for the data, Binary and JSON, with Binary being the default. However,
the metadata is always in JSON. Avro data is always serialized with its schema.
Files that store Avro data should also include the schema for that data in the
same file.</p>
+
+<p>You can find the latest Apache Avro documentation <a
href="https://avro.apache.org/docs/current/">here</a> for more details.</p>
+
+<h3 id="avro-input-format-for-samoa">Avro Input Format for SAMOA</h3>
+
+<p>Input Avro files must follow certain input format rules to work
seamlessly with SAMOA instances. The
first line of an Avro source file for SAMOA (irrespective of whether the data is
encoded in binary or JSON) is the metadata (schema). By default, the data that
follows is one record per line conforming to the schema, and each record is
mapped to one SAMOA instance.</p>
+
+<ol>
+<li>Avro primitive types & enums are allowed for the data as is.</li>
+<li>Avro complex types (e.g. maps/arrays) may not be used, with the exception of
enum & union, i.e. no sub-structure is allowed.</li>
+<li>The label (if any) must be the last attribute.</li>
+<li>Timestamps are not supported as of now within SAMOA.</li>
+<li>Avro enums may be used to represent nominal attributes.</li>
+<li>Avro unions may be used to represent nullability of a value. However, unions
may not be used to combine different data types.</li>
+</ol>
+<div class="highlight"><pre><code class="language-" data-lang="">E.g. Enums
+{"name":"species","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}
+E.g. Unions
+{"name":"attribute1","type":["null","int"]} -Allowed to denote that value for
attribute1 is optional
+{"name":"attribute2","type":["string","int"]} -Not allowed
+</code></pre></div>
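<p>The union rule above can be expressed as a small check. The sketch below is only an interpretation of the rule as stated on this page: it treats a union as acceptable when it contains <code>"null"</code> plus exactly one other type. The class <code>AvroUnionRuleSketch</code> is hypothetical, works on plain type names rather than on the Avro library's schema objects, and is not part of SAMOA.</p>

```java
import java.util.List;

/** Hypothetical sketch of the union rule; operates on type names only. */
class AvroUnionRuleSketch {

    /**
     * A union is accepted here only if it expresses nullability:
     * the "null" branch plus exactly one non-null type.
     */
    static boolean isAllowedUnion(List<String> branches) {
        long nonNull = branches.stream()
                .filter(b -> !"null".equals(b))
                .count();
        return branches.contains("null") && nonNull == 1;
    }
}
```

<p>Under this reading, <code>["null","int"]</code> passes and <code>["string","int"]</code> is rejected, matching the two union examples above.</p>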
+<h3 id="samoa-task-execution-with-avro">SAMOA task execution with Avro</h3>
+
+<p>You may execute a SAMOA task using the
<code>bin/samoa</code> script with the following format: <code>bin/samoa
<platform> <jar> "<task>"</code>.
+See the <a href="Executing-SAMOA-with-Apache-S4">Apache S4</a> and <a
href="Executing-SAMOA-with-Apache-Storm">Apache Storm</a> pages to learn more
about deploying SAMOA on those platforms. Avro files can be used as
data sources on any of these platforms. The only addition that
needs to be made to the commands is the stream option: <code>AvroFileStream
-f <file_name> -e <file_format></code>. Examples are given below for
different modes. Although the examples below use the <a
href="Prequential-Evaluation-Task">Prequential Evaluation task</a>, the commands
are applicable to all other tasks as well.</p>
+
+<h4 id="local-avro-json">Local - Avro JSON</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation
-l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e
json) -f 100000"
+</code></pre></div>
+<h4 id="local-avro-binary">Local - Avro Binary</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation
-l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro
-e binary) -f 100000"
+</code></pre></div>
+<h4 id="storm-avro-json">Storm - Avro JSON</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation
-l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e
json) -f 100000"
+</code></pre></div>
+<h4 id="storm-avro-binary">Storm - Avro Binary</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa
storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation
-l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro
-e binary) -f 100000"
+</code></pre></div>
+<h3 id="sample-avro-data-for-samoa">Sample Avro Data for SAMOA</h3>
+
+<p>The samples below describe how the default ARFF file formats may be
converted to JSON/Binary encoded Avro formats.</p>
+
+<h4 id="iris-dataset-default-arff-format">Iris Dataset - Default ARFF
Format</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">@RELATION
iris
+@ATTRIBUTE sepallength NUMERIC
+@ATTRIBUTE sepalwidth NUMERIC
+@ATTRIBUTE petallength NUMERIC
+@ATTRIBUTE petalwidth NUMERIC
+@ATTRIBUTE class {setosa,versicolor,virginica}
+@DATA
+5.1,3.5,1.4,0.2,setosa
+4.9,3.0,1.4,0.2,virginica
+4.7,3.2,1.3,0.2,virginica
+4.6,3.1,1.5,0.2,setosa
+</code></pre></div>
+<h4 id="iris-dataset-json-encoded-avro-format">Iris Dataset - JSON Encoded
AVRO Format</h4>
+<div class="highlight"><pre><code class="language-" data-lang=""><span
class="p">{</span><span class="nt">"type"</span><span class="p">:</span><span
class="s2">"record"</span><span class="p">,</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"Iris"</span><span class="p">,</span><span
class="nt">"namespace"</span><span class="p">:</span><span
class="s2">"org.apache.samoa.avro.iris"</span><span class="p">,</span><span
class="nt">"fields"</span><span class="p">:[{</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"sepallength"</span><span class="p">,</span><span
class="nt">"type"</span><span class="p">:</span><span
class="s2">"double"</span><span class="p">},{</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"sepalwidth"</span><span class="p">,</span><span
class="nt">"type"</span><span class="p">:</span><span
class="s2">"double"</span><span class="p">},{</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"petallength"</span><span class="p">,</span><span
class="nt">"type"</span><span class="p">:</span><span
class="s2">"double"</span><span class="p">},{</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"petalwidth"</span><span class="p">,</span><span
class="nt">"type"</span><span class="p">:</span><span
class="s2">"double"</span><span class="p">},{</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"class"</span><span class="p">,</span><span
class="nt">"type"</span><span class="p">:{</span><span
class="nt">"type"</span><span class="p">:</span><span
class="s2">"enum"</span><span class="p">,</span><span
class="nt">"name"</span><span class="p">:</span><span
class="s2">"Labels"</span><span class="p">,</span><span
class="nt">"symbols"</span><span class="p">:[</span><span
class="s2">"setosa"</span><span class="p">,</span><span
class="s2">"versicolor"</span><span class="p">,</span><span
class="s2">"virginica"</span><span
class="p">]}}]}</span><span class="w">
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span
class="p">:</span><span class="mf">5.1</span><span class="p">,</span><span
class="nt">"sepalwidth"</span><span class="p">:</span><span
class="mf">3.5</span><span class="p">,</span><span
class="nt">"petallength"</span><span class="p">:</span><span
class="mf">1.4</span><span class="p">,</span><span
class="nt">"petalwidth"</span><span class="p">:</span><span
class="mf">0.2</span><span class="p">,</span><span
class="nt">"class"</span><span class="p">:</span><span
class="s2">"setosa"</span><span class="p">}</span><span class="w">
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span
class="p">:</span><span class="mf">4.9</span><span class="p">,</span><span
class="nt">"sepalwidth"</span><span class="p">:</span><span
class="mf">3.0</span><span class="p">,</span><span
class="nt">"petallength"</span><span class="p">:</span><span
class="mf">1.4</span><span class="p">,</span><span
class="nt">"petalwidth"</span><span class="p">:</span><span
class="mf">0.2</span><span class="p">,</span><span
class="nt">"class"</span><span class="p">:</span><span
class="s2">"virginica"</span><span class="p">}</span><span class="w">
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span
class="p">:</span><span class="mf">4.7</span><span class="p">,</span><span
class="nt">"sepalwidth"</span><span class="p">:</span><span
class="mf">3.2</span><span class="p">,</span><span
class="nt">"petallength"</span><span class="p">:</span><span
class="mf">1.3</span><span class="p">,</span><span
class="nt">"petalwidth"</span><span class="p">:</span><span
class="mf">0.2</span><span class="p">,</span><span
class="nt">"class"</span><span class="p">:</span><span
class="s2">"virginica"</span><span class="p">}</span><span class="w">
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span
class="p">:</span><span class="mf">4.6</span><span class="p">,</span><span
class="nt">"sepalwidth"</span><span class="p">:</span><span
class="mf">3.1</span><span class="p">,</span><span
class="nt">"petallength"</span><span class="p">:</span><span
class="mf">1.5</span><span class="p">,</span><span
class="nt">"petalwidth"</span><span class="p">:</span><span
class="mf">0.2</span><span class="p">,</span><span
class="nt">"class"</span><span class="p">:</span><span
class="s2">"setosa"</span><span class="p">}</span><span class="w">
+</span></code></pre></div>
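+<p>The correspondence between the ARFF sample and the JSON-encoded sample can
be sketched in a few lines of Python using only the standard library. This is
an illustrative sketch, not SAMOA's own converter: the function name
<code>arff_to_avro_json</code> is hypothetical, only <code>NUMERIC</code>/<code>REAL</code>
and nominal attributes are handled, and the enum name <code>Labels</code> is
copied from the sample schema (so at most one nominal attribute is assumed). It
produces the schema and the JSON-encoded records only; writing the binary
encoding would require an Avro library.</p>

```python
import json


def arff_to_avro_json(arff_text, record_name, namespace):
    """Return (schema, records) for a simple ARFF document.

    schema is a dict describing an Avro record; records is a list of dicts,
    one per ARFF data row, keyed by attribute name.
    """
    fields, records, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        upper = line.upper()
        if upper.startswith("@RELATION"):
            continue
        if upper.startswith("@ATTRIBUTE"):
            _, name, attr_type = line.split(None, 2)
            if attr_type.startswith("{"):
                # Nominal attribute -> Avro enum (name taken from the sample).
                symbols = [s.strip() for s in attr_type.strip("{}").split(",")]
                avro_type = {"type": "enum", "name": "Labels",
                             "symbols": symbols}
            elif attr_type.upper() in ("NUMERIC", "REAL"):
                avro_type = "double"
            else:
                raise ValueError(f"unsupported ARFF type: {attr_type!r}")
            fields.append({"name": name, "type": avro_type})
        elif upper.startswith("@DATA"):
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            record = {}
            for field, value in zip(fields, values):
                is_number = field["type"] == "double"
                record[field["name"]] = float(value) if is_number else value
            records.append(record)
    schema = {"type": "record", "name": record_name,
              "namespace": namespace, "fields": fields}
    return schema, records


if __name__ == "__main__":
    sample = """@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE class {setosa,versicolor,virginica}
@DATA
5.1,setosa
"""
    schema, records = arff_to_avro_json(
        sample, "Iris", "org.apache.samoa.avro.iris")
    # Schema on the first line, then one JSON record per line, as in the sample.
    print(json.dumps(schema, separators=(",", ":")))
    for record in records:
        print(json.dumps(record, separators=(",", ":")))
```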
+<h4 id="iris-dataset-binary-encoded-avro-format">Iris Dataset - Binary Encoded
AVRO Format</h4>
+<div class="highlight"><pre><code class="language-"
data-lang="">Objavro.schemaÎ
{"type":"record","name":"Iris","namespace":"org.apache.samoa.avro.iris","fields":[{"name":"sepallength","type":"double"},{"name":"sepalwidth","type":"double"},{"name":"petallength","type":"double"},{"name":"petalwidth","type":"double"},{"name":"class","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}]}
!<khCrÖ±Së¹§Þ©Èffffff@ @ffffffÙÙÉ¿
@ffffffÙÙ@ÚÙÙÉ¿ÎÍÍ@ÚÙÙ @ÎÍÍÙÙÉ¿ÎÍÍ@ 𿦦ffff@ÚÙÙÉ¿
!<khCrÖ±Së¹§Þ©
+</code></pre></div>
+<h4 id="forest-covertype-dataset">Forest CoverType Dataset</h4>
+
+<p>The JSON-encoded and binary-encoded Avro files for the Forest CoverType
dataset, covtypeNorm_json.avro and covtypeNorm_binary.avro, are available on the <a
href="https://cwiki.apache.org/confluence/display/SAMOA/SAMOA+Home">SAMOA wiki</a>.
</p>
+
+ </article>
+
+<!-- </div> -->
+
+
+
+ <hr/>
+<div id="footer" class="container text-center">
+
+ <p class="text-muted credit"><p>
+Copyright © 2014 <a href="http://www.apache.org">Apache Software
Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache
feather logo are trademarks of The Apache Software Foundation. All other marks
mentioned may be trademarks or registered trademarks of their respective
owners.</p>
+
+</div>
+
+ <!-- Bootstrap core JavaScript
+ ================================================== -->
+ <!-- Placed at the end of the document so the pages load faster -->
+ <script
src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+ <script src="/assets/js/bootstrap.min.js"></script>
+ <script src="/assets/js/docs.min.js"></script>
+ <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+ <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+ </div>
+
+ </body>
+
+</html>
Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
URL:
http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html
(original)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html Sun
Sep 25 20:39:59 2016
@@ -76,115 +76,101 @@
<p>In this tutorial page we describe how to execute SAMOA on top of Apache
S4.</p>
<h2 id="prerequisites">Prerequisites</h2>
+
<p>The following dependencies are needed to run SAMOA smoothly on Apache S4:</p>
<ul>
- <li><a href="http://www.gradle.org/">Gradle</a></li>
- <li><a href="https://incubator.apache.org/s4/">Apache S4</a></li>
+<li><a href="http://www.gradle.org/">Gradle</a></li>
+<li><a href="https://incubator.apache.org/s4/">Apache S4</a></li>
</ul>
<h2 id="gradle">Gradle</h2>
+
<p>Gradle is a build automation tool and is used to build Apache S4. The
installation guide can be found <a
href="http://www.gradle.org/docs/current/userguide/installation.html">here</a>.
The following instructions are a simplified installation guide.</p>
<ol>
- <li>Download Gradle binaries from <a
href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>,
or from the console type <code class="highlighter-rouge">wget
http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li>
- <li>Unzip the file <code class="highlighter-rouge">unzip
gradle-1.6-bin.zip</code></li>
- <li>Set the Gradle environment variable: <code
class="highlighter-rouge">export GRADLE_HOME=/foo/bar/gradle-1.6</code></li>
- <li>Add to the systems path <code class="highlighter-rouge">export
PATH=$PATH:$GRADLE_HOME/bin</code></li>
- <li>Install Gradle by running <code
class="highlighter-rouge">gradle</code></li>
+<li>Download Gradle binaries from <a
href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>,
or from the console type <code>wget
http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li>
+<li>Unzip the file: <code>unzip gradle-1.6-bin.zip</code></li>
+<li>Set the Gradle environment variable: <code>export
GRADLE_HOME=/foo/bar/gradle-1.6</code></li>
+<li>Add Gradle to the system path: <code>export
PATH=$PATH:$GRADLE_HOME/bin</code></li>
+<li>Install Gradle by running <code>gradle</code></li>
</ol>
<p>Now you are all set to install Apache S4</p>
<h2 id="apache-s4">Apache S4</h2>
+
<p>S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable
platform that allows programmers to easily develop applications for processing
continuous unbounded streams of data. The installation process is as
follows:</p>
<ol>
- <li>Download the latest Apache S4 release from <a
href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache
S4 0.6.0</a> or from command line <code class="highlighter-rouge">wget
http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code>
or clone from git.
-<code class="highlighter-rouge">git clone
https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li>
- <li>Unzip the file <code class="highlighter-rouge">unzip
apache-s4-0.6.0-incubating-src.zip</code> or go in the cloned directory.</li>
- <li>Set the Apache S4 environment variable <code
class="highlighter-rouge">export
S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li>
- <li>Add the S4_HOME to the system PATH. <code
class="highlighter-rouge">export PATH=$PATH:$S4_HOME</code>.</li>
- <li>Once the previous steps are done we can proceed to build and install
Apache S4.</li>
- <li>You can have a look at the available build tasks by typing <code
class="highlighter-rouge">gradle tasks</code>.</li>
- <li>There are some dependencies issues, therefore you should run the wrapper
task first by typing <code class="highlighter-rouge">gradle wrapper</code>.</li>
- <li>Install the artifacts for Apache S4 by running <code
class="highlighter-rouge">gradle install</code> in the S4_HOME directory.</li>
- <li>Install the S4-TOOLS, <code class="highlighter-rouge">gradle
s4-tools::installApp</code>.</li>
+<li>Download the latest Apache S4 release from <a
href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache
S4 0.6.0</a>, or from the command line with <code>wget
http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code>,
or clone it from git:
+<code>git clone
https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li>
+<li>Unzip the file with <code>unzip apache-s4-0.6.0-incubating-src.zip</code>, or go
into the cloned directory.</li>
+<li>Set the Apache S4 environment variable <code>export
S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li>
+<li>Add the S4_HOME to the system PATH. <code>export
PATH=$PATH:$S4_HOME</code>.</li>
+<li>Once the previous steps are done we can proceed to build and install
Apache S4.</li>
+<li>You can have a look at the available build tasks by typing <code>gradle
tasks</code>.</li>
+<li>There are some dependency issues, so you should run the wrapper
task first by typing <code>gradle wrapper</code>.</li>
+<li>Install the artifacts for Apache S4 by running <code>gradle install</code>
in the S4_HOME directory.</li>
+<li>Install S4-TOOLS by running <code>gradle s4-tools::installApp</code>.</li>
</ol>
<p>Done. Now you can configure and run your Apache S4 cluster.</p>
-<hr />
+<hr>
<h2 id="building-samoa">Building SAMOA</h2>
-<p>Once the S4 dependencies are installed, you can simply clone the repository
and install SAMOA.</p>
-<p><code class="highlighter-rouge">bash
-git clone http://git.apache.org/incubator-samoa.git
-cd incubator-samoa
+<p>Once the S4 dependencies are installed, you can simply clone the repository
and install SAMOA.</p>
+<div class="highlight"><pre><code class="language-bash" data-lang="bash">git
clone http://git.apache.org/incubator-samoa.git
+<span class="nb">cd </span>incubator-samoa
mvn -Ps4 package
-</code></p>
+</code></pre></div>
+<p>The deployable jars for SAMOA will be in
<code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For
example, in our case for S4 <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
-<p>The deployable jars for SAMOA will be in <code
class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>.
For example, in our case for S4 <code
class="highlighter-rouge">target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p>
-
-<hr />
+<hr>
<h2 id="samoa-s4-configuration">SAMOA-S4 Configuration</h2>
-<p>This section will go through the <code
class="highlighter-rouge">bin/samoa-s4.properties</code> file and how to
configure it.
-In order for SAMOA to run correctly in a distributed environment there are
some variables that need to be defined. Since Apache S4 uses <a
href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management we
need to define where it is running.</p>
-<div class="highlighter-rouge"><pre class="highlight"><code># Zookeeper Server
+<p>This section goes through the <code>bin/samoa-s4.properties</code> file
and how to configure it.
+For SAMOA to run correctly in a distributed environment, some
variables need to be defined. Since Apache S4 uses <a
href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management, we
need to define where it is running.</p>
+<div class="highlight"><pre><code class="language-" data-lang=""># Zookeeper
Server
zookeeper.server=localhost
zookeeper.port=2181
-</code></pre>
-</div>
-
+</code></pre></div>
<p>Apache S4 also distributes the application via HTTP, so the server
and port that serve the S4 application must be provided.</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code># Simple HTTP
Server providing the packaged S4 jar
+<div class="highlight"><pre><code class="language-" data-lang=""># Simple HTTP
Server providing the packaged S4 jar
http.server.ip=localhost
http.server.port=8000
-</code></pre>
-</div>
-
+</code></pre></div>
<p>Apache S4 uses the concept of logical clusters to define a group of
machines, which are identified by an ID and start serving on a specific
port.</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code># Name of the S4
cluster
+<div class="highlight"><pre><code class="language-" data-lang=""># Name of the
S4 cluster
cluster.name=cluster
cluster.port=12000
-</code></pre>
-</div>
-
-<p>SAMOA can be deployed on a single machine using only one resource or in a
cluster environments. The following property can be defined to deploy as a
<code class="highlighter-rouge">local</code> application or on a <code
class="highlighter-rouge">cluster</code>.</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code># Deployment
strategy
+</code></pre></div>
+<p>SAMOA can be deployed on a single machine using only one resource or in a
cluster environment. The following property can be defined to deploy as a
<code>local</code> application or on a <code>cluster</code>.</p>
+<div class="highlight"><pre><code class="language-" data-lang=""># Deployment
strategy
samoa.deploy.mode=local
-</code></pre>
-</div>
-
-<hr />
+</code></pre></div>
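+<p>As a minimal sketch, the properties described above could be sanity-checked
before deployment with a few lines of Python. The helper names
<code>parse_properties</code> and <code>check_samoa_s4_properties</code> are
hypothetical, and treating all seven keys as required is an assumption made for
this example; SAMOA itself may apply defaults.</p>

```python
# Keys shown in the configuration snippets above.
# Assumption: this sketch treats every one of them as required.
REQUIRED_KEYS = (
    "zookeeper.server", "zookeeper.port",
    "http.server.ip", "http.server.port",
    "cluster.name", "cluster.port",
    "samoa.deploy.mode",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_samoa_s4_properties(props):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in props]
    for key in ("zookeeper.port", "http.server.port", "cluster.port"):
        if key in props and not props[key].isdigit():
            problems.append(f"{key} is not a port number: {props[key]!r}")
    mode = props.get("samoa.deploy.mode")
    if mode is not None and mode not in ("local", "cluster"):
        problems.append(
            f"samoa.deploy.mode must be local or cluster, got {mode!r}")
    return problems


if __name__ == "__main__":
    with open("bin/samoa-s4.properties") as handle:
        for problem in check_samoa_s4_properties(
                parse_properties(handle.read())):
            print(problem)
```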
+<hr>
<h2 id="samoa-s4-deployment">SAMOA S4 Deployment</h2>
-<p>In order to deploy SAMOA in a distributed environment you
<strong>MUST</strong> configure the <code
class="highlighter-rouge">bin/samoa-s4.properties</code> file correctly. If you
are running locally it is optional to modify the properties file.</p>
+<p>In order to deploy SAMOA in a distributed environment you
<strong>MUST</strong> configure the <code>bin/samoa-s4.properties</code> file
correctly. If you are running locally, modifying the properties file is
optional.</p>
-<p>The deployment is done by running the SAMOA execution script <code
class="highlighter-rouge">bin/samoa</code> with some additional parameters.
+<p>The deployment is done by running the SAMOA execution script
<code>bin/samoa</code> with some additional parameters.
The execution syntax is as follows:
-<code class="highlighter-rouge">bin/samoa <platform>
<jar-location> <task & options></code></p>
+<code>bin/samoa <platform> <jar-location> <task &
options></code></p>
<p>Example:</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>bin/samoa S4
target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
-</code></pre>
-</div>
-
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa S4
target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
+</code></pre></div>
<p>The <platform> can be s4 or storm.</p>
<p>The <jar-location> must be the absolute path to the platform specific
jar file.</p>
<p>The <task & options> should be the name of a known task and the
options belonging to that task.</p>
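<p>The three-part syntax described above can be sketched as a small Python
wrapper that resolves the jar location to an absolute path before calling the
script. The function name <code>samoa_argv</code> is hypothetical, and comparing
the platform name case-insensitively is an assumption of this sketch, not a
documented behaviour of <code>bin/samoa</code>.</p>

```python
import os


def samoa_argv(platform, jar_location, task_and_options):
    """Build the argument list for bin/samoa <platform> <jar> <task & options>.

    Hypothetical helper; the platform check is case-insensitive by assumption.
    """
    if platform.lower() not in ("s4", "storm"):
        raise ValueError(f"platform must be s4 or storm, got {platform!r}")
    if not task_and_options.strip():
        raise ValueError("a task name is required")
    # The jar location must be absolute, so resolve relative paths here.
    jar = os.path.abspath(jar_location)
    # The task and its options travel as a single argument.
    return ["bin/samoa", platform, jar, task_and_options]


if __name__ == "__main__":
    print(samoa_argv("s4", "target/SAMOA-S4-0.0.1-SNAPSHOT.jar",
                     "ClusteringEvaluation"))
```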
-
</article>
<!-- </div> -->