Modified: incubator/samoa/site/documentation/Processor.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processor.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processor.html (original) +++ incubator/samoa/site/documentation/Processor.html Sun Apr 3 08:17:59 2016 @@ -74,79 +74,71 @@ <article class="post-content"> <p>Processor is the basic logical processing unit. All logic is written in the processor. In SAMOA, a Processor is an interface. Users can implement this interface to build their own processors. -<img src="images/Topology.png" alt="Topology"></p> - -<h3 id="adding-a-processor-to-the-topology">Adding a Processor to the topology</h3> +<img src="images/Topology.png" alt="Topology" /> +### Adding a Processor to the topology</p> <p>There are two ways to add a processor to the topology.</p> -<h4 id="1-processor">1. Processor</h4> - -<p>All physical topology units are created with the help of a <code>TopologyBuilder</code>. Following code snippet shows how to add a Processor to the topology. -<code> +<h4 id="processor">1. Processor</h4> +<p>All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. Following code snippet shows how to add a Processor to the topology. +<code class="highlighter-rouge"> Processor processor = new ExampleProcessor(); builder.addProcessor(processor, paralellism); </code> -<code>addProcessor()</code> method of <code>TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> - -<h4 id="2-entrance-processor">2. 
Entrance Processor</h4> +<code class="highlighter-rouge">addProcessor()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> +<h4 id="entrance-processor">2. Entrance Processor</h4> <p>Some processors generates their own streams, and they are used as the source of a topology. They connect to external sources, pull data and provide it to the topology in the form of streams. -All physical topology units are created with the help of a <code>TopologyBuilder</code>. The following code snippet shows how to add an entrance processor to the topology and create a stream from it. -<code> +All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. The following code snippet shows how to add an entrance processor to the topology and create a stream from it. +<code class="highlighter-rouge"> EntranceProcessor entranceProcessor = new EntranceProcessor(); builder.addEntranceProcessor(entranceProcessor); Stream source = builder.createStream(entranceProcessor); </code></p> <h3 id="preview-of-processor">Preview of Processor</h3> -<div class="highlight"><pre><code class="language-" data-lang="">package samoa.core; +<p><code class="highlighter-rouge"> +package samoa.core; public interface Processor extends java.io.Serializable{ - boolean process(ContentEvent event); - void onCreate(int id); - Processor newProcessor(Processor p); + boolean process(ContentEvent event); + void onCreate(int id); + Processor newProcessor(Processor p); } -</code></pre></div> -<h3 id="methods">Methods</h3> - -<h4 id="1-boolean-process-contentevent-event">1. <code>boolean process(ContentEvent event)</code></h4> - -<p>Users should implement the three methods shown above. 
<code>process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code>ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code>true</code> if the event has been correctly processed, <code>false</code> otherwise.</p> - -<h4 id="2-void-oncreate-int-id">2. <code>void onCreate(int id)</code></h4> +</code> +### Methods</p> -<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as a parameter <code>id</code> to <code>onCreate(int it)</code> method of each instance.</p> +<h4 id="boolean-processcontentevent-event">1. <code class="highlighter-rouge">boolean process(ContentEvent event)</code></h4> +<p>Users should implement the three methods shown above. <code class="highlighter-rouge">process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code class="highlighter-rouge">ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code class="highlighter-rouge">true</code> if the event has been correctly processed, <code class="highlighter-rouge">false</code> otherwise.</p> -<h4 id="3-processor-newprocessor-processor-p">3. <code>Processor newProcessor(Processor p)</code></h4> +<h4 id="void-oncreateint-id">2. <code class="highlighter-rouge">void onCreate(int id)</code></h4> +<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. 
SAMOA assigns each instance a unique id which is passed as a parameter <code class="highlighter-rouge">id</code> to <code class="highlighter-rouge">onCreate(int it)</code> method of each instance.</p> -<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface. </p> +<h4 id="processor-newprocessorprocessor-p">3. <code class="highlighter-rouge">Processor newProcessor(Processor p)</code></h4> +<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface.</p> <h3 id="preview-of-entranceprocessor">Preview of EntranceProcessor</h3> -<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.core; +<p>``` +package org.apache.samoa.core;</p> -public interface EntranceProcessor extends Processor { +<p>public interface EntranceProcessor extends Processor { public boolean isFinished(); public boolean hasNext(); public ContentEvent nextEvent(); } -</code></pre></div> -<h3 id="methods">Methods</h3> - -<h4 id="1-boolean-isfinished">1. <code>boolean isFinished()</code></h4> - -<p>returns whether to expect more events coming from the entrance processor. If the source is a live stream this method should return always <code>false</code>. If the source is a file, the method should return <code>true</code> once the file has been fully processed.</p> +``` +### Methods</p> -<h4 id="2-boolean-hasnext">2. <code>boolean hasNext()</code></h4> +<h4 id="boolean-isfinished">1. <code class="highlighter-rouge">boolean isFinished()</code></h4> +<p>returns whether to expect more events coming from the entrance processor. 
If the source is a live stream this method should return always <code class="highlighter-rouge">false</code>. If the source is a file, the method should return <code class="highlighter-rouge">true</code> once the file has been fully processed.</p> -<p>returns whether the next event is ready for consumption. If the method returns <code>true</code> a subsequent call to <code>nextEvent</code> should yield the next event to be processed. If the method returns <code>false</code> the engine can use this information to avoid continuously polling the entrance processor.</p> +<h4 id="boolean-hasnext">2. <code class="highlighter-rouge">boolean hasNext()</code></h4> +<p>returns whether the next event is ready for consumption. If the method returns <code class="highlighter-rouge">true</code> a subsequent call to <code class="highlighter-rouge">nextEvent</code> should yield the next event to be processed. If the method returns <code class="highlighter-rouge">false</code> the engine can use this information to avoid continuously polling the entrance processor.</p> -<h4 id="3-contentevent-nextevent">3. <code>ContentEvent nextEvent()</code></h4> - -<p>is the main method for the entrance processor as it returns the next event to be processed by the topology. It should be called only if <code>isFinished()</code> returned <code>false</code> and <code>hasNext()</code> returned <code>true</code>.</p> +<h4 id="contentevent-nextevent">3. <code class="highlighter-rouge">ContentEvent nextEvent()</code></h4> +<p>is the main method for the entrance processor as it returns the next event to be processed by the topology. It should be called only if <code class="highlighter-rouge">isFinished()</code> returned <code class="highlighter-rouge">false</code> and <code class="highlighter-rouge">hasNext()</code> returned <code class="highlighter-rouge">true</code>.</p> <h3 id="note">Note</h3> - -<p>All state variables of the class implementing this interface must be serializable. 
It can be done by implementing the <code>Serializable</code> interface. The simple way to skip this requirement is to declare those variables as <code>transient</code> and initialize them in the <code>onCreate()</code> method. Remember, all initializations of such transient variables done in the constructor will be lost.</p> +<p>All state variables of the class implementing this interface must be serializable. It can be done by implementing the <code class="highlighter-rouge">Serializable</code> interface. The simple way to skip this requirement is to declare those variables as <code class="highlighter-rouge">transient</code> and initialize them in the <code class="highlighter-rouge">onCreate()</code> method. Remember, all initializations of such transient variables done in the constructor will be lost.</p> </article>
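The Processor contract described in the Processor.html hunk above (the three interface methods plus the note on `transient` state) can be sketched as follows. This is a minimal illustration, not SAMOA code: `ProcessorSketch`, `CountingProcessor` and the one-method `ContentEvent` are hypothetical stand-ins so that the example compiles without the SAMOA jars, and the real interfaces may differ.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only (assumptions, not the SAMOA classes).
class ProcessorSketch {

    /** Hypothetical minimal event type; the real ContentEvent may declare more methods. */
    interface ContentEvent extends Serializable {
        String getKey();
    }

    /** Mirrors the three methods shown in the Processor preview. */
    interface Processor extends Serializable {
        boolean process(ContentEvent event);
        void onCreate(int id);
        Processor newProcessor(Processor p);
    }

    /** Hypothetical processor that records the keys of the events it receives. */
    static class CountingProcessor implements Processor {
        private static final long serialVersionUID = 1L;

        private final String label;          // serializable state, carried over to copies
        private transient List<String> seen; // not serialized, so it is set up in onCreate
        private int id = -1;

        CountingProcessor(String label) {
            this.label = label;
        }

        @Override
        public void onCreate(int id) {
            // All initialization goes here; the id is unique per parallel instance.
            this.id = id;
            this.seen = new ArrayList<>();
        }

        @Override
        public boolean process(ContentEvent event) {
            // Report failure if the event is missing or onCreate has not run yet.
            if (event == null || seen == null) {
                return false;
            }
            seen.add(event.getKey());
            return true;
        }

        @Override
        public Processor newProcessor(Processor p) {
            // Return a fresh copy built from the serializable configuration only.
            CountingProcessor original = (CountingProcessor) p;
            return new CountingProcessor(original.label);
        }

        int count() {
            return seen == null ? 0 : seen.size();
        }

        int id() {
            return id;
        }
    }

    public static void main(String[] args) {
        CountingProcessor prototype = new CountingProcessor("demo");
        CountingProcessor copy = (CountingProcessor) prototype.newProcessor(prototype);
        copy.onCreate(0);
        ContentEvent event = () -> "k1";
        System.out.println(copy.process(event));
        System.out.println(copy.count());
    }
}
```

Keeping the list `transient` and building it in `onCreate` follows the note in the hunk: anything assigned to a transient field in the constructor would be lost once the instance is serialized and shipped to a worker.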
Modified: incubator/samoa/site/documentation/SAMOA-Topology.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/SAMOA-Topology.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/SAMOA-Topology.html (original) +++ incubator/samoa/site/documentation/SAMOA-Topology.html Sun Apr 3 08:17:59 2016 @@ -76,18 +76,18 @@ <p>Apache SAMOA allows users to write their stream processing algorithms in an easy and platform independent way. SAMOA defines its own topology which is very intuitive and simple to use. Currently SAMOA has the following basic topology elements.</p> <ol> -<li><a href="Processor.html">Processor</a></li> -<li><a href="Content-Event.html">Content Event</a></li> -<li><a href="Stream.html">Stream</a></li> -<li><a href="Task.html">Task</a></li> -<li><a href="Topology-Builder.html">Topology Builder</a></li> -<li><a href="Learner.html">Learner</a></li> -<li><strong>Advanced topic</strong>: <a href="Processing-Item.html">Processing Item</a></li> + <li><a href="Processor.html">Processor</a></li> + <li><a href="Content-Event.html">Content Event</a></li> + <li><a href="Stream.html">Stream</a></li> + <li><a href="Task.html">Task</a></li> + <li><a href="Topology-Builder.html">Topology Builder</a></li> + <li><a href="Learner.html">Learner</a></li> + <li><strong>Advanced topic</strong>: <a href="Processing-Item.html">Processing Item</a></li> </ol> <p>Processor and Content Event are the logical units to build your algorithm, Stream and Task are the physical units to wire the various pieces of your algorithm, whereas Topology Builder is an administrative unit that provides bookkeeping services. Learner is the base interface for learning algorithms. 
Processing Items are internal wrappers for Processors used inside SAMOA.</p> -<p><img src="images/Topology.png" alt="Topology"></p> +<p><img src="images/Topology.png" alt="Topology" /></p> </article> Modified: incubator/samoa/site/documentation/SAMOA-and-Machine-Learning.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/SAMOA-and-Machine-Learning.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/SAMOA-and-Machine-Learning.html (original) +++ incubator/samoa/site/documentation/SAMOA-and-Machine-Learning.html Sun Apr 3 08:17:59 2016 @@ -73,15 +73,15 @@ </header> <article class="post-content"> - <p>SAMOA's main goal is to help developers to create easily machine learning algorithms on top of any distributed stream processing engine. Here we present the available machine learning algorithms implemented in SAMOA and how to use them. </p> + <p>SAMOA’s main goal is to help developers to create easily machine learning algorithms on top of any distributed stream processing engine. 
Here we present the available machine learning algorithms implemented in SAMOA and how to use them.</p> <ul> -<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> -<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> -<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> -<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> -<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> -<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> + <li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> + <li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> + <li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> + <li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> + <li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> + <li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> </ul> </article> Modified: incubator/samoa/site/documentation/SAMOA-for-MOA-users.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/SAMOA-for-MOA-users.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/SAMOA-for-MOA-users.html (original) +++ incubator/samoa/site/documentation/SAMOA-for-MOA-users.html Sun Apr 3 08:17:59 2016 @@ -73,23 +73,23 @@ </header> <article class="post-content"> - <p>If you're an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, you'll find easy to run SAMOA. 
You need to note the following:</p> + <p>If you’re an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, you’ll find easy to run SAMOA. You need to note the following:</p> <ul> -<li>There is no GUI interface in SAMOA</li> -<li>You can run SAMOA in the following modes: - -<ol> -<li>Simulation Environment. Use <code>org.apache.samoa.DoTask</code> instead of <code>moa.DoTask</code><br></li> -<li>Storm Local Mode. Use <code>org.apache.samoa.LocalStormDoTask</code> instead of <code>moa.DoTask</code></li> -<li>Storm Cluster Mode. You need to use the <code>samoa</code> script as it is explained in <a href="Executing%20SAMOA%20with%20Apache%20Storm">Executing SAMOA with Apache Storm</a>.</li> -<li>S4. You need to use the <code>samoa</code> script as it is explained in <a href="Executing%20SAMOA%20with%20Apache%20S4">Executing SAMOA with Apache S4</a></li> -</ol></li> + <li>There is no GUI interface in SAMOA</li> + <li>You can run SAMOA in the following modes: + <ol> + <li>Simulation Environment. Use <code class="highlighter-rouge">org.apache.samoa.DoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li> + <li>Storm Local Mode. Use <code class="highlighter-rouge">org.apache.samoa.LocalStormDoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li> + <li>Storm Cluster Mode. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache Storm">Executing SAMOA with Apache Storm</a>.</li> + <li>S4. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache S4">Executing SAMOA with Apache S4</a></li> + </ol> + </li> </ul> -<p>To start with SAMOA, you can start with a simple example using the CoverType dataset as it is discussed in <a href="Getting%20Started">Getting Started</a>. 
</p> +<p>To start with SAMOA, you can start with a simple example using the CoverType dataset as it is discussed in <a href="Getting Started">Getting Started</a>.</p> -<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>. </p> +<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>.</p> </article> Modified: incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html (original) +++ incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html Sun Apr 3 08:17:59 2016 @@ -75,14 +75,14 @@ <article class="post-content"> <p>Scalable Advanced Massive Online Analysis (SAMOA) contains various algorithms for machine learning and data mining on data streams, and allows to run them on different distributed stream processing engines (DSPEs) such as Storm and S4. 
Currently, SAMOA contains methods for classification via Vertical Hoeffding Trees, bagging and boosting and clustering via CluStream.</p> -<p>In this pages, we explain how to build and execute SAMOA for the different distributed stream processing engines (DSPEs): </p> +<p>In this pages, we explain how to build and execute SAMOA for the different distributed stream processing engines (DSPEs):</p> <ul> -<li><a href="Building-SAMOA.html">Building SAMOA</a></li> -<li><a href="Executing-SAMOA-with-Apache-Storm.html">Executing SAMOA with Apache Storm</a></li> -<li><a href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache S4</a></li> -<li><a href="Executing-SAMOA-with-Apache-Samza.html">Executing SAMOA with Apache Samza</a></li> -<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">Executing SAMOA with Apache Avro Files</a></li> + <li><a href="Building-SAMOA.html">Building SAMOA</a></li> + <li><a href="Executing-SAMOA-with-Apache-Storm.html">Executing SAMOA with Apache Storm</a></li> + <li><a href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache S4</a></li> + <li><a href="Executing-SAMOA-with-Apache-Samza.html">Executing SAMOA with Apache Samza</a></li> + <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">Executing SAMOA with Apache Avro Files</a></li> </ul> </article> Modified: incubator/samoa/site/documentation/Stream.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Stream.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Stream.html (original) +++ incubator/samoa/site/documentation/Stream.html Sun Apr 3 08:17:59 2016 @@ -73,47 +73,51 @@ </header> <article class="post-content"> - <p>A stream is a physical unit of SAMOA topology which connects different Processors with each other. Stream is also created by a <code>TopologyBuilder</code> just like a Processor. 
A stream can have a single source but many destinations. A Processor which is the source of a stream, owns the stream.</p> - -<h3 id="1-creating-a-stream">1. Creating a Stream</h3> + <p>A stream is a physical unit of SAMOA topology which connects different Processors with each other. Stream is also created by a <code class="highlighter-rouge">TopologyBuilder</code> just like a Processor. A stream can have a single source but many destinations. A Processor which is the source of a stream, owns the stream.</p> +<h3 id="creating-a-stream">1. Creating a Stream</h3> <p>The following code snippet shows how a Stream is created:</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); + +<p><code class="highlighter-rouge"> +builder.initTopology("MyTopology"); Processor sourceProcessor = new Sampler(); builder.addProcessor(samplerProcessor, 3); Stream sourceDataStream = builder.createStream(sourceProcessor); -</code></pre></div> -<h3 id="2-connecting-a-stream">2. Connecting a Stream</h3> +</code></p> +<h3 id="connecting-a-stream">2. Connecting a Stream</h3> <p>As described above, a Stream can have many destinations. In the following figure, a single stream from sourceProcessor is connected to three different destination Processors each having three instances.</p> -<p><img src="images/SAMOA%20Message%20Shuffling.png" alt="SAMOA Message Shuffling"></p> - -<p>SAMOA supports three different ways of distribution of messages to multiple instances of a Processor.</p> +<p><img src="images/SAMOA Message Shuffling.png" alt="SAMOA Message Shuffling" /></p> -<h4 id="2-1-shuffle">2.1 Shuffle</h4> - -<p>In this way of message distribution, messages/events are distributed randomly among various instances of a Processor. +<p>SAMOA supports three different ways of distribution of messages to multiple instances of a Processor. 
+####2.1 Shuffle +In this way of message distribution, messages/events are distributed randomly among various instances of a Processor. Following figure shows how the messages are distributed. -<img src="images/SAMOA%20Explain%20Shuffling.png" alt="SAMOA Explain Shuffling"> +<img src="images/SAMOA Explain Shuffling.png" alt="SAMOA Explain Shuffling" /> Following code snipped shows how to connect a stream to a destination using random shuffling.</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputShuffleStream(sourceDataStream, destinationProcessor); -</code></pre></div> -<h4 id="2-2-key">2.2 Key</h4> -<p>In this way of message distribution, messages with same key are sent to same instance of a Processor. +<p><code class="highlighter-rouge"> +builder.connectInputShuffleStream(sourceDataStream, destinationProcessor); +</code> +####2.2 Key +In this way of message distribution, messages with same key are sent to same instance of a Processor. Following figure illustrates key-based distribution. -<img src="images/SAMOA%20Explain%20Key%20Shuffling.png" alt="SAMOA Explain Key Shuffling"> +<img src="images/SAMOA Explain Key Shuffling.png" alt="SAMOA Explain Key Shuffling" /> Following code snippet shows how to connect a stream to a destination using key-based distribution.</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputKeyStream(sourceDataStream, destinationProcessor); -</code></pre></div> -<h4 id="2-3-all">2.3 All</h4> -<p>In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. Following figure illustrates this distribution process. 
-<img src="images/SAMOA%20Explain%20All%20Shuffling.png" alt="SAMOA Explain All Shuffling"> +<p><code class="highlighter-rouge"> +builder.connectInputKeyStream(sourceDataStream, destinationProcessor); +</code> +####2.3 All +In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. Following figure illustrates this distribution process. +<img src="images/SAMOA Explain All Shuffling.png" alt="SAMOA Explain All Shuffling" /> Following code snippet shows how to connect a stream to a destination using All-based distribution.</p> -<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputAllStream(sourceDataStream, destinationProcessor); -</code></pre></div> + +<p><code class="highlighter-rouge"> +builder.connectInputAllStream(sourceDataStream, destinationProcessor); +</code></p> + </article> <!-- </div> --> Modified: incubator/samoa/site/documentation/Task.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Task.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Task.html (original) +++ incubator/samoa/site/documentation/Task.html Sun Apr 3 08:17:59 2016 @@ -73,56 +73,55 @@ </header> <article class="post-content"> - <p>Task is similar to a job in Hadoop. Task is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement <code>Task</code> interface.</p> + <p>Task is similar to a job in Hadoop. Task is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement <code class="highlighter-rouge">Task</code> interface.</p> -<h3 id="1-implementation">1. Implementation</h3> -<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.tasks; +<h3 id="implementation">1. 
Implementation</h3> +<p>``` +package org.apache.samoa.tasks;</p> -import org.apache.samoa.topology.ComponentFactory; -import org.apache.samoa.topology.Topology; +<p>import org.apache.samoa.topology.ComponentFactory; +import org.apache.samoa.topology.Topology;</p> -/** +<p>/** * Task interface, the mother of all SAMOA tasks! */ -public interface Task { +public interface Task {</p> - /** - * Initialize this SAMOA task, - * i.e. create and connect Processors and Streams - * and initialize the topology - */ - public void init(); - - /** - * Return the final topology object to be executed in the cluster - * @return topology object to be submitted to be executed in the cluster - */ - public Topology getTopology(); - - /** - * Sets the factory. - * TODO: propose to hide factory from task, - * i.e. Task will only see TopologyBuilder, - * and factory creation will be handled by TopologyBuilder - * - * @param factory the new factory - */ - public void setFactory(ComponentFactory factory) ; -} -</code></pre></div> -<h3 id="2-methods">2. Methods</h3> - -<h5 id="2-1-void-init">2.1 <code>void init()</code></h5> - -<p>This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> +<div class="highlighter-rouge"><pre class="highlight"><code>/** + * Initialize this SAMOA task, + * i.e. create and connect Processors and Streams + * and initialize the topology + */ +public void init(); -<h5 id="2-2-topology-gettopology">2.2 <code>Topology getTopology()</code></h5> +/** + * Return the final topology object to be executed in the cluster + * @return topology object to be submitted to be executed in the cluster + */ +public Topology getTopology(); + +/** + * Sets the factory. + * TODO: propose to hide factory from task, + * i.e. 
Task will only see TopologyBuilder, + * and factory creation will be handled by TopologyBuilder + * + * @param factory the new factory + */ +public void setFactory(ComponentFactory factory) ; } ``` +</code></pre> +</div> + +<h3 id="methods">2. Methods</h3> +<p>#####2.1 <code class="highlighter-rouge">void init()</code> +This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> -<p>This method should return the topology built by <code>init</code> to the engine for execution.</p> +<h5 id="topology-gettopology">2.2 <code class="highlighter-rouge">Topology getTopology()</code></h5> +<p>This method should return the topology built by <code class="highlighter-rouge">init</code> to the engine for execution.</p> -<h5 id="2-3-void-setfactory-componentfactory-factory">2.3 <code>void setFactory(ComponentFactory factory)</code></h5> +<h5 id="void-setfactorycomponentfactory-factory">2.3 <code class="highlighter-rouge">void setFactory(ComponentFactory factory)</code></h5> +<p>Utility method to accept a <code class="highlighter-rouge">ComponentFactory</code> to use in building the topology.</p> -<p>Utility method to accept a <code>ComponentFactory</code> to use in building the topology.</p> </article> Modified: incubator/samoa/site/documentation/Team.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Team.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Team.html (original) +++ incubator/samoa/site/documentation/Team.html Sun Apr 3 08:17:59 2016 @@ -76,52 +76,51 @@ <h2 id="team">Team</h2> <table class="table table-striped"> - <thead> - <th class="text-center"></th> - <th class="text-center">Name</th> - <th class="text-center">Role</th> - <th class="text-center">Apache ID</th> - </thead> - <tr> - <td class="text-center"></td> - <td class="text-center"><a 
href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">gdfm</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">abifet</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Nicolas Kourtellis</td> - <td class="text-center">PPMC</td> - <td class="text-center">nkourtellis</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">arinto</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Matthieu Morel</td> - <td class="text-center">PPMC</td> - <td class="text-center">mmorel</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">ovlaere</td> - </tr> + <thead> + <th class="text-center"></th> + <th class="text-center">Name</th> + <th class="text-center">Role</th> + <th class="text-center">Apache ID</th> + </thead> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">gdfm</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">abifet</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Nicolas Kourtellis</td> + <td class="text-center">PPMC</td> + <td class="text-center">nkourtellis</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> + <td 
class="text-center">PPMC</td> + <td class="text-center">arinto</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Matthieu Morel</td> + <td class="text-center">PPMC</td> + <td class="text-center">mmorel</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">ovlaere</td> + </tr> </table> <h3 id="contributors">Contributors</h3> - <ul> <li><a href="http://www.lsi.upc.edu/~marias/">Marta Arias</a></li> <li>Foteini Beligianni</li> Modified: incubator/samoa/site/documentation/Topology-Builder.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Topology-Builder.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Topology-Builder.html (original) +++ incubator/samoa/site/documentation/Topology-Builder.html Sun Apr 3 08:17:59 2016 @@ -73,32 +73,35 @@ </header> <article class="post-content"> - <p><code>TopologyBuilder</code> is a builder class which builds physical units of the topology and assemble them together. Each topology has a name. Following code snippet shows all the steps of creating a topology with one <code>EntrancePI</code>, two PIs and a few streams.</p> -<div class="highlight"><pre><code class="language-" data-lang="">TopologyBuilder builder = new TopologyBuilder(factory) // ComponentFactory factory -builder.initTopology("Parma Topology"); //initiates an empty topology with a name -//********************************Topology building*********************************** + <p><code class="highlighter-rouge">TopologyBuilder</code> is a builder class which builds physical units of the topology and assemble them together. Each topology has a name. 
Following code snippet shows all the steps of creating a topology with one <code class="highlighter-rouge">EntrancePI</code>, two PIs and a few streams.</p> + +<p>``` +TopologyBuilder builder = new TopologyBuilder(factory) // ComponentFactory factory +builder.initTopology(“Parma Topology”); //initiates an empty topology with a name +//<strong>**</strong><strong>**</strong><strong>**</strong><strong>**</strong><strong>**</strong><strong>Topology building</strong><strong>**</strong><strong>**</strong><strong>**</strong><strong>**</strong><strong>**</strong>*** StreamSource sourceProcessor = new StreamSource(inputPath,d,sampleSize,fpmGap,epsilon,phi,numSamples); builder.addEntranceProcessor(sourceProcessor); Stream sourceDataStream = builder.createStream(sourceProcessor); sourceProcessor.setDataStream(sourceDataStream); Stream sourceControlStream = builder.createStream(sourceProcessor); -sourceProcessor.setControlStream(sourceControlStream); +sourceProcessor.setControlStream(sourceControlStream);</p> -Sampler sampler = new Sampler(minFreqPercent,sampleSize,(float)epsilon,outputPath,sampler); +<p>Sampler sampler = new Sampler(minFreqPercent,sampleSize,(float)epsilon,outputPath,sampler); builder.addProcessor(sampler, numSamples); builder.connectInputAllStream(sourceControlStream, sampler); -builder.connectInputShuffleStream(sourceDataStream, sampler); +builder.connectInputShuffleStream(sourceDataStream, sampler);</p> -Stream samplerDataStream = builder.createStream(sampler); +<p>Stream samplerDataStream = builder.createStream(sampler); samplerP.setSamplerDataStream(samplerDataStream); Stream samplerControlStream = builder.createStream(sampler); -samplerP.setSamplerControlStream(samplerControlStream); +samplerP.setSamplerControlStream(samplerControlStream);</p> -Aggregator aggregatorProcessor = new Aggregator(outputPath,(long)numSamples,(long)sampleSize,(long)reqApproxNum,(float)epsilon); +<p>Aggregator aggregatorProcessor = new
Aggregator(outputPath,(long)numSamples,(long)sampleSize,(long)reqApproxNum,(float)epsilon); builder.addProcessor(aggregatorProcessor, numAggregators); builder.connectInputKeyStream(samplerDataStream, aggregatorProcessor); builder.connectInputAllStream(samplerControlStream, aggregatorProcessor); -</code></pre></div> +```</p> + </article> <!-- </div> --> Modified: incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html?rev=1737551&r1=1737550&r2=1737551&view=diff ============================================================================== --- incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html (original) +++ incubator/samoa/site/documentation/Vertical-Hoeffding-Tree-Classifier.html Sun Apr 3 08:17:59 2016 @@ -76,27 +76,24 @@ <p>Vertical Hoeffding Tree (VHT) classifier is a distributed classifier that utilizes vertical parallelism on top of the Very Fast Decision Tree (VFDT) or Hoeffding Tree classifier.</p> <h3 id="very-fast-decision-tree-vfdt-classifier">Very Fast Decision Tree (VFDT) classifier</h3> - <p><a href="http://doi.acm.org/10.1145/347090.347107">Hoeffding Tree or VFDT</a> is the standard decision tree algorithm for data stream classification. VFDT uses the Hoeffding bound to decide the minimum number of arriving instances to achieve certain level of confidence in splitting the node. 
This confidence level determines how close the statistics between the attribute chosen by VFDT and the attribute chosen by decision tree for batch learning.</p> <p>For a more comprehensive summary of VFDT, read chapter 3 of <a href="http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf">Data Stream Mining: A Practical Approach</a>.</p> <h3 id="vertical-parallelism">Vertical Parallelism</h3> +<p>Vertical Parallelism is a parallelism approach which partitions the instances in term of attribute for parallel processing. Vertical-parallelism-based decision tree induction processes the partitioned instances (which consists of subset of attribute) to calculate the information-theoretic criteria in parallel. For example, if we have instances with 100 attributes and we partition the instances into 5 portions, we will have 20 attributes per portion. The algorithm processes the 20 attributes in parallel to determine the “local” best attribute to split and combine the parallel computation results to determine the “global” best attribute to split and grow the tree.</p> -<p>Vertical Parallelism is a parallelism approach which partitions the instances in term of attribute for parallel processing. Vertical-parallelism-based decision tree induction processes the partitioned instances (which consists of subset of attribute) to calculate the information-theoretic criteria in parallel. For example, if we have instances with 100 attributes and we partition the instances into 5 portions, we will have 20 attributes per portion. The algorithm processes the 20 attributes in parallel to determine the "local" best attribute to split and combine the parallel computation results to determine the "global" best attribute to split and grow the tree.
</p> - -<p>For more explanation about available parallelism types for decision tree induction, you can read chapter 4 of <a href="../SAMOA-Developers-Guide-0-0-1.pdf">Distributed Decision Tree Learning for Mining Big Data Streams</a>, the Developer's Guide of SAMOA. </p> +<p>For more explanation about available parallelism types for decision tree induction, you can read chapter 4 of <a href="../SAMOA-Developers-Guide-0-0-1.pdf">Distributed Decision Tree Learning for Mining Big Data Streams</a>, the Developer’s Guide of SAMOA.</p> <h3 id="vertical-hoeffding-tree-vht-classifier">Vertical Hoeffding Tree (VHT) classifier</h3> - <p>VHT is implemented using the SAMOA API. The diagram below shows the implementation: -<img src="images/VHT.png" alt="Vertical Hoeffding Tree"></p> +<img src="images/VHT.png" alt="Vertical Hoeffding Tree" /></p> <p>The <em>source Processor</em> and the <em>evaluator Processor</em> are components of the <a href="Prequential-Evaluation-Task">prequential evaluation task</a> in SAMOA. The <em>model-aggregator Processor</em> contains the decision tree model. It connects to <em>local-statistic Processor</em> via <em>attribute</em> stream and <em>control</em> stream. The <em>model-aggregator Processor</em> splits instances based on attribute and each <em>local-statistic Processor</em> contains local statistic for attributes that assigned to it. The <em>model-aggregator Processor</em> sends the split instances via attribute stream and it sends control messages to ask <em>local-statistic Processor</em> to perform computation via <em>control</em> stream. Users configure <em>n</em>, which is the parallelism level of the algorithm. The parallelism level is translated into the number of local-statistic Processors in the algorithm.</p> <p>The <em>model-aggregator Processor</em> sends the classification result via <em>result</em> stream to the <em>evaluator Processor</em> for the corresponding evaluation task or other destination Processor.
The <em>evaluator Processor</em> performs an evaluation of the algorithm showing accuracy and throughput. Incoming instances to the <em>model-aggregator Processor</em> arrive via <em>source</em> stream. The calculation results from local statistic arrive to the <em>model-aggregator Processor</em> via <em>computation-result</em> stream.</p> -<p>For more details about the algorithms (i.e. pseudocode), go to section 4.2 of <a href="../SAMOA-Developers-Guide-0-0-1.pdf">Distributed Decision Tree Learning for Mining Big Data Streams</a>, the Developer's Guide of SAMOA. </p> +<p>For more details about the algorithms (i.e. pseudocode), go to section 4.2 of <a href="../SAMOA-Developers-Guide-0-0-1.pdf">Distributed Decision Tree Learning for Mining Big Data Streams</a>, the Developer’s Guide of SAMOA.</p> </article>
