Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html Sun Sep 25 20:39:59 2016 @@ -77,313 +77,221 @@ The steps included in this tutorial are:</p> <ol> - <li> - <p>Setup and configure a cluster with the required dependencies. This applies for single-node (local) execution as well.</p> - </li> - <li> - <p>Build SAMOA deployables</p> - </li> - <li> - <p>Configure SAMOA-Samza</p> - </li> - <li> - <p>Deploy SAMOA-Samza and execute a task</p> - </li> - <li> - <p>Observe the execution and the result</p> - </li> +<li><p>Set up and configure a cluster with the required dependencies. 
This applies to single-node (local) execution as well.</p></li> +<li><p>Build SAMOA deployables</p></li> +<li><p>Configure SAMOA-Samza</p></li> +<li><p>Deploy SAMOA-Samza and execute a task</p></li> +<li><p>Observe the execution and the result</p></li> </ol> <h2 id="setup-cluster">Setup cluster</h2> + <p>The following are needed to run SAMOA on top of Samza:</p> <ul> - <li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li> - <li><a href="http://kafka.apache.org/">Apache Kafka</a></li> - <li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li> +<li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li> +<li><a href="http://kafka.apache.org/">Apache Kafka</a></li> +<li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li> </ul> <h3 id="zookeeper">Zookeeper</h3> -<p>Zookeeper is used by Kafka to coordinate its brokers. The detail instructions to setup a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>.</p> + +<p>Zookeeper is used by Kafka to coordinate its brokers. Detailed instructions for setting up a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>. 
</p> <p>To quickly set up a single-node Zookeeper cluster:</p> <ol> - <li> - <p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p> - </li> - <li> - <p>Untar the archive</p> - </li> +<li><p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li> +<li><p>Untar the archive</p></li> </ol> - -<p><code class="highlighter-rouge"> -tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ +</code></pre></div> <ol> - <li>Copy the default configuration file</li> +<li>Copy the default configuration file</li> </ol> - -<p><code class="highlighter-rouge"> -cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg +</code></pre></div> <ol> - <li>Start the single-node cluster</li> +<li>Start the single-node cluster</li> </ol> - -<p><code class="highlighter-rouge"> -~/zookeeper-3.4.6/bin/zkServer.sh start -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/zookeeper-3.4.6/bin/zkServer.sh start +</code></pre></div> <h3 id="kafka">Kafka</h3> -<p>Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system.</p> + +<p>Kafka is a distributed, partitioned, replicated commit log service that Samza uses as its default messaging system. </p> <ol> - <li> - <p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned in the page, the Scala version does not matter. 
However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p> - </li> - <li> - <p>Untar the archive</p> - </li> +<li><p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned on the page, the Scala version does not matter. However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p></li> +<li><p>Untar the archive </p></li> </ol> - -<p><code class="highlighter-rouge"> -tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ +</code></pre></div> <p>If you are running in local mode or a single-node cluster, you can now start Kafka with the command:</p> - -<p><code class="highlighter-rouge"> -~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties -</code></p> - -<p>In multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can totally have a smaller Kafka cluster, or even a single-node Kafka cluster). The number of brokers in Kafka cluster will affect disk bandwidth and space (the more brokers we have, the higher value we will get for the two). In each node, you need to set the following properties in <code class="highlighter-rouge">~/kafka_2.10-0.8.1/config/server.properties</code> before starting Kafka service.</p> - -<p><code class="highlighter-rouge"> -broker.id=a-unique-number-for-each-node +<div class="highlight"><pre><code class="language-" data-lang="">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties +</code></pre></div> +<p>In a multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can also run a smaller Kafka cluster, or even a single-node Kafka cluster). 
The number of brokers in the Kafka cluster affects the available disk bandwidth and space (the more brokers, the more of both). On each node, you need to set the following properties in <code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting the Kafka service.</p> +<div class="highlight"><pre><code class="language-" data-lang="">broker.id=a-unique-number-for-each-node zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...] -</code></p> - +</code></pre></div> <p>You might want to change the retention hours or retention bytes of the logs to keep the log size from growing too large.</p> - -<p><code class="highlighter-rouge"> -log.retention.hours=number-of-hours-to-keep-the-logs +<div class="highlight"><pre><code class="language-" data-lang="">log.retention.hours=number-of-hours-to-keep-the-logs log.retention.bytes=number-of-bytes-to-keep-in-the-logs -</code></p> - +</code></pre></div> <h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3> + <blockquote> - <p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode.</p> +<p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode. </p> </blockquote> <p>To set up a YARN cluster, first download a binary release of Hadoop <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">here</a> on each node in the cluster and untar the archive -<code class="highlighter-rouge">tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> +<code>tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. 
We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> <p><strong>HDFS</strong></p> -<p>Set the following properties in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> - -<p>```</p> -<configuration> - <property> - <name>dfs.datanode.data.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> - <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> - </property> - - <property> - <name>dfs.namenode.name.dir</name> - <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> - <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description> - </property> -</configuration> -<p>```</p> - -<p>Add this property in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> in all nodes.</p> - -<p>```</p> -<configuration> - <property> - <name>fs.defaultFS</name> - <value>hdfs://localhost:9000/</value> - <description>NameNode URI</description> - </property> - - <property> - <name>fs.hdfs.impl</name> - <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> - </property> -</configuration> -<p>``` -For a multi-node cluster, change the hostname (âlocalhostâ) to the correct host name of your namenode server.</p> +<p>Set the following properties in <code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> in all nodes.</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>dfs.datanode.data.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> + <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> + </property> + + <property> + <name>dfs.namenode.name.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> + <description>Path on the local filesystem where the 
NameNode stores the namespace and transaction logs persistently.</description> + </property> +</configuration> +</code></pre></div> +<p>Add this property in <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> on all nodes.</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>fs.defaultFS</name> + <value>hdfs://localhost:9000/</value> + <description>NameNode URI</description> + </property> + + <property> + <name>fs.hdfs.impl</name> + <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> + </property> +</configuration> +</code></pre></div> +<p>For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.</p> <p>Format the HDFS directory (only perform this step the very first time you run the cluster)</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/bin/hdfs namenode -format -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/bin/hdfs namenode -format +</code></pre></div> <p>Start the namenode daemon on one of the nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode +</code></pre></div> <p>Start the datanode daemon on all nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode +</code></pre></div> <p><strong>YARN</strong></p> -<p>If you are running in multi-node cluster, set the resource manager hostname in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> in all nodes as follow:</p> - -<p>```</p> -<configuration> - <property> - <name>yarn.resourcemanager.hostname</name> - <value>resourcemanager-url</value> - <description>The hostname of 
the RM.</description> - </property> -</configuration> -<p>```</p> - +<p>If you are running a multi-node cluster, set the resource manager hostname in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> on all nodes as follows:</p> +<div class="highlight"><pre><code class="language-" data-lang=""><configuration> + <property> + <name>yarn.resourcemanager.hostname</name> + <value>resourcemanager-url</value> + <description>The hostname of the RM.</description> + </property> +</configuration> +</code></pre></div> <p><strong>Other configurations</strong> Now we need to tell Samza where to find the configuration of the YARN cluster. To do this, first create a new directory on all nodes:</p> - -<p><code class="highlighter-rouge"> -mkdir ~/.samza +<div class="highlight"><pre><code class="language-" data-lang="">mkdir ~/.samza mkdir ~/.samza/conf -</code></p> - -<p>Copy (or soft link) <code class="highlighter-rouge">core-site.xml</code>, <code class="highlighter-rouge">hdfs-site.xml</code>, <code class="highlighter-rouge">yarn-site.xml</code> in <code class="highlighter-rouge">~/hadoop-2.2.0/etc/hadoop</code> to the new directory</p> - -<p><code class="highlighter-rouge"> -ln -s ~/.samza/conf/core-site.xml ~/hadoop-2.2.0/etc/hadoop/core-site.xml +</code></pre></div> +<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>, <code>yarn-site.xml</code> from <code>~/hadoop-2.2.0/etc/hadoop</code> into the new directory </p> +<div class="highlight"><pre><code class="language-" data-lang="">ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml -</code></p> - +</code></pre></div> <p>Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows where to find these YARN configuration files.</p> - -<p><code class="highlighter-rouge"> -export YARN_HOME=$HOME/.samza -</code></p> - +<div 
class="highlight"><pre><code class="language-" data-lang="">export YARN_HOME=$HOME/.samza +</code></pre></div> <p><strong>Start the YARN cluster</strong> Start the resource manager on the master node</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager +</code></pre></div> <p>Start the node manager on all worker nodes</p> - -<p><code class="highlighter-rouge"> -~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager +</code></pre></div> <h2 id="build-samoa">Build SAMOA</h2> + <p>Perform the following steps on one of the nodes in the cluster. Here we assume Git and Maven are installed on this node.</p> <p>Since Samza is not yet released on Maven, we have to clone the Samza project, build it, and publish it to the local Maven repository:</p> - -<p><code class="highlighter-rouge"> -git clone -b 0.7.0 https://github.com/apache/incubator-samza.git +<div class="highlight"><pre><code class="language-" data-lang="">git clone -b 0.7.0 https://github.com/apache/incubator-samza.git cd incubator-samza ./gradlew clean build ./gradlew publishToMavenLocal -</code></p> - -<p>Here we cloned and installed Samza version 0.7.0, the current released version (July 2014).</p> +</code></pre></div> +<p>Here we cloned and installed Samza version 0.7.0, the current released version (as of July 2014). 
</p> <p>Now we can clone the repository and install SAMOA.</p> - -<p><code class="highlighter-rouge"> -git clone http://git.apache.org/incubator-samoa.git +<div class="highlight"><pre><code class="language-" data-lang="">git clone http://git.apache.org/incubator-samoa.git cd incubator-samoa mvn -Psamza package -</code></p> - -<p>The deployable jars for SAMOA will be in <code class="highlighter-rouge">target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for Samza <code class="highlighter-rouge">target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p> +</code></pre></div> +<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in the case of Samza, <code>target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p> <h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2> -<p>This section explains the configuration parameters in <code class="highlighter-rouge">bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p> -<p><strong>Samza execution mode</strong></p> +<p>This section explains the configuration parameters in <code>bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p> -<p><code class="highlighter-rouge"> -samoa.samza.mode=[yarn|local] -</code> -This parameter specify which mode to execute the task: <code class="highlighter-rouge">local</code> for local execution and <code class="highlighter-rouge">yarn</code> for cluster execution.</p> +<p><strong>Samza execution mode</strong></p> +<div class="highlight"><pre><code class="language-" data-lang="">samoa.samza.mode=[yarn|local] +</code></pre></div> +<p>This parameter specifies the mode in which to execute the task: <code>local</code> for local execution and <code>yarn</code> for cluster execution.</p> <p><strong>Zookeeper</strong></p> - -<p><code class="highlighter-rouge"> -zookeeper.connect=localhost +<div class="highlight"><pre><code class="language-" 
data-lang="">zookeeper.connect=localhost zookeeper.port=2181 -</code> -The default setting above applies for local mode execution. For cluster mode, change <code class="highlighter-rouge">zookeeper.host</code> to the correct URL of your zookeeper host.</p> +</code></pre></div> +<p>The default setting above applies to local mode execution. For cluster mode, change <code>zookeeper.connect</code> to the correct URL of your Zookeeper host.</p> <p><strong>Kafka</strong></p> +<div class="highlight"><pre><code class="language-" data-lang="">kafka.broker.list=localhost:9092 +</code></pre></div> +<p><code>kafka.broker.list</code> is a comma-separated list of the host:port of all brokers in the Kafka cluster.</p> +<div class="highlight"><pre><code class="language-" data-lang="">kafka.replication.factor=1 +</code></pre></div> +<p><code>kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in the Kafka cluster.</p> -<p><code class="highlighter-rouge"> -kafka.broker.list=localhost:9092 -</code> -<code class="highlighter-rouge">kafka.broker.list</code> is a comma separated list of host:port of all the brokers in Kafka cluster.</p> - -<p><code class="highlighter-rouge"> -kafka.replication.factor=1 -</code> -<code class="highlighter-rouge">kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. 
This number must be less than or equal to the number of brokers in Kafka cluster.</p> - -<p><strong>YARN</strong> -> The below settings do not apply for local mode execution, you can leave them as they are.</p> +<p><strong>YARN</strong></p> -<p><code class="highlighter-rouge">yarn.am.memory</code> and <code class="highlighter-rouge">yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively.</p> +<blockquote> +<p>The settings below do not apply to local mode execution; you can leave them as they are.</p> +</blockquote> -<p><code class="highlighter-rouge"> -yarn.am.memory=1024 +<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively. </p> +<div class="highlight"><pre><code class="language-" data-lang="">yarn.am.memory=1024 yarn.container.memory=1024 -</code></p> - +</code></pre></div> -<p><code class="highlighter-rouge">yarn.package.path</code> specifies the path (typically a HDFS path) of the package to be distributed to all YARN containers to execute the task.</p> - -<p><code class="highlighter-rouge"> -yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar -</code></p> - +<p><code>yarn.package.path</code> specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.</p> +<div class="highlight"><pre><code class="language-" data-lang="">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar +</code></pre></div> <p><strong>Samza</strong> -<code class="highlighter-rouge">max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container.</p> - -<p><code class="highlighter-rouge"> -max.pi.per.container=1 -</code></p> - -<p><code class="highlighter-rouge">kryo.register.file</code> specifies the registration file for Kryo serializer.</p> - -<p><code 
class="highlighter-rouge"> -kryo.register.file=samza-kryo -</code></p> - -<p><code class="highlighter-rouge">checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> - -<p><code class="highlighter-rouge"> -checkpoint.commit.ms=60000 -</code></p> - +<code>max.pi.per.container</code> specifies the number of PI instances allowed in one YARN container. </p> +<div class="highlight"><pre><code class="language-" data-lang="">max.pi.per.container=1 +</code></pre></div> +<p><code>kryo.register.file</code> specifies the registration file for Kryo serializer.</p> +<div class="highlight"><pre><code class="language-" data-lang="">kryo.register.file=samza-kryo +</code></pre></div> +<p><code>checkpoint.commit.ms</code> specifies the frequency for PIs to commit their checkpoints (in ms). The default value is 1 minute.</p> +<div class="highlight"><pre><code class="language-" data-lang="">checkpoint.commit.ms=60000 +</code></pre></div> <h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2> -<p>Execute SAMOA task with the following command:</p> - -<p><code class="highlighter-rouge"> -bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" -</code></p> +<p>Execute SAMOA task with the following command:</p> +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" +</code></pre></div> <h2 id="observe-execution-and-result">Observe execution and result</h2> -<p>In local mode, all the log will be printed out to stdout. If you execute the task on YARN cluster, the output is written to stdout files in YARNâs containersâ log folder ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p> + +<p>In local mode, all the log will be printed out to stdout. 
If you execute the task on a YARN cluster, the output is written to stdout files in the log folder of each YARN container ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p> </article>
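Pulling the individual parameters above together, a complete <code>bin/samoa-samza.properties</code> for a single-node local run might look like the following sketch. Every value is taken from the configuration section of this page; the YARN entries are commented out because they only matter in <code>yarn</code> mode, and hosts and paths must be adapted for a real cluster.

```properties
# Sketch: minimal bin/samoa-samza.properties for single-node local execution.
# Values mirror the defaults discussed above; adjust for a real cluster.
samoa.samza.mode=local
zookeeper.connect=localhost
zookeeper.port=2181
kafka.broker.list=localhost:9092
kafka.replication.factor=1
max.pi.per.container=1
kryo.register.file=samza-kryo
checkpoint.commit.ms=60000
# Only needed when samoa.samza.mode=yarn:
# yarn.am.memory=1024
# yarn.container.memory=1024
# yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
```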
Modified: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html (original) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html Sun Sep 25 20:39:59 2016 @@ -76,104 +76,103 @@ <p>In this tutorial page we describe how to execute SAMOA on top of Apache Storm. Here is an outline of what we want to do:</p> <ol> - <li>Ensure that you have necessary Storm cluster and configuration to execute SAMOA</li> - <li>Ensure that you have all the SAMOA deployables for execution in the cluster</li> - <li>Configure samoa-storm.properties</li> - <li>Execute SAMOA classification task</li> - <li>Observe the task execution</li> +<li>Ensure that you have the necessary Storm cluster and configuration to execute SAMOA</li> +<li>Ensure that you have all the SAMOA deployables for execution in the cluster</li> +<li>Configure samoa-storm.properties</li> +<li>Execute a SAMOA classification task</li> +<li>Observe the task execution</li> </ol> <h3 id="storm-configuration">Storm Configuration</h3> -<p>Before we start the tutorial, please ensure that you already have Storm cluster (preferably Storm 0.8.2) running. You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p> -<p>You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code class="highlighter-rouge">~/.storm/storm.yaml</code>:</p> +<p>Before we start the tutorial, please ensure that you already have a Storm cluster (preferably Storm 0.8.2) running. 
You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p> -<p>``` -########### These MUST be filled in for a storm configuration -nimbus.host: â<enter your="" nimbus="" host="" name="" here="">"</enter></p> +<p>You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in <code>~/.storm/storm.yaml</code>:</p> +<div class="highlight"><pre><code class="language-" data-lang="">########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" -<h2 id="list-of-custom-serializations">List of custom serializations</h2> -<p>kryo.register: +## List of custom serializations +kryo.register: - org.apache.samoa.learners.classifiers.trees.AttributeContentEvent: org.apache.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer -<code class="highlighter-rouge"> -<!-- +</code></pre></div> +<!-- Or, if you are using SAMOA with optimized VHT, you should use this following configuration file: -</code> +``` ########### These MUST be filled in for a storm configuration -nimbus.host: â<enter your="" nimbus="" host="" name="" here="">"</enter></p> +nimbus.host: "<enter your nimbus host name here>" -<h2 id="list-of-custom-serializations-1">List of custom serializations</h2> -<p>kryo.register: +## List of custom serializations +kryo.register: - org.apache.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: org.apache.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer - org.apache.samoa.learners.classifiers.trees.ComputeContentEvent: org.apache.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer ``` -â></p> +--> -<p>Alternatively, if 
you donât have Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p> +<p>Alternatively, if you don't have a Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p> <h3 id="samoa-deployables">SAMOA deployables</h3> + <p>There are three deployables for executing SAMOA on top of Storm. They are:</p> <ol> - <li><code class="highlighter-rouge">bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li> - <li><code class="highlighter-rouge">target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code class="highlighter-rouge">x.x.x</code> is the version number of SAMOA.</li> - <li><code class="highlighter-rouge">bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly.</li> +<li><code>bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li> +<li><code>target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code>x.x.x</code> is the version number of SAMOA. </li> +<li><code>bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly. 
</li> </ol> -<h3 id="a-namesamoa-storm-properties-samoa-stormproperties-configurationa"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3> +<h3 id="samoa-storm-properties-configuration"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3> + <p>Currently, the properties file contains two configurations:</p> <ol> - <li><code class="highlighter-rouge">samoa.storm.mode</code> determines whether the task is executed locally (using Stormâs <code class="highlighter-rouge">LocalCluster</code>) or executed in a Storm cluster. Use <code class="highlighter-rouge">local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code class="highlighter-rouge">cluster</code> if you want to test SAMOA on your Storm cluster.</li> - <li><code class="highlighter-rouge">samoa.storm.numworker</code> determines the number of worker to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in you Storm cluster. If you are using local mode, this property corresponds to the number of thread used by Stormâs LocalCluster to execute your SAMOA task.</li> +<li><code>samoa.storm.mode</code> determines whether the task is executed locally (using Storm's <code>LocalCluster</code>) or executed in a Storm cluster. Use <code>local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code>cluster</code> if you want to test SAMOA on your Storm cluster.</li> +<li><code>samoa.storm.numworker</code> determines the number of workers used to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in your Storm cluster. 
If you are using local mode, this property corresponds to the number of threads used by Storm's LocalCluster to execute your SAMOA task.</li> </ol> <p>Here is an example of a complete properties file:</p> - -<p>``` -# SAMOA Storm properties file +<div class="highlight"><pre><code class="language-" data-lang=""># SAMOA Storm properties file # This file contains specific configurations for SAMOA deployment in the Storm platform # Note that you still need to configure Storm client in your machine, -# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings</p> +# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings -<h1 id="samoastormmode-corresponds-to-the-execution-mode-of-the-task-in-storm">samoa.storm.mode corresponds to the execution mode of the Task in Storm</h1> -<p># possible values: +# samoa.storm.mode corresponds to the execution mode of the Task in Storm +# possible values: # 1. cluster: the Task will be sent into nimbus. The nimbus is configured by Storm configuration file # 2. 
local: the Task will be sent using local Storm cluster -samoa.storm.mode=cluster</p> +samoa.storm.mode=cluster -<h1 id="samoastormnumworker-corresponds-to-the-number-of-worker-processes-allocated-in-storm-cluster">samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster</h1> -<p># possible values: any integer greater than 0<br /> +# samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster +# possible values: any integer greater than 0 samoa.storm.numworker=7 -```</p> - +</code></pre></div> <h3 id="samoa-task-execution">SAMOA task execution</h3> -<p>You can execute a SAMOA task using the aforementioned <code class="highlighter-rouge">bin/samoa</code> script with this following format: -<code class="highlighter-rouge">bin/samoa <platform> <jar> "<task>"</code>.</p> +<p>You can execute a SAMOA task using the aforementioned <code>bin/samoa</code> script with the following format: +<code>bin/samoa <platform> <jar> "<task>"</code>.</p> -<p><code class="highlighter-rouge"><platform></code> can be <code class="highlighter-rouge">storm</code> or <code class="highlighter-rouge">s4</code>. Using <code class="highlighter-rouge">storm</code> option means you are deploying SAMOA on a Storm environment. In this configuration, the script uses the aforementioned yaml file (<code class="highlighter-rouge">~/.storm/storm.yaml</code>) and <code class="highlighter-rouge">samoa-storm.properties</code> to perform the deployment. Using <code class="highlighter-rouge">s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> +<p><code><platform></code> can be <code>storm</code> or <code>s4</code>. Using the <code>storm</code> option means you are deploying SAMOA on a Storm environment. 
In this configuration, the script uses the aforementioned yaml file (<code>~/.storm/storm.yaml</code>) and <code>samoa-storm.properties</code> to perform the deployment. Using the <code>s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> -<p><code class="highlighter-rouge"><jar></code> is the location of the deployed jar file (<code class="highlighter-rouge">SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path into the jar file.</p> +<p><code><jar></code> is the location of the deployed jar file (<code>SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. The location can be a relative path or an absolute path to the jar file.</p> -<p><code class="highlighter-rouge">"<task>"</code> is the SAMOA task command line such as <code class="highlighter-rouge">PrequentialEvaluation</code> or <code class="highlighter-rouge">ClusteringTask</code>. This command line for SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> +<p><code>"<task>"</code> is the SAMOA task command line such as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. 
The command line for a SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> <p>The complete command to execute SAMOA is:</p> - -<p><code class="highlighter-rouge"> -bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" -</code> -The example above uses <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier.</p> +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></pre></div> +<p>The example above uses the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and the <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier.</p> <h3 id="observing-task-execution">Observing task execution</h3> -<p>There are two ways to observe the task execution using Storm UI and by monitoring the dump file of the SAMOA task. Notice that the dump file will be created on the cluster if you are executing your task in <code class="highlighter-rouge">cluster</code> mode.</p> + +<p>There are two ways to observe the task execution: using the Storm UI and monitoring the dump file of the SAMOA task. Notice that the dump file will be created on the cluster if you are executing your task in <code>cluster</code> mode.</p> <h4 id="using-storm-ui">Using Storm UI</h4> + <p>Go to the web address of the Storm UI and check whether the SAMOA task executes as intended. 
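If you prefer the command line to the web UI, the Storm client offers equivalent commands (a sketch — the topology name below is illustrative and depends on how your task was submitted):

```shell
# List the topologies currently running on the cluster
storm list

# Kill a topology by name; -w gives components 10 seconds to shut down
storm kill PrequentialEvaluation -w 10
```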
Use this UI to kill the associated Storm topology if necessary.</p> <h4 id="monitoring-the-dump-file">Monitoring the dump file</h4> -<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has <code class="highlighter-rouge">-d</code> option which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should set the dump file into a file on a shared filesystem if you want to access it from the machine submitting the task.</p> + +<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has the <code>-d</code> option, which specifies the path to the dump file. Since Storm performs the allocation of Storm tasks, you should set the dump file to a location on a shared filesystem if you want to access it from the machine submitting the task.</p> </article> Modified: incubator/samoa/site/documentation/Getting-Started.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Getting-Started.html (original) +++ incubator/samoa/site/documentation/Getting-Started.html Sun Sep 25 20:39:59 2016 @@ -76,40 +76,26 @@ <p>We start by showing how simple it is to run a first large-scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.</p> <ul> - <li> - <ol> - <li>Download SAMOA</li> - </ol> - </li> +<li>1. Download SAMOA </li> </ul> - -<p><code class="highlighter-rouge">bash -git clone http://git.apache.org/incubator-samoa.git -cd incubator-samoa -mvn package #Local mode -</code> -* 2. 
Download the Forest CoverType dataset</p> - -<p><code class="highlighter-rouge">bash -wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip" +<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git +<span class="nb">cd </span>incubator-samoa +mvn package <span class="c">#Local mode</span> +</code></pre></div> +<ul> +<li>2. Download the Forest CoverType dataset </li> +</ul> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget <span class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span> unzip covtypeNorm.arff.zip -</code></p> - +</code></pre></div> <p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.</p> <ul> - <li> - <ol> - <li>Run an example: classifying the CoverType dataset with the bagging algorithm</li> - </ol> - </li> +<li>3. 
Run an example: classifying the CoverType dataset with the bagging algorithm</li> </ul> - -<p><code class="highlighter-rouge">bash -bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging - -s (ArffFileStream -f covtypeNorm.arff) -f 100000" -</code></p> - +<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/samoa <span class="nb">local </span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging + -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</span> +</code></pre></div> <p>The output will be a list of the evaluation results, plotted every 100,000 instances.</p> </article> Modified: incubator/samoa/site/documentation/Home.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Home.html (original) +++ incubator/samoa/site/documentation/Home.html Sun Sep 25 20:39:59 2016 @@ -81,62 +81,58 @@ SAMOA is similar to Mahout in spirit, bu <p>Apache SAMOA is simple and fun to use! This documentation is intended to give an introduction on how to use SAMOA in different ways. As a user you can run SAMOA algorithms on several stream processing engines: local mode, Storm, S4, Samza, and Flink. 
As a developer you can create new algorithms only once and test them in all of these distributed stream processing engines.</p> <h2 id="getting-started">Getting Started</h2> + <ul> - <li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> +<li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> </ul> <h2 id="users">Users</h2> + +<ul> +<li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> + +<ul> +<li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> +<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> +<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> +<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> +<li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> +</ul></li> +<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> + <ul> - <li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> - <ul> - <li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> - <li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> - <li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> - <li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> - <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">1.4 Executing SAMOA with Apache Avro Files</a></li> - </ul> - </li> - <li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> - <ul> - <li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> - <li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> - <li><a 
href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> - <li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> - <li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> - <li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> - <li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> - </ul> - </li> +<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> +<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> +<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> +<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> +<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> +<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> +<li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> +</ul></li> </ul> <h2 id="developers">Developers</h2> + <ul> - <li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> - <ul> - <li><a href="Processor.html">3.1 Processor</a></li> - <li><a href="Content-Event.html">3.2 Content Event</a></li> - <li><a href="Stream.html">3.3 Stream</a></li> - <li><a href="Task.html">3.4 Task</a></li> - <li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> - <li><a href="Learner.html">3.6 Learner</a></li> - <li><a href="Processing-Item.html">3.7 Processing Item</a></li> - </ul> - </li> - <li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> +<li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> + +<ul> +<li><a href="Processor.html">3.1 Processor</a></li> +<li><a href="Content-Event.html">3.2 Content Event</a></li> +<li><a href="Stream.html">3.3 Stream</a></li> 
+<li><a href="Task.html">3.4 Task</a></li> +<li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> +<li><a href="Learner.html">3.6 Learner</a></li> +<li><a href="Processing-Item.html">3.7 Processing Item</a></li> +</ul></li> +<li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> </ul> <h3 id="getting-help">Getting help</h3> -<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:dev@samoa.incubator.org">dev@samoa.incubator.org</a></p> -<table> - <tbody> - <tr> - <td>[ <a href="mailto:dev-subscribe@samoa.incubator.org">subscribe</a></td> - <td><a href="mailto:dev-unsubscribe@samoa.incubator.org">unsubscribe</a></td> - <td><a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</td> - </tr> - </tbody> -</table> +<p>Discussion about SAMOA happens on the Apache development mailing list <a href="mailto:dev@samoa.incubator.org">dev@samoa.incubator.org</a></p> + +<p>[ <a href="mailto:dev-subscribe@samoa.incubator.org">subscribe</a> | <a href="mailto:dev-unsubscribe@samoa.incubator.org">unsubscribe</a> | <a href="http://mail-archives.apache.org/mod_mbox/incubator-samoa-dev">archives</a> ]</p> </article> Modified: incubator/samoa/site/documentation/Learner.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Learner.html (original) +++ incubator/samoa/site/documentation/Learner.html Sun Sep 25 20:39:59 2016 @@ -74,19 +74,18 @@ <article class="post-content"> <p>Learners are implemented in SAMOA as sub-topologies.</p> +<div class="highlight"><pre><code class="language-" data-lang="">public interface Learner extends Serializable{ -<p>``` -public interface Learner extends Serializable{</p> + public void init(TopologyBuilder topologyBuilder, Instances dataset); -<div class="highlighter-rouge"><pre 
class="highlight"><code>public void init(TopologyBuilder topologyBuilder, Instances dataset); + public Processor getInputProcessor(); -public Processor getInputProcessor(); + public Stream getResultStream(); +} +</code></pre></div> +<p>When a <code>Task</code> object is initiated via <code>init()</code>, the method <code>init(...)</code> of <code>Learner</code> is called, and the topology is added to the global topology of the task.</p> -public Stream getResultStream(); } ``` When a `Task` object is initiated via `init()`, the method `init(...)` of `Learner` is called, and the topology is added to the global topology of the task. -</code></pre> -</div> - -<p>To create a new learner, it is only needed to add streams, processors and their connections to the topology in <code class="highlighter-rouge">init(...)</code>, specify what is the processor that will manage the input stream of the learner in <code class="highlighter-rouge">getInputProcessor()</code>, and finally, specify what is going to be the output stream of the learner with <code class="highlighter-rouge">getResultStream()</code>.</p> +<p>To create a new learner, you only need to add streams, processors, and their connections to the topology in <code>init(...)</code>, specify which processor will manage the input stream of the learner in <code>getInputProcessor()</code>, and finally specify which stream will be the output stream of the learner with <code>getResultStream()</code>.</p> </article> Modified: incubator/samoa/site/documentation/Prequential-Evaluation-Task.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Prequential-Evaluation-Task.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Prequential-Evaluation-Task.html (original) +++ incubator/samoa/site/documentation/Prequential-Evaluation-Task.html Sun Sep 25 20:39:59 2016 @@ -73,29 +73,26 @@ 
</header> <article class="post-content"> - <p>In data stream mining, the most used evaluation scheme is the prequential or interleaved-test-then-train evolution. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers doing this. It supports two classification performance evaluators: the basic one which measures the accuracy of the classifier model since the start of the evaluation, and a window based one which measures the accuracy on the current sliding window of recent instances.</p> + <p>In data stream mining, the most commonly used evaluation scheme is the prequential or interleaved-test-then-train evaluation. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers doing this. It supports two classification performance evaluators: the basic one, which measures the accuracy of the classifier model since the start of the evaluation, and a window-based one, which measures the accuracy on the current sliding window of recent instances. 
</p> <p>An example of the Prequential Evaluation task on the SAMOA command line when deploying to Storm:</p> - -<p><code class="highlighter-rouge"> -bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" -</code></p> - +<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></pre></div> <p>Parameters:</p> <ul> - <li><code class="highlighter-rouge">-l</code>: classifier to train</li> - <li><code class="highlighter-rouge">-s</code>: stream to learn from</li> - <li><code class="highlighter-rouge">-e</code>: classification performance evaluation method</li> - <li><code class="highlighter-rouge">-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> - <li><code class="highlighter-rouge">-f</code>: number of instances between samples of the learning performance</li> - <li><code class="highlighter-rouge">-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> - <li><code class="highlighter-rouge">-d</code>: file to append intermediate csv results to</li> +<li><code>-l</code>: classifier to train</li> +<li><code>-s</code>: stream to learn from</li> +<li><code>-e</code>: classification performance evaluation method</li> +<li><code>-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> +<li><code>-f</code>: number of instances between samples of the learning performance</li> +<li><code>-n</code>: evaluation name (default: PrequentialEvaluation_TimeStamp)</li> +<li><code>-d</code>: file to append intermediate CSV results to</li> </ul> -<p>In terms of SAMOA API, the Prequential Evaluation Task consists of a source <code 
class="highlighter-rouge">Entrance Processor</code>, a <code class="highlighter-rouge">Classifier</code>, and an <code class="highlighter-rouge">Evaluator Processor</code> as shown below. The <code class="highlighter-rouge">Entrance Processor</code> sends instances to the <code class="highlighter-rouge">Classifier</code> using the <code class="highlighter-rouge">source</code> stream. The classifier sends the classification results to the <code class="highlighter-rouge">Evaluator Processor</code> via the <code class="highlighter-rouge">result</code> stream. The <code class="highlighter-rouge">Entrance Processor</code> corresponds to the <code class="highlighter-rouge">-s</code> option of Prequential Evaluation, the <code class="highlighter-rouge">Classifier</code> corresponds to the <code class="highlighter-rouge">-l</code> option, and the <code class="highlighter-rouge">Evaluator Processor</code> co rresponds to the <code class="highlighter-rouge">-e</code> option.</p> +<p>In terms of SAMOA API, the Prequential Evaluation Task consists of a source <code>Entrance Processor</code>, a <code>Classifier</code>, and an <code>Evaluator Processor</code> as shown below. The <code>Entrance Processor</code> sends instances to the <code>Classifier</code> using the <code>source</code> stream. The classifier sends the classification results to the <code>Evaluator Processor</code> via the <code>result</code> stream. 
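As an aside, the test-then-train loop that this task implements can be sketched in a few lines of plain, self-contained Java (independent of SAMOA's API; the majority-class "model" below is only a stand-in for a real learner):

```java
public class PrequentialSketch {
    // Returns the number of correct predictions over a binary label stream,
    // using the prequential scheme: each instance is first used to test the
    // current model, and only afterwards to train it.
    static int prequentialCorrect(int[] labels) {
        int ones = 0, seen = 0, correct = 0;
        for (int label : labels) {
            int prediction = (ones * 2 >= seen) ? 1 : 0; // 1. test (majority class so far)
            if (prediction == label) correct++;
            seen++;                                      // 2. train on the same instance
            if (label == 1) ones++;
        }
        return correct;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 1, 1, 1, 0, 1};
        System.out.println("prequential accuracy = "
                + prequentialCorrect(labels) + "/" + labels.length);
    }
}
```

Because every instance is tested before it is used for training, the accuracy estimate never evaluates the model on data it has already seen.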
The <code>Entrance Processor</code> corresponds to the <code>-s</code> option of Prequential Evaluation, the <code>Classifier</code> corresponds to the <code>-l</code> option, and the <code>Evaluator Processor</code> corresponds to the <code>-e</code> option.</p> -<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task" /></p> +<p><img src="images/PrequentialEvaluation.png" alt="Prequential Evaluation Task"></p> </article> Modified: incubator/samoa/site/documentation/Processing-Item.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processing-Item.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processing-Item.html (original) +++ incubator/samoa/site/documentation/Processing-Item.html Sun Sep 25 20:39:59 2016 @@ -82,33 +82,30 @@ It is used internally, and it is not acc There are two types of Processing Items.</p> <ol> - <li>Simple Processing Item (PI)</li> - <li>Entrance Processing Item (EntrancePI)</li> +<li>Simple Processing Item (PI)</li> +<li>Entrance Processing Item (EntrancePI)</li> </ol> -<h4 id="simple-processing-item-pi">1. Simple Processing Item (PI)</h4> -<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. Following code snippet shows the creation of a Processing Item.</p> +<h4 id="1-simple-processing-item-pi">1. Simple Processing Item (PI)</h4> -<p><code class="highlighter-rouge"> -builder.initTopology("MyTopology"); +<p>Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a <code>TopologyBuilder</code>. 
The following code snippet shows the creation of a Processing Item.</p> +<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); Processor samplerProcessor = new Sampler(); ProcessingItem samplerPI = builder.createPI(samplerProcessor,3); -</code> -The <code class="highlighter-rouge">createPI()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped-in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> +</code></pre></div> +<p>The <code>createPI()</code> method of <code>TopologyBuilder</code> is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes.</p> + +<h4 id="2-entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> -<h4 id="entrance-processing-item-entrancepi">2. Entrance Processing Item (EntrancePI)</h4> <p>Entrance Processing Item is different from a PI in only one way: it accepts an Entrance Processor which can generate its own stream. It is mostly used as the source of a topology. It connects to external sources, pulls data and provides it to the topology in the form of streams. -All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. +All physical topology units are created with the help of a <code>TopologyBuilder</code>. 
The following code snippet shows the creation of an Entrance Processing Item.</p> -<p><code class="highlighter-rouge"> -builder.initTopology("MyTopology"); +<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology"); EntranceProcessor sourceProcessor = new Source(); EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor); -</code></p> - +</code></pre></div> </article> <!-- </div> --> Modified: incubator/samoa/site/documentation/Processor.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Processor.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Processor.html (original) +++ incubator/samoa/site/documentation/Processor.html Sun Sep 25 20:39:59 2016 @@ -74,71 +74,79 @@ <article class="post-content"> <p>Processor is the basic logical processing unit. All logic is written in the processor. In SAMOA, a Processor is an interface. Users can implement this interface to build their own processors. -<img src="images/Topology.png" alt="Topology" /> -### Adding a Processor to the topology</p> +<img src="images/Topology.png" alt="Topology"></p> + +<h3 id="adding-a-processor-to-the-topology">Adding a Processor to the topology</h3> <p>There are two ways to add a processor to the topology.</p> -<h4 id="processor">1. Processor</h4> -<p>All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. Following code snippet shows how to add a Processor to the topology. -<code class="highlighter-rouge"> +<h4 id="1-processor">1. Processor</h4> + +<p>All physical topology units are created with the help of a <code>TopologyBuilder</code>. The following code snippet shows how to add a Processor to the topology. 
+<code> Processor processor = new ExampleProcessor(); builder.addProcessor(processor, parallelism); </code> -<code class="highlighter-rouge">addProcessor()</code> method of <code class="highlighter-rouge">TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> +<code>addProcessor()</code> method of <code>TopologyBuilder</code> is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes.</p> + +<h4 id="2-entrance-processor">2. Entrance Processor</h4> -<h4 id="entrance-processor">2. Entrance Processor</h4> <p>Some processors generate their own streams, and they are used as the source of a topology. They connect to external sources, pull data and provide it to the topology in the form of streams. -All physical topology units are created with the help of a <code class="highlighter-rouge">TopologyBuilder</code>. The following code snippet shows how to add an entrance processor to the topology and create a stream from it. -<code class="highlighter-rouge"> 
+<code> EntranceProcessor entranceProcessor = new EntranceProcessor(); builder.addEntranceProcessor(entranceProcessor); Stream source = builder.createStream(entranceProcessor); </code></p> <h3 id="preview-of-processor">Preview of Processor</h3> -<p><code class="highlighter-rouge"> -package samoa.core; +<div class="highlight"><pre><code class="language-" data-lang="">package samoa.core; public interface Processor extends java.io.Serializable{ - boolean process(ContentEvent event); - void onCreate(int id); - Processor newProcessor(Processor p); + boolean process(ContentEvent event); + void onCreate(int id); + Processor newProcessor(Processor p); } -</code> -### Methods</p> +</code></pre></div> +<h3 id="methods">Methods</h3> + +<h4 id="1-boolean-process-contentevent-event">1. <code>boolean process(ContentEvent event)</code></h4> + +<p>Users should implement the three methods shown above. <code>process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code>ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code>true</code> if the event has been correctly processed, <code>false</code> otherwise.</p> -<h4 id="boolean-processcontentevent-event">1. <code class="highlighter-rouge">boolean process(ContentEvent event)</code></h4> -<p>Users should implement the three methods shown above. <code class="highlighter-rouge">process(ContentEvent event)</code> is the method in which all processing logic should be implemented. <code class="highlighter-rouge">ContentEvent</code> is a type (interface) which contains the event. This method will be called each time a new event is received. It should return <code class="highlighter-rouge">true</code> if the event has been correctly processed, <code class="highlighter-rouge">false</code> otherwise.</p> +<h4 id="2-void-oncreate-int-id">2. 
<code>void onCreate(int id)</code></h4> -<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as a parameter <code class="highlighter-rouge">id</code> to <code class="highlighter-rouge">onCreate(int it)</code> method of each instance.</p> +<p>is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as the parameter <code>id</code> to the <code>onCreate(int id)</code> method of each instance.</p> -<h4 id="processor-newprocessorprocessor-p">3. <code class="highlighter-rouge">Processor newProcessor(Processor p)</code></h4> -<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface.</p> +<h4 id="3-processor-newprocessor-processor-p">3. <code>Processor newProcessor(Processor p)</code></h4> + +<p>is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface. 
</p> <h3 id="preview-of-entranceprocessor">Preview of EntranceProcessor</h3> -<p>``` -package org.apache.samoa.core;</p> +<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.core; -<p>public interface EntranceProcessor extends Processor { +public interface EntranceProcessor extends Processor { public boolean isFinished(); public boolean hasNext(); public ContentEvent nextEvent(); } -``` -### Methods</p> +</code></pre></div> +<h3 id="methods">Methods</h3> + +<h4 id="1-boolean-isfinished">1. <code>boolean isFinished()</code></h4> + +<p>returns whether to expect more events coming from the entrance processor. If the source is a live stream, this method should always return <code>false</code>. If the source is a file, the method should return <code>true</code> once the file has been fully processed.</p> -<h4 id="boolean-isfinished">1. <code class="highlighter-rouge">boolean isFinished()</code></h4> -<p>returns whether to expect more events coming from the entrance processor. If the source is a live stream this method should return always <code class="highlighter-rouge">false</code>. If the source is a file, the method should return <code class="highlighter-rouge">true</code> once the file has been fully processed.</p> +<h4 id="2-boolean-hasnext">2. <code>boolean hasNext()</code></h4> -<h4 id="boolean-hasnext">2. <code class="highlighter-rouge">boolean hasNext()</code></h4> -<p>returns whether the next event is ready for consumption. 
If the method returns <code>true</code>, a subsequent call to <code>nextEvent</code> should yield the next event to be processed. If the method returns <code>false</code>, the engine can use this information to avoid continuously polling the entrance processor.</p>
-<h4 id="contentevent-nextevent">3. <code class="highlighter-rouge">ContentEvent nextEvent()</code></h4>
-<p>is the main method for the entrance processor as it returns the next event to be processed by the topology. It should be called only if <code class="highlighter-rouge">isFinished()</code> returned <code class="highlighter-rouge">false</code> and <code class="highlighter-rouge">hasNext()</code> returned <code class="highlighter-rouge">true</code>.</p>
+<h4 id="3-contentevent-nextevent">3. <code>ContentEvent nextEvent()</code></h4>
+
+<p>is the main method of the entrance processor, as it returns the next event to be processed by the topology. It should be called only if <code>isFinished()</code> returned <code>false</code> and <code>hasNext()</code> returned <code>true</code>.</p>

 <h3 id="note">Note</h3>
-<p>All state variables of the class implementing this interface must be serializable. It can be done by implementing the <code class="highlighter-rouge">Serializable</code> interface. The simple way to skip this requirement is to declare those variables as <code class="highlighter-rouge">transient</code> and initialize them in the <code class="highlighter-rouge">onCreate()</code> method. Remember, all initializations of such transient variables done in the constructor will be lost.</p>
+
+<p>All state variables of the class implementing this interface must be serializable, which can be achieved by implementing the <code>Serializable</code> interface. A simple way to sidestep this requirement is to declare such variables as <code>transient</code> and initialize them in the <code>onCreate()</code> method.
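As a concrete sketch of these three methods, and of the `transient` initialization pattern in the note above, the hypothetical entrance processor below emits a fixed number of placeholder events. The interfaces are simplified stand-ins for the ones in `org.apache.samoa.core`:

```java
import java.io.Serializable;

// Simplified stand-ins for the SAMOA interfaces in org.apache.samoa.core.
interface ContentEvent {}

interface Processor extends Serializable {
    boolean process(ContentEvent event);
    void onCreate(int id);
    Processor newProcessor(Processor p);
}

interface EntranceProcessor extends Processor {
    boolean isFinished();
    boolean hasNext();
    ContentEvent nextEvent();
}

// Hypothetical finite source that emits a fixed number of placeholder events.
class FiniteSourceProcessor implements EntranceProcessor {
    private final int total;
    private transient int emitted; // transient: must be initialized in onCreate()

    FiniteSourceProcessor(int total) {
        this.total = total;
    }

    @Override
    public void onCreate(int id) {
        emitted = 0; // constructor-time initialization would be lost
    }

    @Override
    public boolean process(ContentEvent event) {
        return false; // an entrance processor consumes no events
    }

    @Override
    public Processor newProcessor(Processor p) {
        return new FiniteSourceProcessor(total);
    }

    @Override
    public boolean isFinished() {
        return emitted >= total; // true once the source is exhausted
    }

    @Override
    public boolean hasNext() {
        return !isFinished(); // the next event is ready until exhaustion
    }

    @Override
    public ContentEvent nextEvent() {
        emitted++;
        return new ContentEvent() {}; // placeholder event
    }
}
```

For a live stream, `isFinished()` would instead always return `false`, and `hasNext()` would report whether an event has actually arrived from the source.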
Remember, all initializations of such transient variables done in the constructor will be lost.</p>
</article>

Modified: incubator/samoa/site/documentation/SAMOA-for-MOA-users.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/SAMOA-for-MOA-users.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/SAMOA-for-MOA-users.html (original)
+++ incubator/samoa/site/documentation/SAMOA-for-MOA-users.html Sun Sep 25 20:39:59 2016
@@ -73,23 +73,23 @@
 </header>

 <article class="post-content">
-  <p>If youâre an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, youâll find easy to run SAMOA. You need to note the following:</p>
+  <p>If you're an advanced user of <a href="http://moa.cms.waikato.ac.nz/">MOA</a>, you'll find it easy to run SAMOA. You need to note the following:</p>

<ul>
-  <li>There is no GUI interface in SAMOA</li>
-  <li>You can run SAMOA in the following modes:
-    <ol>
-      <li>Simulation Environment. Use <code class="highlighter-rouge">org.apache.samoa.DoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li>
-      <li>Storm Local Mode. Use <code class="highlighter-rouge">org.apache.samoa.LocalStormDoTask</code> instead of <code class="highlighter-rouge">moa.DoTask</code></li>
-      <li>Storm Cluster Mode. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache Storm">Executing SAMOA with Apache Storm</a>.</li>
-      <li>S4. You need to use the <code class="highlighter-rouge">samoa</code> script as it is explained in <a href="Executing SAMOA with Apache S4">Executing SAMOA with Apache S4</a></li>
-    </ol>
-  </li>
+<li>There is no GUI in SAMOA</li>
+<li>You can run SAMOA in the following modes:
+
+<ol>
+<li>Simulation Environment.
Use <code>org.apache.samoa.DoTask</code> instead of <code>moa.DoTask</code><br></li>
+<li>Storm Local Mode. Use <code>org.apache.samoa.LocalStormDoTask</code> instead of <code>moa.DoTask</code></li>
+<li>Storm Cluster Mode. You need to use the <code>samoa</code> script, as explained in <a href="Executing%20SAMOA%20with%20Apache%20Storm">Executing SAMOA with Apache Storm</a>.</li>
+<li>S4. You need to use the <code>samoa</code> script, as explained in <a href="Executing%20SAMOA%20with%20Apache%20S4">Executing SAMOA with Apache S4</a></li>
+</ol></li>
</ul>

-<p>To start with SAMOA, you can start with a simple example using the CoverType dataset as it is discussed in <a href="Getting Started">Getting Started</a>.</p>
+<p>To get started with SAMOA, try a simple example using the CoverType dataset, as discussed in <a href="Getting%20Started">Getting Started</a>. </p>

-<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>.</p>
+<p>To use MOA algorithms inside SAMOA, take a look at <a href="https://github.com/samoa-moa/samoa-moa">https://github.com/samoa-moa/samoa-moa</a>.
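As an illustration of the simulation mode, a MOA user can pass a familiar task string to `org.apache.samoa.DoTask`. The jar path, version, and task configuration below are assumptions for illustration only; see Getting Started for the exact command matching your build:

```shell
# Hypothetical jar path and task string -- adjust to your build.
# Analogous to "java -cp moa.jar moa.DoTask ..." in MOA.
java -cp target/SAMOA-Local-0.0.1-SNAPSHOT.jar org.apache.samoa.DoTask \
  "PrequentialEvaluation -l classifiers.ensemble.Bagging \
   -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
```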
</p>
</article>

Modified: incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html (original)
+++ incubator/samoa/site/documentation/Scalable-Advanced-Massive-Online-Analysis.html Sun Sep 25 20:39:59 2016
@@ -83,6 +83,6 @@
 <li><a href="Executing-SAMOA-with-Apache-S4.html">Executing SAMOA with Apache S4</a></li>
 <li><a href="Executing-SAMOA-with-Apache-Samza.html">Executing SAMOA with Apache Samza</a></li>
 <li><a href="Executing-SAMOA-with-Apache-Avro-Files.html">Executing SAMOA with Apache Avro Files</a></li>
 </ul>

 </article>

Modified: incubator/samoa/site/documentation/Stream.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Stream.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Stream.html (original)
+++ incubator/samoa/site/documentation/Stream.html Sun Sep 25 20:39:59 2016
@@ -73,51 +73,47 @@
 </header>

 <article class="post-content">
-  <p>A stream is a physical unit of SAMOA topology which connects different Processors with each other. Stream is also created by a <code class="highlighter-rouge">TopologyBuilder</code> just like a Processor. A stream can have a single source but many destinations. A Processor which is the source of a stream, owns the stream.</p>
+  <p>A stream is the physical unit of a SAMOA topology that connects different Processors to each other. Like a Processor, a Stream is created by a <code>TopologyBuilder</code>.
A stream can have a single source but many destinations. A Processor that is the source of a stream owns the stream.</p>

-<h3 id="creating-a-stream">1. Creating a Stream</h3>
-<p>The following code snippet shows how a Stream is created:</p>
+<h3 id="1-creating-a-stream">1. Creating a Stream</h3>

-<p><code class="highlighter-rouge">
-builder.initTopology("MyTopology");
+<p>The following code snippet shows how a Stream is created:</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.initTopology("MyTopology");
 Processor sourceProcessor = new Sampler();
 builder.addProcessor(sourceProcessor, 3);
 Stream sourceDataStream = builder.createStream(sourceProcessor);
-</code></p>
+</code></pre></div>
+<h3 id="2-connecting-a-stream">2. Connecting a Stream</h3>

-<h3 id="connecting-a-stream">2. Connecting a Stream</h3>
 <p>As described above, a Stream can have many destinations. In the following figure, a single stream from sourceProcessor is connected to three different destination Processors each having three instances.</p>

-<p><img src="images/SAMOA Message Shuffling.png" alt="SAMOA Message Shuffling" /></p>
+<p><img src="images/SAMOA%20Message%20Shuffling.png" alt="SAMOA Message Shuffling"></p>
+
+<p>SAMOA supports three different ways of distributing messages to multiple instances of a Processor.</p>
-<p>SAMOA supports three different ways of distribution of messages to multiple instances of a Processor.
-####2.1 Shuffle
-In this way of message distribution, messages/events are distributed randomly among various instances of a Processor.
+<h4 id="2-1-shuffle">2.1 Shuffle</h4>
+
+<p>In this way of message distribution, messages/events are distributed randomly among the various instances of a Processor. The following figure shows how the messages are distributed.
-<img src="images/SAMOA Explain Shuffling.png" alt="SAMOA Explain Shuffling" />
+<img src="images/SAMOA%20Explain%20Shuffling.png" alt="SAMOA Explain Shuffling">
 The following code snippet shows how to connect a stream to a destination using random shuffling.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
+<h4 id="2-2-key">2.2 Key</h4>

-<p><code class="highlighter-rouge">
-builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
-</code>
-####2.2 Key
-In this way of message distribution, messages with same key are sent to same instance of a Processor.
+<p>In this way of message distribution, messages with the same key are sent to the same instance of a Processor.
 The following figure illustrates key-based distribution.
-<img src="images/SAMOA Explain Key Shuffling.png" alt="SAMOA Explain Key Shuffling" />
+<img src="images/SAMOA%20Explain%20Key%20Shuffling.png" alt="SAMOA Explain Key Shuffling">
 The following code snippet shows how to connect a stream to a destination using key-based distribution.</p>
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
+<h4 id="2-3-all">2.3 All</h4>

-<p><code class="highlighter-rouge">
-builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
-</code>
-####2.3 All
-In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. Following figure illustrates this distribution process.
-<img src="images/SAMOA Explain All Shuffling.png" alt="SAMOA Explain All Shuffling" />
+<p>In this way of message distribution, all messages of a stream are sent to all instances of a destination Processor. The following figure illustrates this distribution process.
+<img src="images/SAMOA%20Explain%20All%20Shuffling.png" alt="SAMOA Explain All Shuffling">
 The following code snippet shows how to connect a stream to a destination using All-based distribution.</p>
-
-<p><code class="highlighter-rouge">
-builder.connectInputAllStream(sourceDataStream, destinationProcessor);
-</code></p>
-
+<div class="highlight"><pre><code class="language-" data-lang="">builder.connectInputAllStream(sourceDataStream, destinationProcessor);
+</code></pre></div>
 </article>

<!-- </div> -->

Modified: incubator/samoa/site/documentation/Task.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Task.html?rev=1762231&r1=1762230&r2=1762231&view=diff
==============================================================================
--- incubator/samoa/site/documentation/Task.html (original)
+++ incubator/samoa/site/documentation/Task.html Sun Sep 25 20:39:59 2016
@@ -73,55 +73,56 @@
 </header>

 <article class="post-content">
-  <p>Task is similar to a job in Hadoop. Task is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement <code class="highlighter-rouge">Task</code> interface.</p>
+  <p>A Task is an execution entity, similar to a job in Hadoop. A topology must be defined inside a Task, and SAMOA can only execute classes that implement the <code>Task</code> interface.</p>

-<h3 id="implementation">1. Implementation</h3>
-<p>```
-package org.apache.samoa.tasks;</p>
+<h3 id="1-implementation">1. Implementation</h3>
+<div class="highlight"><pre><code class="language-" data-lang="">package org.apache.samoa.tasks;

-<p>import org.apache.samoa.topology.ComponentFactory;
-import org.apache.samoa.topology.Topology;</p>
+import org.apache.samoa.topology.ComponentFactory;
+import org.apache.samoa.topology.Topology;

-<p>/**
+/**
 * Task interface, the mother of all SAMOA tasks!
*/ -public interface Task {</p> +public interface Task { -<div class="highlighter-rouge"><pre class="highlight"><code>/** - * Initialize this SAMOA task, - * i.e. create and connect Processors and Streams - * and initialize the topology - */ -public void init(); + /** + * Initialize this SAMOA task, + * i.e. create and connect Processors and Streams + * and initialize the topology + */ + public void init(); + + /** + * Return the final topology object to be executed in the cluster + * @return topology object to be submitted to be executed in the cluster + */ + public Topology getTopology(); + + /** + * Sets the factory. + * TODO: propose to hide factory from task, + * i.e. Task will only see TopologyBuilder, + * and factory creation will be handled by TopologyBuilder + * + * @param factory the new factory + */ + public void setFactory(ComponentFactory factory) ; +} +</code></pre></div> +<h3 id="2-methods">2. Methods</h3> + +<h5 id="2-1-void-init">2.1 <code>void init()</code></h5> + +<p>This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> -/** - * Return the final topology object to be executed in the cluster - * @return topology object to be submitted to be executed in the cluster - */ -public Topology getTopology(); - -/** - * Sets the factory. - * TODO: propose to hide factory from task, - * i.e. Task will only see TopologyBuilder, - * and factory creation will be handled by TopologyBuilder - * - * @param factory the new factory - */ -public void setFactory(ComponentFactory factory) ; } ``` -</code></pre> -</div> - -<h3 id="methods">2. 
Methods</h3> -<p>#####2.1 <code class="highlighter-rouge">void init()</code> -This method should build the desired topology by creating Processors and Streams and connecting them to each other.</p> +<h5 id="2-2-topology-gettopology">2.2 <code>Topology getTopology()</code></h5> -<h5 id="topology-gettopology">2.2 <code class="highlighter-rouge">Topology getTopology()</code></h5> -<p>This method should return the topology built by <code class="highlighter-rouge">init</code> to the engine for execution.</p> +<p>This method should return the topology built by <code>init</code> to the engine for execution.</p> -<h5 id="void-setfactorycomponentfactory-factory">2.3 <code class="highlighter-rouge">void setFactory(ComponentFactory factory)</code></h5> -<p>Utility method to accept a <code class="highlighter-rouge">ComponentFactory</code> to use in building the topology.</p> +<h5 id="2-3-void-setfactory-componentfactory-factory">2.3 <code>void setFactory(ComponentFactory factory)</code></h5> +<p>Utility method to accept a <code>ComponentFactory</code> to use in building the topology.</p> </article> Modified: incubator/samoa/site/documentation/Team.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Team.html?rev=1762231&r1=1762230&r2=1762231&view=diff ============================================================================== --- incubator/samoa/site/documentation/Team.html (original) +++ incubator/samoa/site/documentation/Team.html Sun Sep 25 20:39:59 2016 @@ -76,51 +76,52 @@ <h2 id="team">Team</h2> <table class="table table-striped"> - <thead> - <th class="text-center"></th> - <th class="text-center">Name</th> - <th class="text-center">Role</th> - <th class="text-center">Apache ID</th> - </thead> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">gdfm</td> - </tr> - <tr> - <td class="text-center"></td> - <td 
class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">abifet</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Nicolas Kourtellis</td> - <td class="text-center">PPMC</td> - <td class="text-center">nkourtellis</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">arinto</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center">Matthieu Morel</td> - <td class="text-center">PPMC</td> - <td class="text-center">mmorel</td> - </tr> - <tr> - <td class="text-center"></td> - <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> - <td class="text-center">PPMC</td> - <td class="text-center">ovlaere</td> - </tr> + <thead> + <th class="text-center"></th> + <th class="text-center">Name</th> + <th class="text-center">Role</th> + <th class="text-center">Apache ID</th> + </thead> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">gdfm</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.albertbifet.com">Albert Bifet</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">abifet</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Nicolas Kourtellis</td> + <td class="text-center">PPMC</td> + <td class="text-center">nkourtellis</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.otnira.com">Arinto Murdopo</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">arinto</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center">Matthieu Morel</td> + <td class="text-center">PPMC</td> 
+ <td class="text-center">mmorel</td> + </tr> + <tr> + <td class="text-center"></td> + <td class="text-center"><a href="http://www.van-laere.net">Olivier Van Laere</a></td> + <td class="text-center">PPMC</td> + <td class="text-center">ovlaere</td> + </tr> </table> <h3 id="contributors">Contributors</h3> + <ul> <li><a href="http://www.lsi.upc.edu/~marias/">Marta Arias</a></li> <li>Foteini Beligianni</li>
