Repository: incubator-samoa Updated Branches: refs/heads/gh-pages 09937b790 -> 7acb1c475
http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Executing-SAMOA-with-Apache-Samza.md ---------------------------------------------------------------------- diff --git a/documentation/Executing-SAMOA-with-Apache-Samza.md b/documentation/Executing-SAMOA-with-Apache-Samza.md new file mode 100644 index 0000000..c0f45a9 --- /dev/null +++ b/documentation/Executing-SAMOA-with-Apache-Samza.md @@ -0,0 +1,290 @@ +--- +title: Executing Apache SAMOA with Apache Samza +layout: documentation +documentation: true +--- +This tutorial describes how to run SAMOA on Apache Samza. +The steps included in this tutorial are: + +1. Setup and configure a cluster with the required dependencies. This applies for single-node (local) execution as well. + +2. Build SAMOA deployables + +3. Configure SAMOA-Samza + +4. Deploy SAMOA-Samza and execute a task + +5. Observe the execution and the result + +## Setup cluster +The following are needed to to run SAMOA on top of Samza: + +* [Apache Zookeeper](http://zookeeper.apache.org/) +* [Apache Kafka](http://kafka.apache.org/) +* [Apache Hadoop YARN and HDFS](http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html) + +### Zookeeper +Zookeeper is used by Kafka to coordinate its brokers. The detail instructions to setup a Zookeeper cluster can be found [here](http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html). + +To quickly setup a single-node Zookeeper cluster: + +1. Download the binary release from the [release page](http://zookeeper.apache.org/releases.html). + +2. Untar the archive + +``` +tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ +``` + +3. Copy the default configuration file + +``` +cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg +``` + +4. Start the single-node cluster + +``` +~/zookeeper-3.4.6/bin/zkServer.sh start +``` + +### Kafka +Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system. + +1. Download a binary release of Kafka [here](http://kafka.apache.org/downloads.html). As mentioned in the page, the Scala version does not matter. However, 2.10 is recommended as Samza has recently been moved to Scala 2.10. + +2. Untar the archive + +``` +tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ +``` + +If you are running in local mode or a single-node cluster, you can now start Kafka with the command: + +``` +~/kafka_2.10-0.8.1/bin/kafka-server-start.sh kafka_2.10-0.8.1/config/server.properties +``` + +In multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although you can totally have a smaller Kafka cluster, or even a single-node Kafka cluster). The number of brokers in Kafka cluster will affect disk bandwidth and space (the more brokers we have, the higher value we will get for the two). In each node, you need to set the following properties in `~/kafka_2.10-0.8.1/config/server.properties` before starting Kafka service. + +``` +broker.id=a-unique-number-for-each-node +zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...] +``` + +You might want to change the retention hours or retention bytes of the logs to avoid the logs size from growing too big. + +``` +log.retention.hours=number-of-hours-to-keep-the-logs +log.retention.bytes=number-of-bytes-to-keep-in-the-logs +``` + +### Hadoop YARN and HDFS +> Hadoop YARN and HDFS are **not** required to run SAMOA in Samza local mode. 
+
+To set up a YARN cluster, first download a binary release of Hadoop [here](http://www.apache.org/dyn/closer.cgi/hadoop/common/) on each node in the cluster and untar the archive
+`tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/`. We have tested SAMOA with Hadoop 2.2.0, but Hadoop 2.3.0 should work too.
+
+**HDFS**
+
+Set the following properties in `~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml` on all nodes.
+
+```
+<configuration>
+  <property>
+    <name>dfs.datanode.data.dir</name>
+    <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value>
+    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
+  </property>
+
+  <property>
+    <name>dfs.namenode.name.dir</name>
+    <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value>
+    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
+  </property>
+</configuration>
+```
+
+Add these properties in `~/hadoop-2.2.0/etc/hadoop/core-site.xml` on all nodes.
+
+```
+<configuration>
+  <property>
+    <name>fs.defaultFS</name>
+    <value>hdfs://localhost:9000/</value>
+    <description>NameNode URI</description>
+  </property>
+
+  <property>
+    <name>fs.hdfs.impl</name>
+    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
+  </property>
+</configuration>
+```
+For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.
+
+Format the HDFS directory (only perform this if you are running it for the very first time)
+
+```
+~/hadoop-2.2.0/bin/hdfs namenode -format
+```
+
+Start the namenode daemon on one of the nodes
+
+```
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode
+```
+
+Start the datanode daemon on all nodes
+
+```
+~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode
+```
+
+**YARN**
+
+If you are running a multi-node cluster, set the resource manager hostname in `~/hadoop-2.2.0/etc/hadoop/yarn-site.xml` on all nodes as follows:
+
+```
+<configuration>
+  <property>
+    <name>yarn.resourcemanager.hostname</name>
+    <value>resourcemanager-url</value>
+    <description>The hostname of the RM.</description>
+  </property>
+</configuration>
+```
+
+**Other configurations**
+Now we need to tell Samza where to find the configuration of the YARN cluster. To do this, first create a new directory on all nodes:
+
+```
+mkdir ~/.samza
+mkdir ~/.samza/conf
+```
+
+Copy (or soft link) `core-site.xml`, `hdfs-site.xml`, and `yarn-site.xml` from `~/hadoop-2.2.0/etc/hadoop` to the new directory
+
+```
+ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml
+ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml
+```
+
+Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows where to find these YARN configuration files.
+
+```
+export YARN_HOME=$HOME/.samza
+```
+
+**Start the YARN cluster**
+Start the resource manager on the master node
+
+```
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager
+```
+
+Start the node manager on all worker nodes
+
+```
+~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager
+```
+
+## Build SAMOA
+Perform the following steps on one of the nodes in the cluster. Here we assume git and maven are installed on this node.
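A quick sanity check that both tools are available on this node:

```
git --version
mvn -version
```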
+
+Since Samza is not yet released on Maven, we have to clone the Samza project, build it, and publish it to the local Maven repository:
+
+```
+git clone -b 0.7.0 https://github.com/apache/incubator-samza.git
+cd incubator-samza
+./gradlew clean build
+./gradlew publishToMavenLocal
+```
+
+Here we cloned and installed Samza version 0.7.0, the current released version (July 2014).
+
+Now we can clone the SAMOA repository and build it.
+
+```
+git clone http://git.apache.org/incubator-samoa.git
+cd incubator-samoa
+mvn -Psamza package
+```
+
+The deployable jars for SAMOA will be in `target/SAMOA-<variant>-<version>-SNAPSHOT.jar`. For example, in our case for Samza it is `target/SAMOA-Samza-0.2.0-SNAPSHOT.jar`.
+
+## Configure SAMOA-Samza execution
+This section explains the configuration parameters in `bin/samoa-samza.properties` that are required to run SAMOA on top of Samza.
+
+**Samza execution mode**
+
+```
+samoa.samza.mode=[yarn|local]
+```
+This parameter specifies the mode in which to execute the task: `local` for local execution and `yarn` for cluster execution.
+
+**Zookeeper**
+
+```
+zookeeper.connect=localhost
+zookeeper.port=2181
+```
+The default settings above apply to local mode execution. For cluster mode, change `zookeeper.connect` to the correct URL of your Zookeeper host.
+
+**Kafka**
+
+```
+kafka.broker.list=localhost:9092
+```
+`kafka.broker.list` is a comma-separated list of the host:port pairs of all the brokers in the Kafka cluster.
+
+```
+kafka.replication.factor=1
+```
+`kafka.replication.factor` specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in the Kafka cluster.
+
+**YARN**
+> The settings below do not apply to local mode execution; you can leave them as they are.
+
+`yarn.am.memory` and `yarn.container.memory` specify the memory requirements for the Application Master container and the worker containers, respectively.
+
+```
+yarn.am.memory=1024
+yarn.container.memory=1024
+```
+
+`yarn.package.path` specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.
+
+```
+yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar
+```
+
+**Samza**
+`max.pi.per.container` specifies the number of PI instances allowed in one YARN container.
+
+```
+max.pi.per.container=1
+```
+
+`kryo.register.file` specifies the registration file for the Kryo serializer.
+
+```
+kryo.register.file=samza-kryo
+```
+
+`checkpoint.commit.ms` specifies how often PIs commit their checkpoints (in ms). The default value is 1 minute.
+
+```
+checkpoint.commit.ms=60000
+```
+
+## Deploy SAMOA-Samza task
+Execute a SAMOA task with the following command:
+
+```
+bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>"
+```
+
+## Observe execution and result
+In local mode, all logs are printed to stdout. If you execute the task on a YARN cluster, the output is written to stdout files in the log folders of YARN's containers ($HADOOP_HOME/logs/userlogs/application_\<application-id\>/container_\<container-id\>).
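If log aggregation is enabled in your YARN cluster, you can also retrieve the container logs from the command line once the application finishes; the application id below is only a placeholder:

```
~/hadoop-2.2.0/bin/yarn logs -applicationId application_<application-id>
```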
http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Executing-SAMOA-with-Apache-Storm.md ---------------------------------------------------------------------- diff --git a/documentation/Executing-SAMOA-with-Apache-Storm.md b/documentation/Executing-SAMOA-with-Apache-Storm.md new file mode 100644 index 0000000..0fcdea2 --- /dev/null +++ b/documentation/Executing-SAMOA-with-Apache-Storm.md @@ -0,0 +1,100 @@ +--- +title: Executing Apache SAMOA with Apache Storm +layout: documentation +documentation: true +--- +In this tutorial page we describe how to execute SAMOA on top of Apache Storm. Here is an outline of what we want to do: + +1. Ensure that you have necessary Storm cluster and configuration to execute SAMOA +2. Ensure that you have all the SAMOA deployables for execution in the cluster +3. Configure samoa-storm.properties +4. Execute SAMOA classification task +5. Observe the task execution + +### Storm Configuration +Before we start the tutorial, please ensure that you already have Storm cluster (preferably Storm 0.8.2) running. You can follow this [tutorial](http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/) to set up a Storm cluster. + +You also need to install Storm at the machine where you initiate the deployment, and configure Storm (at least) with this configuration in `~/.storm/storm.yaml`: + +``` +########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" + +## List of custom serializations +kryo.register: + - com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent: com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer + - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer +``` +<!-- +Or, if you are using SAMOA with optimized VHT, you should use this following configuration file: +``` +########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" + +## List of custom serializations +kryo.register: + - com.yahoo.labs.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: com.yahoo.labs.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer + - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: com.yahoo.labs.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer +``` +--> + +Alternatively, if you don't have Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section [samoa-storm.properties Configuration](#samoa-storm-properties). + +### SAMOA deployables +There are three deployables for executing SAMOA on top of Storm. They are: + +1. `bin/samoa` is the main script to execute SAMOA. You do not need to change anything in this script. +2. `target/SAMOA-Storm-x.x.x-SNAPSHOT.jar` is the deployed jar file. `x.x.x` is the version number of SAMOA. +3. `bin/samoa-storm.properties` contains deployment configurations. You need to set the parameters in this properties file correctly. + +### <a name="samoa-storm-properties"> samoa-storm.properties Configuration</a> +Currently, the properties file contains two configurations: + +1. `samoa.storm.mode` determines whether the task is executed locally (using Storm's `LocalCluster`) or executed in a Storm cluster. Use `local` if you want to test SAMOA and you do not have a Storm cluster for deployment. 
Use `cluster` if you want to test SAMOA on your Storm cluster. +2. `samoa.storm.numworker` determines the number of worker to execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in you Storm cluster. If you are using local mode, this property corresponds to the number of thread used by Storm's LocalCluster to execute your SAMOA task. + +Here is the example of a complete properties file: + +``` +# SAMOA Storm properties file +# This file contains specific configurations for SAMOA deployment in the Storm platform +# Note that you still need to configure Storm client in your machine, +# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings + +# samoa.storm.mode corresponds to the execution mode of the Task in Storm +# possible values: +# 1. cluster: the Task will be sent into nimbus. The nimbus is configured by Storm configuration file +# 2. local: the Task will be sent using local Storm cluster +samoa.storm.mode=cluster + +# samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster +# possible values: any integer greater than 0 +samoa.storm.numworker=7 +``` + +### SAMOA task execution + +You can execute a SAMOA task using the aforementioned `bin/samoa` script with this following format: +`bin/samoa <platform> <jar> "<task>"`. + +`<platform>` can be `storm` or `s4`. Using `storm` option means you are deploying SAMOA on a Storm environment. In this configuration, the script uses the aforementioned yaml file (`~/.storm/storm.yaml`) and `samoa-storm.properties` to perform the deployment. Using `s4` option means you are deploying SAMOA on an Apache S4 environment. Follow this [link](Executing-SAMOA-with-Apache-S4) to learn more about deploying SAMOA on Apache S4. + +`<jar>` is the location of the deployed jar file (`SAMOA-Storm-x.x.x-SNAPSHOT.jar`) in your file system. The location can be a relative path or an absolute path into the jar file. + +`"<task>"` is the SAMOA task command line such as `PrequentialEvaluation` or `ClusteringTask`. This command line for SAMOA task follows the format of [Massive Online Analysis (MOA)](http://moa.cms.waikato.ac.nz/details/classification/command-line/). + +The complete command to execute SAMOA is: + +``` +bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +``` +The example above uses [Prequential Evaluation task](Prequential-Evaluation-Task) and [Vertical Hoeffding Tree](Vertical-Hoeffding-Tree-Classifier) classifier. + +### Observing task execution +There are two ways to observe the task execution using Storm UI and by monitoring the dump file of the SAMOA task. Notice that the dump file will be created on the cluster if you are executing your task in `cluster` mode. + +#### Using Storm UI +Go to the web address of Storm UI and check whether the SAMOA task executes as intended. Use this UI to kill the associated Storm topology if necessary. + +#### Monitoring the dump file +Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, [Prequential Evaluation task](Prequential-Evaluation-Task) has `-d` option which specifies the path to the dump file. 
Since Storm performs the allocation of Storm tasks, you should set the dump file into a file on a shared filesystem if you want to access it from the machine submitting the task. http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Getting-Started.md ---------------------------------------------------------------------- diff --git a/documentation/Getting-Started.md b/documentation/Getting-Started.md new file mode 100644 index 0000000..99e80b3 --- /dev/null +++ b/documentation/Getting-Started.md @@ -0,0 +1,32 @@ +--- +title: Getting Started +layout: documentation +documentation: true +--- +We start showing how simple is to run a first large scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset. + +* 1. Download SAMOA + +```bash +git clone http://git.apache.org/incubator-samoa.git +cd incubator-samoa +mvn package #Local mode +``` +* 2. Download the Forest CoverType dataset + +```bash +wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip" +unzip covtypeNorm.arff.zip +``` + +_Forest Covertype_ contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification. + +* 3. Run an example: classifying the CoverType dataset with the bagging algorithm + +```bash +bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging + -s (ArffFileStream -f covtypeNorm.arff) -f 100000" +``` + + +The output will be a list of the evaluation results, plotted each 100,000 instances. http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Home.md ---------------------------------------------------------------------- diff --git a/documentation/Home.md b/documentation/Home.md new file mode 100644 index 0000000..ebc3475 --- /dev/null +++ b/documentation/Home.md @@ -0,0 +1,57 @@ +--- +title: Apache SAMOA Documentation +layout: documentation +documentation: true +--- +Apache SAMOA is a distributed realtime machine learning system, similar to Mahout, but specific designed for stream mining. Apache SAMOA is simple and fun to use! + +This documentation is intended to give an introduction on how to use Apache SAMOA in different ways. As a user you can run Apache SAMOA algorithms into several Stream Processing Engines: local mode, Apache Storm, S4 and Samza. As a developer you can create new algorithms only once and test them in all of these Stream Processing Engines. 
+
+## Getting Started
+
+* [0 Hands-on with SAMOA: Getting Started!](Getting-Started.html)
+
+
+## Users
+
+* [1 Building and Executing SAMOA](Scalable-Advanced-Massive-Online-Analysis.html)
+  * [1.0 Building SAMOA](Building-SAMOA.html)
+  * [1.1 Executing SAMOA with Apache Storm](Executing-SAMOA-with-Apache-Storm.html)
+  * [1.2 Executing SAMOA with Apache S4](Executing-SAMOA-with-Apache-S4.html)
+  * [1.3 Executing SAMOA with Apache Samza](Executing-SAMOA-with-Apache-Samza.html)
+* [2 Machine Learning Methods in SAMOA](SAMOA-and-Machine-Learning.html)
+  * [2.1 Prequential Evaluation Task](Prequential-Evaluation-Task.html)
+  * [2.2 Vertical Hoeffding Tree Classifier](Vertical-Hoeffding-Tree-Classifier.html)
+  * [2.3 Adaptive Model Rules Regressor](Adaptive-Model-Rules-Regressor.html)
+  * [2.4 Bagging and Boosting](Bagging-and-Boosting.html)
+  * [2.5 Distributed Stream Clustering](Distributed-Stream-Clustering.html)
+  * [2.6 Distributed Stream Frequent Itemset Mining](Distributed-Stream-Frequent-Itemset-Mining.html)
+  * [2.7 SAMOA for MOA users](SAMOA-for-MOA-users.html)
+
+## Developers
+
+* [3 Understanding SAMOA Topologies](SAMOA-Topology.html)
+  * [3.1 Processor](Processor.html)
+  * [3.2 Content Event](Content-Event.html)
+  * [3.3 Stream](Stream.html)
+  * [3.4 Task](Task.html)
+  * [3.5 Topology Builder](Topology-Builder.html)
+  * [3.6 Learner](Learner.html)
+  * [3.7 Processing Item](Processing-Item.html)
+* [4 Developing New Tasks in SAMOA](Developing-New-Tasks-in-SAMOA.html)
+
+### Getting help
+
+#### Apache SAMOA Users
+Apache SAMOA users should send messages and subscribe to [[email protected]](mailto:[email protected]).
+
+You can subscribe to this list by sending an email to [[email protected]](mailto:[email protected]). Likewise, you can cancel a subscription by sending an email to [[email protected]](mailto:[email protected]).
+
+
+#### Apache SAMOA Developers
+Apache SAMOA developers should send messages and subscribe to [[email protected]](mailto:[email protected]).
+
+You can subscribe to this list by sending an email to [[email protected]](mailto:[email protected]). Likewise, you can cancel a subscription by sending an email to [[email protected]](mailto:[email protected]).
+
+__NOTE:__ The Google Groups account [email protected] is now officially deprecated in favor of the Apache-hosted user/dev mailing lists.
+

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Learner.md
----------------------------------------------------------------------
diff --git a/documentation/Learner.md b/documentation/Learner.md
new file mode 100644
index 0000000..f73c47a
--- /dev/null
+++ b/documentation/Learner.md
@@ -0,0 +1,20 @@
+---
+title: Learner
+layout: documentation
+documentation: true
+---
+Learners are implemented in SAMOA as sub-topologies.
+
+```
+public interface Learner extends Serializable {
+
+    public void init(TopologyBuilder topologyBuilder, Instances dataset);
+
+    public Processor getInputProcessor();
+
+    public Stream getResultStream();
+}
+```
+When a `Task` object is initiated via `init()`, the method `init(...)` of `Learner` is called, and the topology is added to the global topology of the task.
+
+To create a new learner, you only need to add streams, processors, and their connections to the topology in `init(...)`, return the processor that will receive the learner's input stream from `getInputProcessor()`, and, finally, return the learner's output stream from `getResultStream()`.
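As an illustration, here is a minimal sketch of a custom learner. It is not taken from the SAMOA codebase: `MyLearnerProcessor` and its `setResultStream(...)` method are hypothetical, and the snippet assumes the `TopologyBuilder` methods described in the other developer pages (`addProcessor`, `createStream`).

```
// Minimal sketch of a custom learner. MyLearnerProcessor is a hypothetical
// user-defined Processor that trains on incoming instances and emits results.
public class MyLearner implements Learner {

    private MyLearnerProcessor learnerProcessor;
    private Stream resultStream;

    public void init(TopologyBuilder topologyBuilder, Instances dataset) {
        learnerProcessor = new MyLearnerProcessor(dataset);   // hypothetical processor
        topologyBuilder.addProcessor(learnerProcessor, 1);    // parallelism hint of 1
        resultStream = topologyBuilder.createStream(learnerProcessor);
        // Tell the processor which stream to write its results to (hypothetical setter).
        learnerProcessor.setResultStream(resultStream);
    }

    public Processor getInputProcessor() {
        return learnerProcessor;  // this processor receives the learner's input stream
    }

    public Stream getResultStream() {
        return resultStream;      // this stream carries the learner's output
    }
}
```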
http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Prequential-Evaluation-Task.md
----------------------------------------------------------------------
diff --git a/documentation/Prequential-Evaluation-Task.md b/documentation/Prequential-Evaluation-Task.md
new file mode 100644
index 0000000..3322218
--- /dev/null
+++ b/documentation/Prequential-Evaluation-Task.md
@@ -0,0 +1,27 @@
+---
+title: Prequential Evaluation
+layout: documentation
+documentation: true
+---
+In data stream mining, the most widely used evaluation scheme is the prequential or interleaved-test-then-train evaluation. The idea is very simple: we use each instance first to test the model, and then to train the model. The Prequential Evaluation task evaluates the performance of online classifiers by doing exactly this. It supports two classification performance evaluators: a basic one, which measures the accuracy of the classifier model since the start of the evaluation, and a window-based one, which measures the accuracy on the current sliding window of recent instances.
+
+Example of the Prequential Evaluation task on the SAMOA command line when deploying on Storm:
+
+
+```
+bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
+```
+
+Parameters:
+
+* `-l`: classifier to train
+* `-s`: stream to learn from
+* `-e`: classification performance evaluation method
+* `-i`: maximum number of instances to test/train on (-1 = no limit)
+* `-f`: number of instances between samples of the learning performance
+* `-n`: evaluation name (default: PrequentialEvaluation_TimeStamp)
+* `-d`: file to append intermediate csv results to
+
+In terms of the SAMOA API, the Prequential Evaluation Task consists of a source `Entrance Processor`, a `Classifier`, and an `Evaluator Processor`, as shown below. The `Entrance Processor` sends instances to the `Classifier` using the `source` stream. The classifier sends the classification results to the `Evaluator Processor` via the `result` stream. The `Entrance Processor` corresponds to the `-s` option of Prequential Evaluation, the `Classifier` corresponds to the `-l` option, and the `Evaluator Processor` corresponds to the `-e` option.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Processing-Item.md
----------------------------------------------------------------------
diff --git a/documentation/Processing-Item.md b/documentation/Processing-Item.md
new file mode 100644
index 0000000..d118ab5
--- /dev/null
+++ b/documentation/Processing-Item.md
@@ -0,0 +1,38 @@
+---
+title: Processing Item
+layout: documentation
+documentation: true
+---
+A Processing Item is a hidden physical unit of the topology and is just a wrapper around a Processor.
+It is used internally, and it is not accessible from the API.
+
+### Advanced
+
+It does not contain any logic but connects the Processor to the other processors in the topology.
+There are two types of Processing Items.
+
+1. Simple Processing Item (PI)
+2. Entrance Processing Item (EntrancePI)
+
+#### 1. Simple Processing Item (PI)
+Once a Processor is wrapped in a PI, it becomes an executable component of the topology. All physical topology units are created with the help of a `TopologyBuilder`. The following code snippet shows the creation of a Processing Item.
+ +``` +builder.initTopology("MyTopology"); +Processor samplerProcessor = new Sampler(); +ProcessingItem samplerPI = builder.createPI(samplerProcessor,3); +``` +The `createPI()` method of `TopologyBuilder` is used to create a PI. Its first argument is the instance of a Processor which needs to be wrapped-in. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this PI should be created on different nodes. + +#### 2. Entrance Processing Item (EntrancePI) +Entrance Processing Item is different from a PI in only one way: it accepts an Entrance Processor which can generate its own stream. +It is mostly used as the source of a topology. +It connects to external sources, pulls data and provides it to the topology in the form of streams. +All physical topology units are created with the help of a `TopologyBuilder`. +The following code snippet shows the creation of an Entrance Processing Item. + +``` +builder.initTopology("MyTopology"); +EntranceProcessor sourceProcessor = new Source(); +EntranceProcessingItem sourcePi = builder.createEntrancePi(sourceProcessor); +``` http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Processor.md ---------------------------------------------------------------------- diff --git a/documentation/Processor.md b/documentation/Processor.md new file mode 100644 index 0000000..8891cd7 --- /dev/null +++ b/documentation/Processor.md @@ -0,0 +1,71 @@ +--- +title: Processor +layout: documentation +documentation: true +--- +Processor is the basic logical processing unit. All logic is written in the processor. In SAMOA, a Processor is an interface. Users can implement this interface to build their own processors. + +### Adding a Processor to the topology + +There are two ways to add a processor to the topology. + +#### 1. Processor +All physical topology units are created with the help of a `TopologyBuilder`. Following code snippet shows how to add a Processor to the topology. +``` +Processor processor = new ExampleProcessor(); +builder.addProcessor(processor, paralellism); +``` +`addProcessor()` method of `TopologyBuilder` is used to add the processor. Its first argument is the instance of a Processor which needs to be added. Its second argument is the parallelism hint. It tells the underlying platforms how many parallel instances of this processor should be created on different nodes. + +#### 2. Entrance Processor +Some processors generates their own streams, and they are used as the source of a topology. They connect to external sources, pull data and provide it to the topology in the form of streams. +All physical topology units are created with the help of a `TopologyBuilder`. The following code snippet shows how to add an entrance processor to the topology and create a stream from it. +``` +EntranceProcessor entranceProcessor = new EntranceProcessor(); +builder.addEntranceProcessor(entranceProcessor); +Stream source = builder.createStream(entranceProcessor); +``` + +### Preview of Processor +``` +package samoa.core; +public interface Processor extends java.io.Serializable{ + boolean process(ContentEvent event); + void onCreate(int id); + Processor newProcessor(Processor p); +} +``` +### Methods + +#### 1. `boolean process(ContentEvent event)` +Users should implement the three methods shown above. `process(ContentEvent event)` is the method in which all processing logic should be implemented. `ContentEvent` is a type (interface) which contains the event. 
This method will be called each time a new event is received. It should return `true` if the event has been correctly processed, `false` otherwise. + +#### 2. `void onCreate(int id)` +is the method in which all initialization code should be written. Multiple copies/instances of the Processor are created based on the parallelism hint specified by the user. SAMOA assigns each instance a unique id which is passed as a parameter `id` to `onCreate(int it)` method of each instance. + +#### 3. `Processor newProcessor(Processor p)` +is very simple to implement. This method is just a technical overhead that has no logical use except that it helps SAMOA in some of its internals. Users should just return a new copy of the instance of this class which implements this Processor interface. + +### Preview of EntranceProcessor +``` +package com.yahoo.labs.samoa.core; + +public interface EntranceProcessor extends Processor { + public boolean isFinished(); + public boolean hasNext(); + public ContentEvent nextEvent(); +} +``` +### Methods + +#### 1. `boolean isFinished()` +returns whether to expect more events coming from the entrance processor. If the source is a live stream this method should return always `false`. If the source is a file, the method should return `true` once the file has been fully processed. + +#### 2. `boolean hasNext()` +returns whether the next event is ready for consumption. If the method returns `true` a subsequent call to `nextEvent` should yield the next event to be processed. If the method returns `false` the engine can use this information to avoid continuously polling the entrance processor. + +#### 3. `ContentEvent nextEvent()` +is the main method for the entrance processor as it returns the next event to be processed by the topology. It should be called only if `isFinished()` returned `false` and `hasNext()` returned `true`. + +### Note +All state variables of the class implementing this interface must be serializable. It can be done by implementing the `Serializable` interface. The simple way to skip this requirement is to declare those variables as `transient` and initialize them in the `onCreate()` method. Remember, all initializations of such transient variables done in the constructor will be lost. http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-Topology.md ---------------------------------------------------------------------- diff --git a/documentation/SAMOA-Topology.md b/documentation/SAMOA-Topology.md new file mode 100644 index 0000000..6f83c03 --- /dev/null +++ b/documentation/SAMOA-Topology.md @@ -0,0 +1,18 @@ +--- +title: Apache SAMOA Topology +layout: documentation +documentation: true +--- +Apache SAMOA allows users to write their stream processing algorithms in an easy and platform independent way. SAMOA defines its own topology which is very intuitive and simple to use. Currently SAMOA has the following basic topology elements. + +1. [Processor](Processor.html) +1. [Content Event](Content-Event.html) +1. [Stream](Stream.html) +1. [Task](Task.html) +1. [Topology Builder](Topology-Builder.html) +1. [Learner](Learner.html) +1. **Advanced topic**: [Processing Item](Processing-Item.html) + +Processor and Content Event are the logical units to build your algorithm, Stream and Task are the physical units to wire the various pieces of your algorithm, whereas Topology Builder is an administrative unit that provides bookkeeping services. Learner is the base interface for learning algorithms. 
Processing Items are internal wrappers for Processors used inside SAMOA. + + http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-and-Machine-Learning.md ---------------------------------------------------------------------- diff --git a/documentation/SAMOA-and-Machine-Learning.md b/documentation/SAMOA-and-Machine-Learning.md new file mode 100644 index 0000000..e61434d --- /dev/null +++ b/documentation/SAMOA-and-Machine-Learning.md @@ -0,0 +1,13 @@ +--- +title: Apache SAMOA and Machine Learning +layout: documentation +documentation: true +--- +SAMOA's main goal is to help developers to create easily machine learning algorithms on top of any distributed stream processing engine. Here we present the available machine learning algorithms implemented in SAMOA and how to use them. + +* [2.1 Prequential Evaluation Task](Prequential-Evaluation-Task.html) +* [2.2 Vertical Hoeffding Tree Classifier](Vertical-Hoeffding-Tree-Classifier.html) +* [2.3 Adaptive Model Rules Regressor](Adaptive-Model-Rules-Regressor.html) +* [2.4 Bagging and Boosting](Bagging-and-Boosting.html) +* [2.5 Distributed Stream Clustering](Distributed-Stream-Clustering.html) +* [2.6 Distributed Stream Frequent Itemset Mining](Distributed-Stream-Frequent-Itemset-Mining.html) http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/SAMOA-for-MOA-users.md ---------------------------------------------------------------------- diff --git a/documentation/SAMOA-for-MOA-users.md b/documentation/SAMOA-for-MOA-users.md new file mode 100644 index 0000000..b091e2a --- /dev/null +++ b/documentation/SAMOA-for-MOA-users.md @@ -0,0 +1,17 @@ +--- +title: Apache SAMOA for MOA users +layout: documentation +documentation: true +--- +If you're an advanced user of [MOA](http://moa.cms.waikato.ac.nz/), you'll find easy to run SAMOA. You need to note the following: + +* There is no GUI interface in SAMOA +* You can run SAMOA in the following modes: + 1. Simulation Environment. Use `com.yahoo.labs.samoa.DoTask` instead of `moa.DoTask` + 2. Storm Local Mode. Use `com.yahoo.labs.samoa.LocalStormDoTask` instead of `moa.DoTask` + 3. Storm Cluster Mode. You need to use the `samoa` script as it is explained in [Executing SAMOA with Apache Storm](Executing SAMOA with Apache Storm). + 4. S4. You need to use the `samoa` script as it is explained in [Executing SAMOA with Apache S4](Executing SAMOA with Apache S4) + +To start with SAMOA, you can start with a simple example using the CoverType dataset as it is discussed in [Getting Started](Getting Started). + +To use MOA algorithms inside SAMOA, take a look at [https://github.com/samoa-moa/samoa-moa](https://github.com/samoa-moa/samoa-moa). http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Scalable-Advanced-Massive-Online-Analysis.md ---------------------------------------------------------------------- diff --git a/documentation/Scalable-Advanced-Massive-Online-Analysis.md b/documentation/Scalable-Advanced-Massive-Online-Analysis.md new file mode 100644 index 0000000..07b50a5 --- /dev/null +++ b/documentation/Scalable-Advanced-Massive-Online-Analysis.md @@ -0,0 +1,13 @@ +--- +title: Scalable Advanced Massive Online Analysis +layout: documentation +documentation: true +--- +Scalable Advanced Massive Online Analysis (SAMOA) contains various algorithms for machine learning and data mining on data streams, and allows to run them on different distributed stream processing engines (DSPEs) such as Storm and S4. 
Currently, SAMOA contains methods for classification via Vertical Hoeffding Trees, bagging, and boosting, and for clustering via CluStream.
+
+In these pages, we explain how to build and execute SAMOA for the different distributed stream processing engines (DSPEs):
+
+* [Building SAMOA](Building-SAMOA.html)
+* [Executing SAMOA with Apache Storm](Executing-SAMOA-with-Apache-Storm.html)
+* [Executing SAMOA with Apache S4](Executing-SAMOA-with-Apache-S4.html)
+* [Executing SAMOA with Apache Samza](Executing-SAMOA-with-Apache-Samza.html)

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Stream.md
----------------------------------------------------------------------
diff --git a/documentation/Stream.md b/documentation/Stream.md
new file mode 100644
index 0000000..527a195
--- /dev/null
+++ b/documentation/Stream.md
@@ -0,0 +1,49 @@
+---
+title: Stream
+layout: documentation
+documentation: true
+---
+A stream is a physical unit of the SAMOA topology which connects different Processors with each other. A Stream is also created by a `TopologyBuilder`, just like a Processor. A stream can have a single source but many destinations. The Processor that is the source of a stream owns the stream.
+
+###1. Creating a Stream
+The following code snippet shows how a Stream is created:
+
+```
+builder.initTopology("MyTopology");
+Processor sourceProcessor = new Sampler();
+builder.addProcessor(sourceProcessor, 3);
+Stream sourceDataStream = builder.createStream(sourceProcessor);
+```
+
+###2. Connecting a Stream
+As described above, a Stream can have many destinations. In the following figure, a single stream from sourceProcessor is connected to three different destination Processors, each having three instances.
+
+
+
+SAMOA supports three different ways of distributing messages to multiple instances of a Processor.
+####2.1 Shuffle
+With shuffle distribution, messages/events are distributed randomly among the various instances of a Processor.
+The following figure shows how the messages are distributed.
+
+The following code snippet shows how to connect a stream to a destination using random shuffling.
+
+```
+builder.connectInputShuffleStream(sourceDataStream, destinationProcessor);
+```
+####2.2 Key
+With key-based distribution, messages with the same key are sent to the same instance of a Processor.
+The following figure illustrates key-based distribution.
+
+The following code snippet shows how to connect a stream to a destination using key-based distribution.
+
+```
+builder.connectInputKeyStream(sourceDataStream, destinationProcessor);
+```
+####2.3 All
+With all-based distribution, all messages of a stream are sent to all instances of a destination Processor. The following figure illustrates this distribution process.
+
+The following code snippet shows how to connect a stream to a destination using all-based distribution.
+
+```
+builder.connectInputAllStream(sourceDataStream, destinationProcessor);
+```

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Task.md
----------------------------------------------------------------------
diff --git a/documentation/Task.md b/documentation/Task.md
new file mode 100644
index 0000000..e583ce7
--- /dev/null
+++ b/documentation/Task.md
@@ -0,0 +1,54 @@
+---
+title: Task
+layout: documentation
+documentation: true
+---
+A Task is similar to a job in Hadoop: it is an execution entity. A topology must be defined inside a task. SAMOA can only execute classes that implement the `Task` interface.
+
+###1.
Implementation +``` +package com.yahoo.labs.samoa.tasks; + +import com.yahoo.labs.samoa.topology.ComponentFactory; +import com.yahoo.labs.samoa.topology.Topology; + +/** + * Task interface, the mother of all SAMOA tasks! + */ +public interface Task { + + /** + * Initialize this SAMOA task, + * i.e. create and connect Processors and Streams + * and initialize the topology + */ + public void init(); + + /** + * Return the final topology object to be executed in the cluster + * @return topology object to be submitted to be executed in the cluster + */ + public Topology getTopology(); + + /** + * Sets the factory. + * TODO: propose to hide factory from task, + * i.e. Task will only see TopologyBuilder, + * and factory creation will be handled by TopologyBuilder + * + * @param factory the new factory + */ + public void setFactory(ComponentFactory factory) ; +} +``` + +###2. Methods +#####2.1 `void init()` +This method should build the desired topology by creating Processors and Streams and connecting them to each other. + +#####2.2 `Topology getTopology()` +This method should return the topology built by `init` to the engine for execution. + +#####2.3 `void setFactory(ComponentFactory factory)` +Utility method to accept a `ComponentFactory` to use in building the topology. + http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Team.md ---------------------------------------------------------------------- diff --git a/documentation/Team.md b/documentation/Team.md new file mode 100644 index 0000000..9e711e2 --- /dev/null +++ b/documentation/Team.md @@ -0,0 +1,41 @@ +--- +title: Apache SAMOA Team +layout: documentation +documentation: true +--- +<h2>Project lead</h2> +<ul> +<li><a href="http://gdfm.me/">Gianmarco De Francisci Morales</a></li> +<li><a href="http://www.albertbifet.com">Albert Bifet</a></li> +</ul> + +<h2>Committers</h2> +<ul> +<li><a href="http://www.cse.usf.edu/~nkourtel/">Nicolas Kourtellis</a></li> +<li><a href="http://www.linkedin.com/pub/faisal-moeen/40/17/512">Faisal Moeen</a></li> +<li><a href="http://www.linkedin.com/in/mmorel">Matthieu Morel</a></li> +<li><a href="http://www.otnira.com">Arinto Murdopo</a></li> +<li><a href="http://cs.brown.edu/~matteo/">Matteo Riondato</a></li> +<li><a href="https://twitter.com/AntonioSeverien">Antonio Severien</a></li> +<li><a href="http://www.van-laere.net">Olivier Van Laere</a></li> +<li><a href="http://www.linkedin.com/in/caseyvu">Anh Thu Vu</a></li> + + +</ul> + +<h2>Contributors</h2> +<ul> +<li><a href="http://www.lsi.upc.edu/~marias/">Marta Arias</a></li> +<li><a href="http://www.lsi.upc.edu/~gavalda/">Ricard Gavaldà </a></li> +<li><a href="http://dme.rwth-aachen.de/de/team/hassani">Marwan Hassani</a></li> +<li><a href="http://www.scms.waikato.ac.nz/genquery.php?linklist=SCMS&linktype=folder&linkname=The_Dean-0">Geoff Holmes</a></li> +<li><a href="http://dme.rwth-aachen.de/de/team/jansen">Timm Jansen</a></li> +<li>Richard Kirkby</li> +<li><a href="http://dme.rwth-aachen.de/de/team/kranen">Philipp Kranen</a></li> +<li><a href="http://dme.rwth-aachen.de/de/team/kremer">Hardy Kremer</a></li> +<li><a href="http://www.cs.waikato.ac.nz/~bernhard">Bernhard Pfahringer</a></li> +<li><a href="http://users.ics.aalto.fi/jesse/">Jesse Read</a></li> +<li><a href="http://www.cs.waikato.ac.nz/~fracpete">Peter Reutemann</a></li> +<li><a href="http://dme.rwth-aachen.de/de/team/seidl">Thomas Seidl</a></li> + +</ul> http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Topology-Builder.md 
----------------------------------------------------------------------
diff --git a/documentation/Topology-Builder.md b/documentation/Topology-Builder.md
new file mode 100644
index 0000000..fe07120
--- /dev/null
+++ b/documentation/Topology-Builder.md
@@ -0,0 +1,33 @@
+---
+title: Topology Builder
+layout: documentation
+documentation: true
+---
+`TopologyBuilder` is a builder class which builds the physical units of the topology and assembles them together. Each topology has a name. The following code snippet shows all the steps of creating a topology with one `EntrancePI`, two PIs, and a few streams.
+
+```
+TopologyBuilder builder = new TopologyBuilder(factory); // factory is a ComponentFactory
+builder.initTopology("Parma Topology"); // initiates an empty topology with a name
+//********************************Topology building***********************************
+StreamSource sourceProcessor = new StreamSource(inputPath,d,sampleSize,fpmGap,epsilon,phi,numSamples);
+builder.addEntranceProcessor(sourceProcessor);
+Stream sourceDataStream = builder.createStream(sourceProcessor);
+sourceProcessor.setDataStream(sourceDataStream);
+Stream sourceControlStream = builder.createStream(sourceProcessor);
+sourceProcessor.setControlStream(sourceControlStream);
+
+Sampler sampler = new Sampler(minFreqPercent,sampleSize,(float)epsilon,outputPath,numSamples);
+builder.addProcessor(sampler, numSamples);
+builder.connectInputAllStream(sourceControlStream, sampler);
+builder.connectInputShuffleStream(sourceDataStream, sampler);
+
+Stream samplerDataStream = builder.createStream(sampler);
+sampler.setSamplerDataStream(samplerDataStream);
+Stream samplerControlStream = builder.createStream(sampler);
+sampler.setSamplerControlStream(samplerControlStream);
+
+Aggregator aggregatorProcessor = new Aggregator(outputPath,(long)numSamples,(long)sampleSize,(long)reqApproxNum,(float)epsilon);
+builder.addProcessor(aggregatorProcessor, numAggregators);
+builder.connectInputKeyStream(samplerDataStream, aggregatorProcessor);
+builder.connectInputAllStream(samplerControlStream, aggregatorProcessor);
+```

http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/Vertical-Hoeffding-Tree-Classifier.md
----------------------------------------------------------------------
diff --git a/documentation/Vertical-Hoeffding-Tree-Classifier.md b/documentation/Vertical-Hoeffding-Tree-Classifier.md
new file mode 100644
index 0000000..a3c027d
--- /dev/null
+++ b/documentation/Vertical-Hoeffding-Tree-Classifier.md
@@ -0,0 +1,26 @@
+---
+title: Vertical Hoeffding Tree
+layout: documentation
+documentation: true
+---
+The Vertical Hoeffding Tree (VHT) classifier is a distributed classifier that utilizes vertical parallelism on top of the Very Fast Decision Tree (VFDT), or Hoeffding Tree, classifier.
+
+### Very Fast Decision Tree (VFDT) classifier
+[Hoeffding Tree or VFDT](http://doi.acm.org/10.1145/347090.347107) is the standard decision tree algorithm for data stream classification. VFDT uses the Hoeffding bound to decide the minimum number of arriving instances needed to achieve a certain level of confidence when splitting a node. This confidence level determines how close the attribute chosen by VFDT is to the attribute that a batch decision tree learner would choose.
+
+For a more comprehensive summary of VFDT, read chapter 3 of [Data Stream Mining: A Practical Approach](http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf).
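For reference, the Hoeffding bound states that, with probability 1 - δ, the true mean of a random variable with range R does not differ from its empirical mean after n independent observations by more than:

```
\epsilon = \sqrt{ \frac{R^2 \, \ln(1/\delta)}{2n} }
```

Roughly speaking, VFDT splits a node once the difference in the splitting criterion (e.g., information gain) between the two best attributes exceeds this ε; see the VFDT paper linked above for the exact procedure.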
+
+### Vertical Parallelism
+Vertical parallelism is a parallelism approach which partitions the instances by attribute for parallel processing. Vertical-parallelism-based decision tree induction processes the partitioned instances (each of which consists of a subset of the attributes) to calculate the information-theoretic criteria in parallel. For example, if we have instances with 100 attributes and we partition the instances into 5 portions, we will have 20 attributes per portion. The algorithm processes the 20 attributes in parallel to determine the "local" best attribute to split on, and combines the parallel computation results to determine the "global" best attribute to split on and grow the tree.
+
+For more explanation of the available parallelism types for decision tree induction, you can read chapter 4 of [Distributed Decision Tree Learning for Mining Big Data Streams](../SAMOA-Developers-Guide-0-0-1.pdf), the Developer's Guide of SAMOA.
+
+### Vertical Hoeffding Tree (VHT) classifier
+VHT is implemented using the SAMOA API. The diagram below shows the implementation:
+
+
+The _source Processor_ and the _evaluator Processor_ are components of the [prequential evaluation task](Prequential-Evaluation-Task) in SAMOA. The _model-aggregator Processor_ contains the decision tree model. It connects to the _local-statistic Processors_ via the _attribute_ stream and the _control_ stream. The _model-aggregator Processor_ splits instances by attribute, and each _local-statistic Processor_ holds the local statistics for the attributes assigned to it. The _model-aggregator Processor_ sends the split instances via the _attribute_ stream, and it sends control messages via the _control_ stream to ask the _local-statistic Processors_ to perform computations. Users configure _n_, the parallelism level of the algorithm. The parallelism level is translated into the number of local-statistic Processors in the algorithm.
+
+The _model-aggregator Processor_ sends the classification results via the _result_ stream to the _evaluator Processor_ of the corresponding evaluation task, or to another destination Processor. The _evaluator Processor_ performs an evaluation of the algorithm, showing accuracy and throughput. Incoming instances to the _model-aggregator Processor_ arrive via the _source_ stream. The calculation results from the local statistics arrive at the _model-aggregator Processor_ via the _computation-result_ stream.
+
+For more details about the algorithm (i.e., the pseudocode), go to section 4.2 of [Distributed Decision Tree Learning for Mining Big Data Streams](../SAMOA-Developers-Guide-0-0-1.pdf), the Developer's Guide of SAMOA.
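For a concrete starting point, VHT can be run in local mode with the Prequential Evaluation task. The command below is adapted from the examples elsewhere in this documentation; adjust the jar name to match your build:

```
bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s (generators.RandomTreeGenerator -c 2 -o 10 -u 10)"
```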
http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/HelloWorldTask.png ---------------------------------------------------------------------- diff --git a/documentation/images/HelloWorldTask.png b/documentation/images/HelloWorldTask.png new file mode 100644 index 0000000..0166aa4 Binary files /dev/null and b/documentation/images/HelloWorldTask.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/PrequentialEvaluation.png ---------------------------------------------------------------------- diff --git a/documentation/images/PrequentialEvaluation.png b/documentation/images/PrequentialEvaluation.png new file mode 100644 index 0000000..c0c742c Binary files /dev/null and b/documentation/images/PrequentialEvaluation.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA Explain All Shuffling.png ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA Explain All Shuffling.png b/documentation/images/SAMOA Explain All Shuffling.png new file mode 100644 index 0000000..3c0e044 Binary files /dev/null and b/documentation/images/SAMOA Explain All Shuffling.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA Explain Key Shuffling.png ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA Explain Key Shuffling.png b/documentation/images/SAMOA Explain Key Shuffling.png new file mode 100644 index 0000000..4fbc2f9 Binary files /dev/null and b/documentation/images/SAMOA Explain Key Shuffling.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA Explain Shuffling.png ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA Explain Shuffling.png b/documentation/images/SAMOA Explain Shuffling.png new file mode 100644 index 0000000..8427bce Binary files /dev/null and b/documentation/images/SAMOA Explain Shuffling.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA FIM.jpg ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA FIM.jpg b/documentation/images/SAMOA FIM.jpg new file mode 100644 index 0000000..8724910 Binary files /dev/null and b/documentation/images/SAMOA FIM.jpg differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA FIM.png ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA FIM.png b/documentation/images/SAMOA FIM.png new file mode 100644 index 0000000..4c14d2f Binary files /dev/null and b/documentation/images/SAMOA FIM.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/SAMOA Message Shuffling.png ---------------------------------------------------------------------- diff --git a/documentation/images/SAMOA Message Shuffling.png b/documentation/images/SAMOA Message Shuffling.png new file mode 100644 index 0000000..bb71402 Binary files /dev/null and b/documentation/images/SAMOA Message Shuffling.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/Topology.png ---------------------------------------------------------------------- diff --git a/documentation/images/Topology.png 
b/documentation/images/Topology.png new file mode 100644 index 0000000..11571ff Binary files /dev/null and b/documentation/images/Topology.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/VHT.png ---------------------------------------------------------------------- diff --git a/documentation/images/VHT.png b/documentation/images/VHT.png new file mode 100644 index 0000000..3241761 Binary files /dev/null and b/documentation/images/VHT.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/hamr.png ---------------------------------------------------------------------- diff --git a/documentation/images/hamr.png b/documentation/images/hamr.png new file mode 100644 index 0000000..c79ca0d Binary files /dev/null and b/documentation/images/hamr.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/documentation/images/vamr.png ---------------------------------------------------------------------- diff --git a/documentation/images/vamr.png b/documentation/images/vamr.png new file mode 100644 index 0000000..53c6d58 Binary files /dev/null and b/documentation/images/vamr.png differ http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/index.html ---------------------------------------------------------------------- diff --git a/index.html b/index.html index 9adf1aa..bc3d6d2 100644 --- a/index.html +++ b/index.html @@ -16,16 +16,18 @@ <h2>Scalable Advanced Massive Online Analysis</h2> <h3>Apache SAMOA is currently undergoing Incubation at the Apache Software Foundation. - </br>New <a href="https://github.com/yahoo/samoa/releases"> Release 0.2.0</a> ! - </br>View on <a href="https://github.com/yahoo/samoa">GitHub <small>yahoo/samoa</small></a> + </br>Last release before entering incubation/not at Apache: <a href="https://github.com/yahoo/samoa/releases"> Release 0.2.0</a> ! + </br>View on <a href=" https://github.com/apache/incubator-samoa">GitHub <small> Apache SAMOA</small></a> </h3> <div id="slideshow"> - <img src="images/slideshow/runtime.png" alt="SAMOA" class="slide active" height= "300" /> + <img src="images/slideshow/runtime.png" alt="Apache SAMOA" class="slide active" height= "300" /> </div> </section> <section id="tutorial" class="next-steps"> - <h1>Apache SAMOA is distributed streaming machine learning (ML) framework that contains a + <h1><a href="http://incubator.apache.org/"><img style="max-width:55%;border:0px solid black;"src="http://incubator.apache.org/images/egg-logo.png" alt="Apache Incubator" > </a> + +Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programing abstraction for distributed streaming ML algorithms.</h1> <h2>Apache SAMOA enables development of new ML algorithms without dealing with @@ -50,10 +52,10 @@ in multiple SPEs, i.e., code the algorithms once and execute them in multiple SP <p>Hands-on with Apache SAMOA: Getting Started in 5 minutes!</p> </li> <li class="guides"> - <a class="hero-octicon" href="https://github.com/yahoo/samoa/wiki/"> + <a class="hero-octicon" href="documentation/Home.html"> <span class="mega-octicon octicon-book"></span> </a> - <h4><a href="https://github.com/yahoo/samoa/wiki/">Documentation</a></h4> + <h4><a href="documentation/Home.html">Documentation</a></h4> <p>Learn how to use Apache SAMOA in the various different ways possible. 
</p> </li> @@ -62,7 +64,7 @@ in multiple SPEs, i.e., code the algorithms once and execute them in multiple SP <section class="tutorial"> <h1>Slides</h1> -<h2><a href="https://speakerdeck.com/gdfm/samoa-a-platform-for-mining-big-data-streams-2"><img src="samoa-slides.jpg" alt="SAMOA Slides" style="max-width:100%;"></a></h2> +<h2><a href="https://speakerdeck.com/gdfm/samoa-a-platform-for-mining-big-data-streams-2"><img src="samoa-slides.jpg" alt="SAMOA Slides" data-canonical-src="samoa-slides.jpg" style="max-width:100%;"></a></h2> <h2>G. De Francisci Morales <a href="http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf">SAMOA: A Platform for Mining Big Data Streams</a> Keynote Talk at <a href="http://www.ramss.ws/2013/program/">RAMSS '13</a>: 2nd International Workshop on Real-Time Analysis and Mining of Social Streams WWW, Rio De Janeiro, 2013.</h2> </section> @@ -73,7 +75,7 @@ Keynote Talk at <a href="http://www.ramss.ws/2013/program/">RAMSS '13</a>: 2nd I <h1>Apache SAMOA Developer's Guide</h1> -<h2><a href="SAMOA-Developers-Guide-0-0-1.pdf"><img style="max-width:95%;border:3px solid black;" src="Manual.png" alt="SAMOA Developer's guide" height="250"> </a></h2> +<h2><a href="SAMOA-Developers-Guide-0-3-0.pdf"><img style="max-width:95%;border:3px solid black;" src="Manual.png" alt="SAMOA Developer's guide" height="250"> </a></h2> </section><section class="tutorial"> <h1>API Javadoc Reference</h1> @@ -88,7 +90,7 @@ Keynote Talk at <a href="http://www.ramss.ws/2013/program/">RAMSS '13</a>: 2nd I <a href="mailto:[email protected]">[email protected]</a></h2> <h1>Contributors</h1> -<h2><a href="contributors.html">List of contributors to the SAMOA project</a>.</h2> +<h2><a href="documentation/Team.html">List of contributors to the SAMOA project</a>.</h2> </section><section class="next-steps"> <h1>License</h1> @@ -111,27 +113,27 @@ Apache License, Version 2.0 (<a href="http://www.apache.org/licenses/LICENSE-2.0 <div class="terminal"> <div class="header"></div> <div class="shell"> - <p><span class="path">~</span><span class="prompt">$</span>git clone [email protected]:yahoo/samoa.git</p> -<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p> + <p><span class="path">~</span><span class="prompt">$</span>git clone http://git.apache.org/incubator-samoa.git</p> +<p><span class="path">~</span><span class="prompt">$</span>cd incubator-samoa</p> <p><span class="path">~</span><span class="prompt">$</span>mvn -Pstorm package</p> </div> </div> -<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-Storm-0.0.1-SNAPSHOT.jar</code>.</p> +<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-Storm-0.3.0-SNAPSHOT.jar</code>.</p> </li> <li id="terminal-step-1" class="option-terminal"> <h4>Apache S4</h4> <p>If you want to compile Apache SAMOA for S4, you will need to install the S4 dependencies -manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4">Executing Apache SAMOA with Apache S4</a>.</p> +manually as explained in <a href="documentation/Executing-SAMOA-with-Apache-S4.html">Executing Apache SAMOA with Apache S4</a>.</p> <div class="terminal"> <div class="header"></div> <div class="shell"> - <p><span class="path">~</span><span class="prompt">$</span>git clone [email protected]:yahoo/samoa.git</p> -<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p> + <p><span class="path">~</span><span class="prompt">$</span>git clone 
http://git.apache.org/incubator-samoa.git</p> +<p><span class="path">~</span><span class="prompt">$</span>cd incubator-samoa</p> <p><span class="path">~</span><span class="prompt">$</span>mvn -Ps4 package</p> </div> </div> -<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-S4-0.0.1-SNAPSHOT.jar</code>.</p> +<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p> </li> <li id="terminal-step-1" class="option-terminal"> @@ -140,12 +142,12 @@ manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing- <div class="terminal"> <div class="header"></div> <div class="shell"> - <p><span class="path">~</span><span class="prompt">$</span>git clone [email protected]:yahoo/samoa.git</p> -<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p> + <p><span class="path">~</span><span class="prompt">$</span>git clone http://git.apache.org/incubator-samoa.git</p> +<p><span class="path">~</span><span class="prompt">$</span>cd incubator-samoa</p> <p><span class="path">~</span><span class="prompt">$</span>mvn package</p> </div> </div> -<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-Local-0.0.1-SNAPSHOT.jar</code>.</p> +<p>The deployable jar for Apache SAMOA will be in <code>target/SAMOA-Local-0.3.0-SNAPSHOT.jar</code>.</p> </li> </ul> @@ -164,8 +166,8 @@ manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing- <div class="terminal"> <div class="header"></div> <div class="shell"> - <p><span class="path">~</span><span class="prompt">$</span>git clone [email protected]:yahoo/samoa.git</p> -<p><span class="path">~</span><span class="prompt">$</span>cd samoa</p> + <p><span class="path">~</span><span class="prompt">$</span>git clone http://git.apache.org/incubator-samoa.git</p> +<p><span class="path">~</span><span class="prompt">$</span>cd incubator-samoa</p> <p><span class="path">~</span><span class="prompt">$</span>mvn package</p> </div> </div> @@ -174,7 +176,7 @@ manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing- <li id="terminal-step-1" class="option-terminal"> <h4>Download the Forest CoverType dataset </h4> <p>If you want to compile Apache SAMOA for S4, you will need to install the S4 dependencies -manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4">Executing Apache SAMOA with Apache S4</a>.</p> +manually as explained in <a href="documentation/Executing-SAMOA-with-Apache-S4">Executing Apache SAMOA with Apache S4</a>.</p> <div class="terminal"> <div class="header"></div> <div class="shell"> @@ -191,7 +193,7 @@ manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing- <div class="terminal"> <div class="header"></div> <div class="shell"> - <p><span class="path">~</span><span class="prompt">$</span>bin/samoa local target/SAMOA-Local-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging + <p><span class="path">~</span><span class="prompt">$</span>bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</p> </div> </div> @@ -203,7 +205,9 @@ manually as explained in <a href="https://github.com/yahoo/samoa/wiki/Executing- <section class="tutorial"> <h2><a href="http://incubator.apache.org/"><img style="max-width:95%;border:0px solid black;"src="http://incubator.apache.org/images/egg-logo.png" alt="Apache Incubator" > </a></h2> <h2> -Apache SAMOA is an 
effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the name of Apache TLP sponsor. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</h2> +Apache SAMOA is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</h2> + +<h2>Apache and the Apache feather logo are trademarks of The Apache Software Foundation.</h2> </section> <script src="js/jquery.js"></script> @@ -217,9 +221,8 @@ Apache SAMOA is an effort undergoing incubation at The Apache Software Foundatio <footer class="page-footer"> <ul class="site-footer-links right"> - <li><a href="https://github.com/yahoo/samoa/zipball/master">Download <strong>ZIP File</strong></a></li> - <li><a href="https://github.com/yahoo/samoa/tarball/master">Download <strong>TAR Ball</strong></a></li> - <li><a href="https://github.com/yahoo/samoa">View On <strong>GitHub</strong></a></li> + + <li><a href="https://github.com/apache/incubator-samoa">View On <strong>GitHub</strong></a></li> </ul> <a href="/"> @@ -229,8 +232,8 @@ Apache SAMOA is an effort undergoing incubation at The Apache Software Foundatio <ul class="site-footer-links"> <li>© 2014 <span>Apache SAMOA</span></li> <li><a href="#build">Build Apache SAMOA</a></h4> - <li><a href="https://github.com/yahoo/samoa/wiki/Getting%20Started">Getting started!</a></li> - <li><a href="https://github.com/yahoo/samoa/wiki/">Documentation</a></li> + <li><a href="documentation/Getting-Started.html">Getting started!</a></li> + <li><a href="documentation/Home.html">Documentation</a></li> </ul> </footer> http://git-wip-us.apache.org/repos/asf/incubator-samoa/blob/7acb1c47/params.json ---------------------------------------------------------------------- diff --git a/params.json b/params.json index 78ad0dc..0dbcbc5 100644 --- a/params.json +++ b/params.json @@ -1 +1 @@ -{"name":"SAMOA","tagline":"Scalable Advanced Massive Online Analysis","body":"SAMOA is a platform for mining on big data streams.\r\nIt is a distributed streaming machine learning (ML) framework that contains a \r\nprograming abstraction for distributed streaming ML algorithms.\r\n\r\nSAMOA enables development of new ML algorithms without dealing with \r\nthe complexity of underlying streaming processing engines (SPE, such \r\nas Apache Storm and Apache S4). SAMOA also provides extensibility in integrating\r\nnew SPEs into the framework. 
These features allow SAMOA users to develop \r\ndistributed streaming ML algorithms once and to execute the algorithms \r\nin multiple SPEs, i.e., code the algorithms once and execute them in multiple SPEs.\r\n\r\n## Build\r\n\r\n###Storm\r\n\r\nSimply clone the repository and install SAMOA.\r\n```bash\r\ngit clone [email protected]:yahoo/samoa.git\r\ncd samoa\r\nmvn -Pstorm package\r\n```\r\n\r\nThe deployable jar for SAMOA will be in `target/SAMOA-Storm-0.0.1.jar`.\r\n\r\n###S4\r\n\r\nIf you want to compile SAMOA for S4, you will need to install the S4 dependencies\r\nmanually as explained in [Executing SAMOA with Apache S4](https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4).\r\n\r\nOnce the dependencies if needed are installed, you can simply clone the repository and install SAMOA.\r\n\r\n```bash\r\ngit clone [email protected]:yahoo/samoa.git\r\ncd samoa\r\nmvn -Ps4 package\r\n```\r\n\r\nThe deployable jars for SAMOA will be in `target/SAMOA-S4-0.0.1.jar`.\r\n\r\n## Documentation\r\n\r\nThe documentation is intended to give an introduction on how to use SAMOA in the various different ways possible. \r\nAs a user you can use it to develop new algorithms and test different Stream Processing Engines.\r\n\r\n* [1 Scalable Advanced Massive Online Analysis](https://github.com/yahoo/samoa/wiki/Scalable Advanced Massive Online Analysis)\r\n * [1.0 Building SAMOA](https://github.com/yahoo/samoa/wiki/Building SAMOA)\r\n * [1.1 Executing SAMOA with Apache Storm](https://github.com/yahoo/samoa/wiki/Executing SAMOA with Apache Storm)\r\n * [1.2 Executing SAMOA with Apache S4](https://github.com/yahoo/samoa/wiki/Executing-SAMOA-with-Apache-S4)\r\n* [2 SAMOA and Machine Learning](https://github.com/yahoo/samoa/wiki/SAMOA and Machine Learning)\r\n * [2.1 Prequential Evaluation Task](https://github.com/yahoo/samoa/wiki/Prequential Evaluation Task)\r\n * [2.2 Vertical Hoeffding Tree Classifier](https://github.com/yahoo/samoa/wiki/Vertical Hoeffding Tree Classifier)\r\n * [2.3 Distributed Stream Clustering](https://github.com/yahoo/samoa/wiki/Distributed Stream Clustering)\r\n* [3 SAMOA Topology](https://github.com/yahoo/samoa/wiki/SAMOA Topology)\r\n * [3.1 Processor](https://github.com/yahoo/samoa/wiki/Processor)\r\n * [3.2 Content Event](https://github.com/yahoo/samoa/wiki/Content Event)\r\n * [3.3 Stream](https://github.com/yahoo/samoa/wiki/Stream)\r\n * [3.4 Task](https://github.com/yahoo/samoa/wiki/Task)\r\n * [3.5 Topology Builder](https://github.com/yahoo/samoa/wiki/Topology Builder)\r\n * [3.6 Topology Starter](https://github.com/yahoo/samoa/wiki/Topology Starter)\r\n * [3.7 Learner](https://github.com/yahoo/samoa/wiki/Learner)\r\n * [3.8 Processing Item](https://github.com/yahoo/samoa/wiki/Processing Item)\r\n* [4 Developing New Tasks in SAMOA](https://github.com/yahoo/samoa/wiki/Developing New Tasks in SAMOA)\r\n\r\n## Slides\r\n\r\nG. De Francisci Morales [SAMOA: A Platform for Mining Big Data Streams](http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf)\r\nKeynote Talk at [RAMSS '13](http://www.ramss.ws/2013/program/): 2nd International Workshop on Real-Time Analysis and Mining of Social Streams WWW, Rio De Janeiro, 2013.\r\n\r\n<script async class=\"speakerdeck-embed\" data-id=\"fee15d509f0a0130a1252e07bed0c63d\" data-ratio=\"1.33333333333333\" src=\"//speakerdeck.com/assets/embed.js\"></script>\r\n\r\n## License\r\n\r\nThe use and distribution terms for this software are covered by the\r\nApache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html).\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}
\ No newline at end of file
+{"name":"SAMOA","tagline":"Scalable Advanced Massive Online Analysis","body":"SAMOA is a platform for mining on big data streams.\r\nIt is a distributed streaming machine learning (ML) framework that contains a \r\nprograming abstraction for distributed streaming ML algorithms.\r\n\r\nSAMOA enables development of new ML algorithms without dealing with \r\nthe complexity of underlying streaming processing engines (SPE, such \r\nas Apache Storm and Apache S4). SAMOA also provides extensibility in integrating\r\nnew SPEs into the framework. These features allow SAMOA users to develop \r\ndistributed streaming ML algorithms once and to execute the algorithms \r\nin multiple SPEs, i.e., code the algorithms once and execute them in multiple SPEs.\r\n\r\n## Build\r\n\r\n###Storm\r\n\r\nSimply clone the repository and install SAMOA.\r\n```bash\r\ngit clone http://git.apache.org/incubator-samoa.git\r\ncd samoa\r\nmvn -Pstorm package\r\n```\r\n\r\nThe deployable jar for SAMOA will be in `target/SAMOA-Storm-0.0.1.jar`.\r\n\r\n###S4\r\n\r\nIf you want to compile SAMOA for S4, you will need to install the S4 dependencies\r\nmanually as explained in [Executing SAMOA with Apache S4](documentation/Executing-SAMOA-with-Apache-S4).\r\n\r\nOnce the dependencies if needed are installed, you can simply clone the repository and install SAMOA.\r\n\r\n```bash\r\ngit clone http://git.apache.org/incubator-samoa.git\r\ncd samoa\r\nmvn -Ps4 package\r\n```\r\n\r\nThe deployable jars for SAMOA will be in `target/SAMOA-S4-0.0.1.jar`.\r\n\r\n## Documentation\r\n\r\nThe documentation is intended to give an introduction on how to use SAMOA in the various different ways possible. \r\nAs a user you can use it to develop new algorithms and test different Stream Processing Engines.\r\n\r\n* [1 Scalable Advanced Massive Online Analysis](documentation/Scalable Advanced Massive Online Analysis)\r\n * [1.0 Building SAMOA](documentation/Building SAMOA)\r\n * [1.1 Executing SAMOA with Apache Storm ](documentation/Executing SAMOA with Apache Storm)\r\n * [1.2 Executing SAMOA with Apache S4](documentation/Executing-SAMOA-with-Apache-S4)\r\n* [2 SAMOA and Machine Learning](documentation/SAMOA and Machine Learning)\r\n * [2.1 Prequential Evaluation Task](documentation/Prequential Evaluation Task)\r\n * [2.2 Vertical Hoeffding Tree Classifier](documentation/Vertical Hoeffding Tree Classifier)\r\n * [2.3 Distributed Stream Clustering](documentation/Distributed Stream Clustering)\r\n* [3 SAMOA Topology](documentation/SAMOA Topology)\r\n * [3.1 Processor](documentation/Processor)\r\n * [3.2 Content Event](documentation/Content Event)\r\n * [3.3 Stream](documentation/Stream)\r\n * [3.4 Task](documentation/Task)\r\n * [3.5 Topology Builder](documentation/Topology Builder)\r\n * [3.6 Topology Starter](documentation/Topology Starter)\r\n * [3.7 Learner](documentation/Learner)\r\n * [3.8 Processing Item](documentation/Processing Item)\r\n* [4 Developing New Tasks in SAMOA](documentation/Developing New Tasks in SAMOA)\r\n\r\n## Slides\r\n\r\nG. De Francisci Morales [SAMOA: A Platform for Mining Big Data Streams](http://melmeric.files.wordpress.com/2013/04/samoa-a-platform-for-mining-big-data-streams.pdf)\r\nKeynote Talk at [RAMSS '13](http://www.ramss.ws/2013/program/): 2nd International Workshop on Real-Time Analysis and Mining of Social Streams WWW, Rio De Janeiro, 2013.\r\n\r\n<script async class=\"speakerdeck-embed\" data-id=\"fee15d509f0a0130a1252e07bed0c63d\" data-ratio=\"1.33333333333333\" src=\"//speakerdeck.com/assets/embed.js\"></script>\r\n\r\n## License\r\n\r\nThe use and distribution terms for this software are covered by the\r\nApache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html).\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}
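For quick reference, the build-and-run flow shown in the terminal snippets of the updated index.html above can be collected into a single shell session. This is only a sketch assembled from commands that appear verbatim in the diff; it assumes git and Maven are installed, that the covtypeNorm.arff dataset referenced by the task has already been downloaded into the working directory, and that you substitute `mvn -Pstorm package` or `mvn -Ps4 package` (with the matching jar name) when targeting Storm or S4 instead of the local engine.

```bash
# Clone the Apache SAMOA sources and build the local-mode package
git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn package

# Run the Prequential Evaluation task on the local engine
# (covtypeNorm.arff is assumed to be present in the current directory)
bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
```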
