Added: incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html (added) +++ incubator/samoa/site/documentation/Developing-New-Tasks-in-SAMOA.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,245 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Developing New Tasks in Apache SAMOA</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Developing New Tasks in Apache SAMOA</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>A <em>task</em> is a machine learning related activity such as a specific evaluation for a classifier. For instance the <em>prequential evaluation</em> task is a task that uses each instance first for testing and then for training a model built using a specific classification algorithm. A task corresponds to a topology in SAMOA. 
</p> + +<p>In this tutorial, we will develop a simple Hello World task.</p> + +<h3 id="hello-world-task">Hello World Task</h3> + +<p>The Hello World task consists of a source processor, a destination processor with a parallelism hint setting, and a stream that connects the two. The source processor generates random integers and sends them to the destination processor. The figure below shows the layout of the Hello World task.</p> + +<p><img src="images/HelloWorldTask.png" alt="Hello World Task"></p> + +<p>To develop the task, we create a new class that implements the interface <code>com.yahoo.labs.samoa.tasks.Task</code>. For convenience, we also implement <code>com.github.javacliparser.Configurable</code>, which allows the task to parse command-line options.</p> + +<p>The <code>init</code> method builds the topology by instantiating the necessary <code>Processors</code> and <code>Streams</code> and connecting the source processor with the destination processor.</p> + +<h3 id="hello-world-source-processor">Hello World Source Processor</h3> + +<p>To start a task in SAMOA, we need a source processor that is an instance of <code>EntranceProcessor</code>. In this tutorial, the source processor is <code>HelloWorldSourceProcessor</code>. </p> + +<p>The SAMOA runtime invokes the <code>nextEvent</code> method of <code>EntranceProcessor</code> until its <code>hasNext</code> method returns false. Each call to <code>nextEvent</code> should return the next <code>ContentEvent</code> to be sent to the topology. 
In this tutorial, <code>HelloWorldSourceProcessor</code> sends events of type <code>HelloWorldContentEvent</code>.</p> + +<p>Here is the relevant code in <code>HelloWorldSourceProcessor</code>:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"> private Random rnd; + private final long maxInst; + private long count; + + @Override + public boolean hasNext() { + return count < maxInst; + } + + @Override + public ContentEvent nextEvent() { + count++; + return new HelloWorldContentEvent(rnd.nextInt(), false); + } +</code></pre></div> +<p>We also need to create a new type of <code>ContentEvent</code> to hold our data. In this tutorial we call it <code>HelloWorldContentEvent</code> and its content is simply an integer.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">public class HelloWorldContentEvent implements ContentEvent { + + private static final long serialVersionUID = -2406968925730298156L; + private final boolean isLastEvent; + private final int helloWorldData; + + public HelloWorldContentEvent(int helloWorldData, boolean isLastEvent) { + this.isLastEvent = isLastEvent; + this.helloWorldData = helloWorldData; + } + + @Override + public String getKey() { + return null; + } + + @Override + public void setKey(String str) { + // do nothing, it's key-less content event + } + + @Override + public boolean isLastEvent() { + return isLastEvent; + } + + public int getHelloWorldData() { + return helloWorldData; + } + + @Override + public String toString() { + return "HelloWorldContentEvent [helloWorldData=" + helloWorldData + "]"; + } +} +</code></pre></div> +<h3 id="hello-world-destination-processor">Hello World Destination Processor</h3> + +<p>The destination processor for SAMOA is pretty straightforward and it will print the data from the event.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">public class HelloWorldDestinationProcessor implements Processor { + + private static final 
long serialVersionUID = -6042613438148776446L; + private int processorId; + + @Override + public boolean process(ContentEvent event) { + System.out.println(processorId + ": " + event); + return true; + } + + @Override + public void onCreate(int id) { + this.processorId = id; + } + + @Override + public Processor newProcessor(Processor p) { + return new HelloWorldDestinationProcessor(); + } +} +</code></pre></div> +<h3 id="putting-it-all-together">Putting It All Together</h3> + +<p>To put all the components together, we return to the <code>HelloWorldTask</code> class. First, we need to implement the code for setting up the <code>TopologyBuilder</code>. This code is necessary for the task to run on multiple platforms.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"> @Override + public void setFactory(ComponentFactory factory) { + builder = new TopologyBuilder(factory); + logger.debug("Successfully instantiated TopologyBuilder"); + builder.initTopology(evaluationNameOption.getValue()); + logger.debug("Successfully initialized SAMOA topology with name {}", evaluationNameOption.getValue()); + } +</code></pre></div> +<p>After this method is called, we have a functioning builder from which to get components for our topology. Next, the <code>init</code> method is called by SAMOA to start the task. +First, we instantiate the source <code>EntranceProcessor</code>. +After adding the entrance processor to the topology, we create a stream originating from it, using the <code>createStream</code> method of <code>TopologyBuilder</code>. +Next, we create the destination processor and connect it to the stream using shuffle grouping. 
+Once we have created all the components, we use the builder to build the topology.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"> @Override + public void init() { + // create source EntranceProcessor + sourceProcessor = new HelloWorldSourceProcessor(instanceLimitOption.getValue()); + builder.addEntranceProcessor(sourceProcessor); + + // create Stream + Stream stream = builder.createStream(sourceProcessor); + + // create destination Processor + destProcessor = new HelloWorldDestinationProcessor(); + builder.addProcessor(destProcessor, helloWorldParallelismOption.getValue()); + builder.connectInputShuffleStream(stream, destProcessor); + + // build the topology + helloWorldTopology = builder.build(); + logger.debug("Successfully built the topology"); + } +</code></pre></div> +<h3 id="running-it">Running It</h3> + +<p>To run the example in local mode:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa local target/SAMOA-Local-0.0.1-SNAPSHOT.jar "com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 100" +</code></pre></div> +<p>To run the example in Storm local mode:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">java -cp $STORM_HOME/lib/*:$STORM_HOME/storm-0.8.2.jar:target/SAMOA-Storm-0.0.1-SNAPSHOT.jar com.yahoo.labs.samoa.LocalStormDoTask "com.yahoo.labs.samoa.examples.HelloWorldTask -p 4 -i 1000" +</code></pre></div> +<p>All the code for the HelloWorldTask and its components can be found <a href="https://github.com/yahoo/samoa/tree/master/samoa-api/src/main/java/com/yahoo/labs/samoa/examples">here</a>.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html>
Added: incubator/samoa/site/documentation/Distributed-Stream-Clustering.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Clustering.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Distributed-Stream-Clustering.html (added) +++ incubator/samoa/site/documentation/Distributed-Stream-Clustering.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,120 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Distributed Stream Clustering</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Distributed Stream Clustering</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <h2 id="apache-samoa-clustering-algorithm">Apache SAMOA Clustering Algorithm</h2> + +<p>The SAMOA Clustering Algorithm is invoked by using the <code>ClusteringEvaluation</code> task. 
The clustering task can be executed with default values just by running:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "ClusteringEvaluation" +</code></pre></div> +<p>Parameters:</p> + +<ul> +<li><code>-l</code>: clusterer to train</li> +<li><code>-s</code>: stream to learn from</li> +<li><code>-i</code>: maximum number of instances to test/train on (-1 = no limit)</li> +<li><code>-f</code>: number of instances between samples of the learning performance</li> +<li><code>-n</code>: evaluation name (default: ClusteringEvaluation_TimeStamp)</li> +<li><code>-d</code>: file to append intermediate CSV results to</li> +</ul> + +<p>In terms of the SAMOA API, Clustering Evaluation consists of a <code>source</code> processor, a <code>clusterer</code>, and an <code>evaluator</code> processor. The <code>source</code> processor sends the instances to the clusterer via the <code>source</code> stream. The clusterer sends the clustering results to the <code>evaluator</code> processor via the <code>result</code> stream. The <code>source</code> processor corresponds to the <code>-s</code> option of Clustering Evaluation, and the clusterer corresponds to the <code>-l</code> option.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html (added) +++ incubator/samoa/site/documentation/Distributed-Stream-Frequent-Itemset-Mining.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,167 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Distributed Frequent Itemset Mining</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Distributed Frequent Itemset Mining</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <h2 id="1.-introduction">1. Introduction</h2> + +<p>SAMOA takes a micro-batching approach to frequent itemset mining (FIM). It uses <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a> as a base algorithm for distributed sample-based frequent itemset mining. 
PARMA guarantees that all the frequent itemsets are present in the result that it returns. It also returns some false positives. The problem with FIM in streams is that a stream evolves over time: the itemsets that were frequent last year may not be frequent this year. To handle this, SAMOA implements the <a href="https://dl.acm.org/citation.cfm?id=1164180">Time Biased Sampling</a> approach. This sampling method depends on a parameter <em>lambda</em>, which determines the size of the reservoir sample and also how strongly the sample is biased towards newer itemsets. Because PARMA has its own way of determining sample sizes, SAMOA does not allow users to choose <em>lambda</em>; instead, it derives the value from the sample size determined by PARMA, using the approximation <code>lambda = 1/sampleSize</code>. </p> + +<h2 id="2.-concepts">2. Concepts</h2> + +<p>SAMOA implements FIM for streams in three processors: StreamSourceProcessor, SamplerProcessor and AggregatorProcessor. The role of each is explained below.</p> + +<ol> +<li><p>StreamSourceProcessor takes the transaction file as input. StreamSourceProcessor (Entrance PI) starts sending the transactions randomly to SamplerProcessor instances. The number of SamplerProcessors to instantiate is taken as an argument from the user but is verified by PARMA. PARMA determines this number based on the <code>epsilon</code> and <code>phi</code> parameters provided by the user. StreamSourceProcessor sends an FPM='yes' command to all the instances of SamplerProcessor after 2M transactions, where M = numSamples * sampleSize. After the first FPM='yes' command, each later FPM='yes' command is sent after a further <code>fpmGap</code> transactions, where <code>fpmGap</code> is one of the parameters that the SAMOA FIM task takes as input.</p></li> +<li><p>All the instances of SamplerProcessor start building a Time Biased Reservoir Sample in which newer transactions have more weight. 
Time biased sampling is the default approach, but users can provide their own sampler by implementing <code>samoa.samplers.SamplerInterface</code>. When a SamplerProcessor receives an FPM='yes' command, it starts FIM/FPM on the reservoir, irrespective of whether the reservoir is full. When it completes, it sends the resulting itemsets to the AggregatorProcessor together with the epoch/batch id. At the end of the result, each SamplerProcessor sends the ("epoch_end", <epochNum>) message to the AggregatorProcessor.</p></li> +<li><p>AggregatorProcessor receives the resulting itemsets from all SamplerProcessors. It maintains a separate queue for each batch id and also maintains a count of the number of SamplerProcessors that have finished sending their results for the corresponding batch/epoch. Whenever the <code>epoch_end</code> message count becomes equal to the number of instances of SamplerProcessor, AggregatorProcessor aggregates the results and stores them in the file system using the output path specified by the user.</p></li> +</ol> + +<p>In this way, epochs never overlap. If <code>fpmGap</code> is small and the StreamSourceProcessor dispatches an FPM='yes' command before the slowest SamplerProcessor finishes FIM on the last epoch, the speed of the global FIM will be equal to the speed of the local FIM of the slowest SamplerProcessor (or of the AggregatorProcessor, if it is slower than the slowest SamplerProcessor).</p> + +<p><img src="images/SAMOA%20FIM.jpg" alt="SAMOA FIM"></p> + +<h2 id="3.-how-to-run">3. 
How to run</h2> + +<p>The following is an example of the command used to run the SAMOA FIM task.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "FpmTask -t Myfpmtopology -r (com.yahoo.labs.samoa.fpm.processors.FileReaderProcessor -i /datasets/freqDataCombined.txt) -m (com.yahoo.labs.samoa.fpm.processors.ParmaStreamFpmMiner -e .1 -d .1 -f 10 -t 20 -n 23 -p 0.08 -b 100000 -s com.yahoo.labs.samoa.samplers.reservoir.TimeBiasedReservoirSampler) -w (com.yahoo.labs.samoa.fpm.processors.FileWriterProcessor -o /output/outPARMA) " +</code></pre></div> +<p>Parameters: +To run a FIM task, four parameters are required:</p> + +<ul> +<li><code>-t</code>: Topology name (can be any name)</li> +<li><code>-r</code>: The reader class</li> +<li><code>-m</code>: The miner class</li> +<li><code>-w</code>: The writer class</li> +</ul> + +<p>In the example above, <code>FileReaderProcessor</code> is used as the reader class. It takes only one parameter:</p> + +<ul> +<li><code>-i</code>: Path to input file</li> +</ul> + +<p>Similarly, <code>FileWriterProcessor</code> is used as the writer class. It takes only one parameter:</p> + +<ul> +<li><code>-o</code>: Path to output file</li> +</ul> + +<p>SAMOA comes with a built-in distributed frequent itemset mining algorithm, PARMA, as described above, but users can plug in their own miners by implementing the <code>FpmMinerInterface</code>. 
The built-in PARMA miner can be used with the following parameters:</p> + +<ul> +<li><code>-e</code>: epsilon parameter for <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li> +<li><code>-d</code>: delta parameter for <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li> +<li><code>-f</code>: minimum frequency (percentage) of a frequent itemset</li> +<li><code>-t</code>: maximum length of a transaction</li> +<li><code>-n</code>: number of samples to maintain</li> +<li><code>-a</code>: number of aggregators to initiate</li> +<li><code>-p</code>: phi parameter for <a href="https://dl.acm.org/citation.cfm?id=2396776">PARMA</a></li> +<li><code>-i</code>: path to input file</li> +<li><code>-o</code>: path to output file</li> +<li><code>-b</code>: batch size or fpmGap (Number of transactions after which FIM should be performed)</li> +<li><code>-s</code>: Sampler Class to be used for sampling at each node</li> +</ul> + +<h2 id="note">Note</h2> + +<p>This method is currently unavailable in the master branch of SAMOA due to licensing restriction.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html (added) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-S4.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,200 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Executing Apache SAMOA with Apache S4</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Executing Apache SAMOA with Apache S4</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>In this tutorial page we describe how to execute SAMOA on top of Apache S4.</p> + +<h2 id="prerequisites">Prerequisites</h2> + +<p>The following dependencies are needed to run SAMOA smoothly on Apache S4</p> + +<ul> +<li><a href="http://www.gradle.org/">Gradle</a></li> +<li><a href="https://incubator.apache.org/s4/">Apache S4</a></li> 
+</ul> + +<h2 id="gradle">Gradle</h2> + +<p>Gradle is a build automation tool and is used to build Apache S4. The installation guide can be found <a href="http://www.gradle.org/docs/current/userguide/installation.html">here</a>. The following instructions are a simplified installation guide.</p> + +<ol> +<li>Download the Gradle binaries from <a href="http://services.gradle.org/distributions/gradle-1.6-bin.zip">downloads</a>, or from the console type <code>wget http://services.gradle.org/distributions/gradle-1.6-bin.zip</code></li> +<li>Unzip the file: <code>unzip gradle-1.6-bin.zip</code></li> +<li>Set the Gradle environment variable: <code>export GRADLE_HOME=/foo/bar/gradle-1.6</code></li> +<li>Add Gradle to the system path: <code>export PATH=$PATH:$GRADLE_HOME/bin</code></li> +<li>Install Gradle by running <code>gradle</code></li> +</ol> + +<p>Now you are all set to install Apache S4.</p> + +<h2 id="apache-s4">Apache S4</h2> + +<p>S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. The installation process is as follows:</p> + +<ol> +<li>Download the latest Apache S4 release from <a href="http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip">Apache S4 0.6.0</a>, or from the command line with <code>wget http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip</code>, or clone from git: +<code>git clone https://git-wip-us.apache.org/repos/asf/incubator-s4.git</code>.</li> +<li>Unzip the file with <code>unzip apache-s4-0.6.0-incubating-src.zip</code>, or go to the cloned directory.</li> +<li>Set the Apache S4 environment variable: <code>export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src</code>.</li> +<li>Add the S4_HOME to the system PATH. 
<code>export PATH=$PATH:$S4_HOME</code>.</li> +<li>Once the previous steps are done, we can proceed to build and install Apache S4.</li> +<li>You can have a look at the available build tasks by typing <code>gradle tasks</code>.</li> +<li>There are some dependency issues, so you should run the wrapper task first by typing <code>gradle wrapper</code>.</li> +<li>Install the artifacts for Apache S4 by running <code>gradle install</code> in the S4_HOME directory.</li> +<li>Install the S4 tools: <code>gradle s4-tools::installApp</code>.</li> +</ol> + +<p>Done. Now you can configure and run your Apache S4 cluster.</p> + +<hr> + +<h2 id="building-samoa">Building SAMOA</h2> + +<p>Once the S4 dependencies are installed, you can simply clone the repository and install SAMOA.</p> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git +<span class="nb">cd </span>incubator-samoa +mvn -Ps4 package +</code></pre></div> +<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, in our case for S4: <code>target/SAMOA-S4-0.3.0-SNAPSHOT.jar</code>.</p> + +<hr> + +<h2 id="samoa-s4-configuration">SAMOA-S4 Configuration</h2> + +<p>This section goes through the <code>bin/samoa-s4.properties</code> file and how to configure it. +For SAMOA to run correctly in a distributed environment, some variables need to be defined. 
Since Apache S4 uses <a href="https://zookeeper.apache.org/">ZooKeeper</a> for cluster management, we need to define where it is running.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"># Zookeeper Server +zookeeper.server=localhost +zookeeper.port=2181 +</code></pre></div> +<p>Apache S4 also distributes the application via HTTP, so the server and port that host the S4 application must be provided.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"># Simple HTTP Server providing the packaged S4 jar +http.server.ip=localhost +http.server.port=8000 +</code></pre></div> +<p>Apache S4 uses the concept of logical clusters to define a group of machines, which are identified by an ID and start serving on a specific port.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"># Name of the S4 cluster +cluster.name=cluster +cluster.port=12000 +</code></pre></div> +<p>SAMOA can be deployed on a single machine using only one resource or in a cluster environment. The following property can be defined to deploy as a <code>local</code> application or on a <code>cluster</code>.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"># Deployment strategy +samoa.deploy.mode=local +</code></pre></div> +<hr> + +<h2 id="samoa-s4-deployment">SAMOA S4 Deployment</h2> + +<p>In order to deploy SAMOA in a distributed environment you <strong>MUST</strong> configure the <code>bin/samoa-s4.properties</code> file correctly. If you are running locally, modifying the properties file is optional.</p> + +<p>The deployment is done by running the SAMOA execution script <code>bin/samoa</code> with some additional parameters. 
+The execution syntax is as follows: +<code>bin/samoa <platform> <jar-location> <task & options></code></p> + +<p>Example:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation" +</code></pre></div> +<p>The <platform> can be s4 or storm.</p> + +<p>The <jar-location> must be the absolute path to the platform specific jar file.</p> + +<p>The <task & options> should be the name of a known task and the options belonging to that task.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html (added) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Samza.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,322 
@@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Executing Apache SAMOA with Apache Samza</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! --> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a 
href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Executing Apache SAMOA with Apache Samza</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>This tutorial describes how to run SAMOA on Apache Samza. +The steps included in this tutorial are:</p> + +<ol> +<li><p>Set up and configure a cluster with the required dependencies. This applies to single-node (local) execution as well.</p></li> +<li><p>Build SAMOA deployables</p></li> +<li><p>Configure SAMOA-Samza</p></li> +<li><p>Deploy SAMOA-Samza and execute a task</p></li> +<li><p>Observe the execution and the result</p></li> +</ol> + +<h2 id="setup-cluster">Setup cluster</h2> + +<p>The following are needed to run SAMOA on top of Samza:</p> + +<ul> +<li><a href="http://zookeeper.apache.org/">Apache Zookeeper</a></li> +<li><a href="http://kafka.apache.org/">Apache Kafka</a></li> +<li><a href="http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html">Apache Hadoop YARN and HDFS</a></li> +</ul> + +<h3 id="zookeeper">Zookeeper</h3> + +<p>Zookeeper is used by Kafka to coordinate its brokers. Detailed instructions for setting up a Zookeeper cluster can be found <a href="http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html">here</a>. 
</p> + +<p>To quickly set up a single-node Zookeeper cluster:</p> + +<ol> +<li><p>Download the binary release from the <a href="http://zookeeper.apache.org/releases.html">release page</a>.</p></li> +<li><p>Untar the archive</p></li> +</ol> +<div class="highlight"><pre><code class="language-text" data-lang="text">tar -xf $DOWNLOAD_DIR/zookeeper-3.4.6.tar.gz -C ~/ +</code></pre></div> +<ol start="3"> +<li>Copy the default configuration file</li> +</ol> +<div class="highlight"><pre><code class="language-text" data-lang="text">cp ~/zookeeper-3.4.6/conf/zoo_sample.cfg ~/zookeeper-3.4.6/conf/zoo.cfg +</code></pre></div> +<ol start="4"> +<li>Start the single-node cluster</li> +</ol> +<div class="highlight"><pre><code class="language-text" data-lang="text">~/zookeeper-3.4.6/bin/zkServer.sh start +</code></pre></div> +<h3 id="kafka">Kafka</h3> + +<p>Kafka is a distributed, partitioned, replicated commit log service which Samza uses as its default messaging system. </p> + +<ol> +<li><p>Download a binary release of Kafka <a href="http://kafka.apache.org/downloads.html">here</a>. As mentioned on that page, the Scala version does not matter. However, 2.10 is recommended as Samza has recently been moved to Scala 2.10.</p></li> +<li><p>Untar the archive </p></li> +</ol> +<div class="highlight"><pre><code class="language-text" data-lang="text">tar -xzf $DOWNLOAD_DIR/kafka_2.10-0.8.1.tgz -C ~/ +</code></pre></div> +<p>If you are running in local mode or a single-node cluster, you can now start Kafka with the command:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">~/kafka_2.10-0.8.1/bin/kafka-server-start.sh ~/kafka_2.10-0.8.1/config/server.properties +</code></pre></div> +<p>In a multi-node cluster, it is typical and convenient to have a Kafka broker on each node (although a smaller Kafka cluster, or even a single-node Kafka cluster, also works). 
The number of brokers in the Kafka cluster affects the available disk bandwidth and space: the more brokers there are, the more of both you get. On each node, you need to set the following properties in <code>~/kafka_2.10-0.8.1/config/server.properties</code> before starting the Kafka service.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">broker.id=a-unique-number-for-each-node +zookeeper.connect=zookeeper-host0-url:2181[,zookeeper-host1-url:2181,...] +</code></pre></div> +<p>You might want to change the retention hours or retention bytes of the logs to keep the logs from growing too large.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">log.retention.hours=number-of-hours-to-keep-the-logs +log.retention.bytes=number-of-bytes-to-keep-in-the-logs +</code></pre></div> +<h3 id="hadoop-yarn-and-hdfs">Hadoop YARN and HDFS</h3> + +<blockquote> +<p>Hadoop YARN and HDFS are <strong>not</strong> required to run SAMOA in Samza local mode. </p> +</blockquote> + +<p>To set up a YARN cluster, first download a binary release of Hadoop <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">here</a> on each node in the cluster and untar the archive +<code>tar -xf $DOWNLOAD_DIR/hadoop-2.2.0.tar.gz -C ~/</code>. 
We have tested SAMOA with Hadoop 2.2.0 but Hadoop 2.3.0 should work too.</p> + +<p><strong>HDFS</strong></p> + +<p>Set the following properties in <code>~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml</code> on all nodes.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"><configuration> + <property> + <name>dfs.datanode.data.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/datanode</value> + <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description> + </property> + + <property> + <name>dfs.namenode.name.dir</name> + <value>file:///home/username/hadoop-2.2.0/hdfs/namenode</value> + <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description> + </property> +</configuration> +</code></pre></div> +<p>Add these properties to <code>~/hadoop-2.2.0/etc/hadoop/core-site.xml</code> on all nodes.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"><configuration> + <property> + <name>fs.defaultFS</name> + <value>hdfs://localhost:9000/</value> + <description>NameNode URI</description> + </property> + + <property> + <name>fs.hdfs.impl</name> + <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> + </property> +</configuration> +</code></pre></div> +<p>For a multi-node cluster, change the hostname ("localhost") to the correct host name of your namenode server.</p> + +<p>Format the HDFS directory (only do this the very first time you set up the cluster)</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">~/hadoop-2.2.0/bin/hdfs namenode -format +</code></pre></div> +<p>Start the namenode daemon on one of the nodes</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode +</code></pre></div> +<p>Start the datanode daemon on all nodes</p> +<div class="highlight"><pre><code 
class="language-text" data-lang="text">~/hadoop-2.2.0/sbin/hadoop-daemon.sh start datanode +</code></pre></div> +<p><strong>YARN</strong></p> + +<p>If you are running in a multi-node cluster, set the resource manager hostname in <code>~/hadoop-2.2.0/etc/hadoop/yarn-site.xml</code> on all nodes as follows:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"><configuration> + <property> + <name>yarn.resourcemanager.hostname</name> + <value>resourcemanager-url</value> + <description>The hostname of the RM.</description> + </property> +</configuration> +</code></pre></div> +<p><strong>Other configurations</strong> +Now we need to tell Samza where to find the configuration of the YARN cluster. To do this, first create a new directory on all nodes:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">mkdir ~/.samza +mkdir ~/.samza/conf +</code></pre></div> +<p>Copy (or soft link) <code>core-site.xml</code>, <code>hdfs-site.xml</code>, <code>yarn-site.xml</code> from <code>~/hadoop-2.2.0/etc/hadoop</code> to the new directory </p> +<div class="highlight"><pre><code class="language-text" data-lang="text">ln -s ~/hadoop-2.2.0/etc/hadoop/core-site.xml ~/.samza/conf/core-site.xml +ln -s ~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml ~/.samza/conf/hdfs-site.xml +ln -s ~/hadoop-2.2.0/etc/hadoop/yarn-site.xml ~/.samza/conf/yarn-site.xml +</code></pre></div> +<p>Export the environment variable YARN_HOME (in ~/.bashrc) so Samza knows where to find these YARN configuration files.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">export YARN_HOME=$HOME/.samza +</code></pre></div> +<p><strong>Start the YARN cluster</strong> +Start the resource manager on the master node</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start resourcemanager +</code></pre></div> +<p>Start the node manager on all worker nodes</p> +<div class="highlight"><pre><code class="language-text" 
data-lang="text">~/hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager +</code></pre></div> +<h2 id="build-samoa">Build SAMOA</h2> + +<p>Perform the following steps on one of the nodes in the cluster. Here we assume git and maven are installed on this node.</p> + +<p>Since Samza is not yet released on Maven, we have to clone the Samza project, build it, and publish it to the local Maven repository:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">git clone -b 0.7.0 https://github.com/apache/incubator-samza.git +cd incubator-samza +./gradlew clean build +./gradlew publishToMavenLocal +</code></pre></div> +<p>Here we cloned and installed Samza version 0.7.0, the current released version (July 2014). </p> + +<p>Now we can clone the repository and install SAMOA.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">git clone http://git.apache.org/incubator-samoa.git +cd incubator-samoa +mvn -Psamza package +</code></pre></div> +<p>The deployable jars for SAMOA will be in <code>target/SAMOA-<variant>-<version>-SNAPSHOT.jar</code>. For example, for Samza: <code>target/SAMOA-Samza-0.2.0-SNAPSHOT.jar</code>.</p> + +<h2 id="configure-samoa-samza-execution">Configure SAMOA-Samza execution</h2> + +<p>This section explains the configuration parameters in <code>bin/samoa-samza.properties</code> that are required to run SAMOA on top of Samza.</p> + +<p><strong>Samza execution mode</strong></p> +<div class="highlight"><pre><code class="language-text" data-lang="text">samoa.samza.mode=[yarn|local] +</code></pre></div> +<p>This parameter specifies the mode in which the task is executed: <code>local</code> for local execution and <code>yarn</code> for cluster execution.</p> + +<p><strong>Zookeeper</strong></p> +<div class="highlight"><pre><code class="language-text" data-lang="text">zookeeper.connect=localhost +zookeeper.port=2181 +</code></pre></div> +<p>The default setting above applies to local mode execution. 
For cluster mode, change <code>zookeeper.connect</code> to the correct URL of your zookeeper host.</p> + +<p><strong>Kafka</strong></p> +<div class="highlight"><pre><code class="language-text" data-lang="text">kafka.broker.list=localhost:9092 +</code></pre></div> +<p><code>kafka.broker.list</code> is a comma-separated list of host:port pairs for all the brokers in the Kafka cluster.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">kafka.replication.factor=1 +</code></pre></div> +<p><code>kafka.replication.factor</code> specifies the number of replicas for each stream in Kafka. This number must be less than or equal to the number of brokers in the Kafka cluster.</p> + +<p><strong>YARN</strong></p> + +<blockquote> +<p>The settings below do not apply to local mode execution; you can leave them as they are.</p> +</blockquote> + +<p><code>yarn.am.memory</code> and <code>yarn.container.memory</code> specify the memory requirement for the Application Master container and the worker containers, respectively. </p> +<div class="highlight"><pre><code class="language-text" data-lang="text">yarn.am.memory=1024 +yarn.container.memory=1024 +</code></pre></div> +<p><code>yarn.package.path</code> specifies the path (typically an HDFS path) of the package to be distributed to all YARN containers to execute the task.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">yarn.package.path=hdfs://samoa/SAMOA-Samza-0.2.0-SNAPSHOT.jar +</code></pre></div> +<p><strong>Samza</strong> +<code>max.pi.per.container</code> specifies the maximum number of processing item (PI) instances allowed in one YARN container. 
</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">max.pi.per.container=1 +</code></pre></div> +<p><code>kryo.register.file</code> specifies the registration file for the Kryo serializer.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">kryo.register.file=samza-kryo +</code></pre></div> +<p><code>checkpoint.commit.ms</code> specifies the interval (in ms) at which PIs commit their checkpoints. The default value is 1 minute.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">checkpoint.commit.ms=60000 +</code></pre></div> +<h2 id="deploy-samoa-samza-task">Deploy SAMOA-Samza task</h2> + +<p>Execute a SAMOA task with the following command:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa samza target/SAMOA-Samza-0.2.0-SNAPSHOT.jar "<task> & <options>" +</code></pre></div> +<h2 id="observe-execution-and-result">Observe execution and result</h2> + +<p>In local mode, all log output is printed to stdout. If you execute the task on a YARN cluster, the output is written to the stdout files in the log folder of the YARN containers ($HADOOP_HOME/logs/userlogs/application_<application-id>/container_<container-id>).</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html (added) +++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Storm.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,203 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Executing Apache SAMOA with Apache Storm</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Executing Apache SAMOA with Apache Storm</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>In this tutorial page we describe how to execute SAMOA on top of Apache Storm. 
Here is an outline of what we want to do:</p> + +<ol> +<li>Ensure that you have the necessary Storm cluster and configuration to execute SAMOA</li> +<li>Ensure that you have all the SAMOA deployables for execution in the cluster</li> +<li>Configure samoa-storm.properties</li> +<li>Execute a SAMOA classification task</li> +<li>Observe the task execution</li> +</ol> + +<h3 id="storm-configuration">Storm Configuration</h3> + +<p>Before we start the tutorial, please ensure that you already have a Storm cluster (preferably Storm 0.8.2) running. You can follow this <a href="http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/">tutorial</a> to set up a Storm cluster.</p> + +<p>You also need to install Storm on the machine where you initiate the deployment, and configure Storm with (at least) this configuration in <code>~/.storm/storm.yaml</code>:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" + +## List of custom serializations +kryo.register: + - com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent: com.yahoo.labs.samoa.learners.classifiers.trees.AttributeContentEvent$AttributeCEFullPrecSerializer + - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer +</code></pre></div> +<!-- +Or, if you are using SAMOA with optimized VHT, you should use this following configuration file: +``` +########### These MUST be filled in for a storm configuration +nimbus.host: "<enter your nimbus host name here>" + +## List of custom serializations +kryo.register: + - com.yahoo.labs.samoa.learners.classifiers.trees.NaiveAttributeContentEvent: com.yahoo.labs.samoa.classifiers.trees.NaiveAttributeContentEvent$NaiveAttributeCEFullPrecSerializer + - com.yahoo.labs.samoa.learners.classifiers.trees.ComputeContentEvent: 
com.yahoo.labs.samoa.classifiers.trees.ComputeContentEvent$ComputeCEFullPrecSerializer +``` +--> + +<p>Alternatively, if you don't have a Storm cluster running, you can execute SAMOA with Storm in local mode as explained in section <a href="#samoa-storm-properties">samoa-storm.properties Configuration</a>.</p> + +<h3 id="samoa-deployables">SAMOA deployables</h3> + +<p>There are three deployables for executing SAMOA on top of Storm. They are:</p> + +<ol> +<li><code>bin/samoa</code> is the main script to execute SAMOA. You do not need to change anything in this script.</li> +<li><code>target/SAMOA-Storm-x.x.x-SNAPSHOT.jar</code> is the deployed jar file. <code>x.x.x</code> is the version number of SAMOA. </li> +<li><code>bin/samoa-storm.properties</code> contains deployment configurations. You need to set the parameters in this properties file correctly. </li> +</ol> + +<h3 id="-samoa-storm.properties-configuration"><a name="samoa-storm-properties"> samoa-storm.properties Configuration</a></h3> + +<p>Currently, the properties file contains two configurations:</p> + +<ol> +<li><code>samoa.storm.mode</code> determines whether the task is executed locally (using Storm's <code>LocalCluster</code>) or executed in a Storm cluster. Use <code>local</code> if you want to test SAMOA and you do not have a Storm cluster for deployment. Use <code>cluster</code> if you want to test SAMOA on your Storm cluster.</li> +<li><code>samoa.storm.numworker</code> determines the number of workers that execute the SAMOA tasks in the Storm cluster. This field must be an integer, less than or equal to the number of available slots in your Storm cluster. 
If you are using local mode, this property corresponds to the number of threads used by Storm's LocalCluster to execute your SAMOA task.</li> +</ol> + +<p>Here is an example of a complete properties file:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text"># SAMOA Storm properties file +# This file contains specific configurations for SAMOA deployment in the Storm platform +# Note that you still need to configure Storm client in your machine, +# including setting up Storm configuration file (~/.storm/storm.yaml) with correct settings + +# samoa.storm.mode corresponds to the execution mode of the Task in Storm +# possible values: +# 1. cluster: the Task will be sent into nimbus. The nimbus is configured by Storm configuration file +# 2. local: the Task will be sent using local Storm cluster +samoa.storm.mode=cluster + +# samoa.storm.numworker corresponds to the number of worker processes allocated in Storm cluster +# possible values: any integer greater than 0 +samoa.storm.numworker=7 +</code></pre></div> +<h3 id="samoa-task-execution">SAMOA task execution</h3> + +<p>You can execute a SAMOA task using the aforementioned <code>bin/samoa</code> script in the following format: +<code>bin/samoa <platform> <jar> "<task>"</code>.</p> + +<p><code><platform></code> can be <code>storm</code> or <code>s4</code>. Using the <code>storm</code> option means you are deploying SAMOA on a Storm environment. In this configuration, the script uses the aforementioned yaml file (<code>~/.storm/storm.yaml</code>) and <code>samoa-storm.properties</code> to perform the deployment. Using the <code>s4</code> option means you are deploying SAMOA on an Apache S4 environment. Follow this <a href="Executing-SAMOA-with-Apache-S4">link</a> to learn more about deploying SAMOA on Apache S4.</p> + +<p><code><jar></code> is the location of the deployed jar file (<code>SAMOA-Storm-x.x.x-SNAPSHOT.jar</code>) in your file system. 
The location can be a relative path or an absolute path to the jar file. </p> + +<p><code>"<task>"</code> is the SAMOA task command line such as <code>PrequentialEvaluation</code> or <code>ClusteringTask</code>. The command line for a SAMOA task follows the format of <a href="http://moa.cms.waikato.ac.nz/details/classification/command-line/">Massive Online Analysis (MOA)</a>.</p> + +<p>The complete command to execute SAMOA is:</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">bin/samoa storm target/SAMOA-Storm-0.0.1-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.generators.RandomTreeGenerator -c 2 -o 10 -u 10)" +</code></pre></div> +<p>The example above uses the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> and the <a href="Vertical-Hoeffding-Tree-Classifier">Vertical Hoeffding Tree</a> classifier. </p> + +<h3 id="observing-task-execution">Observing task execution</h3> + +<p>There are two ways to observe the task execution: using the Storm UI and monitoring the dump file of the SAMOA task. Note that the dump file will be created on the cluster if you are executing your task in <code>cluster</code> mode.</p> + +<h4 id="using-storm-ui">Using Storm UI</h4> + +<p>Go to the web address of the Storm UI and check whether the SAMOA task executes as intended. Use this UI to kill the associated Storm topology if necessary.</p> + +<h4 id="monitoring-the-dump-file">Monitoring the dump file</h4> + +<p>Several tasks have options to specify a dump file, which is a file that represents the task output. In our example, the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a> has a <code>-d</code> option which specifies the path to the dump file. 
Since Storm performs the allocation of Storm tasks, you should set the dump file into a file on a shared filesystem if you want to access it from the machine submitting the task.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Getting-Started.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Getting-Started.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Getting-Started.html (added) +++ incubator/samoa/site/documentation/Getting-Started.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,127 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Getting Started</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" 
rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! --> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Getting Started</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>We start by showing how 
simple it is to run a first large-scale machine learning task in SAMOA. We will evaluate a bagging ensemble method using decision trees on the Forest Covertype dataset.</p> + +<ul> +<li>1. Download SAMOA </li> +</ul> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git.apache.org/incubator-samoa.git +<span class="nb">cd </span>incubator-samoa +mvn package <span class="c">#Local mode</span> +</code></pre></div> +<ul> +<li>2. Download the Forest CoverType dataset </li> +</ul> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">wget <span class="s2">"http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"</span> +unzip covtypeNorm.arff.zip +</code></pre></div> +<p><em>Forest Covertype</em> contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.</p> + +<ul> +<li>3. Run an example: classifying the CoverType dataset with the bagging algorithm</li> +</ul> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/samoa <span class="nb">local </span>target/SAMOA-Local-0.3.0-SNAPSHOT.jar <span class="s2">"PrequentialEvaluation -l classifiers.ensemble.Bagging </span> +<span class="s2"> -s (ArffFileStream -f covtypeNorm.arff) -f 100000"</span> +</code></pre></div> +<p>The output will be a list of the evaluation results, reported every 100,000 instances.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Home.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Home.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Home.html (added) +++ incubator/samoa/site/documentation/Home.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,169 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Apache SAMOA Documentation</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Apache SAMOA Documentation</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>Apache SAMOA is a distributed realtime machine learning system, similar to Mahout, but specifically designed for stream mining. Apache SAMOA is simple and fun to use!</p> + +<p>This documentation introduces the different ways to use Apache SAMOA. 
As a user, you can run Apache SAMOA algorithms on several stream processing engines: local mode, Apache Storm, S4 and Samza. As a developer, you can write a new algorithm once and test it on all of these stream processing engines.</p> + +<h2 id="getting-started">Getting Started</h2> + +<ul> +<li><a href="Getting-Started.html">0 Hands-on with SAMOA: Getting Started!</a></li> +</ul> + +<h2 id="users">Users</h2> + +<ul> +<li><a href="Scalable-Advanced-Massive-Online-Analysis.html">1 Building and Executing SAMOA</a> + +<ul> +<li><a href="Building-SAMOA.html">1.0 Building SAMOA</a></li> +<li><a href="Executing-SAMOA-with-Apache-Storm.html">1.1 Executing SAMOA with Apache Storm</a></li> +<li><a href="Executing-SAMOA-with-Apache-S4.html">1.2 Executing SAMOA with Apache S4</a></li> +<li><a href="Executing-SAMOA-with-Apache-Samza.html">1.3 Executing SAMOA with Apache Samza</a></li> +</ul></li> +<li><a href="SAMOA-and-Machine-Learning.html">2 Machine Learning Methods in SAMOA</a> + +<ul> +<li><a href="Prequential-Evaluation-Task.html">2.1 Prequential Evaluation Task</a></li> +<li><a href="Vertical-Hoeffding-Tree-Classifier.html">2.2 Vertical Hoeffding Tree Classifier</a></li> +<li><a href="Adaptive-Model-Rules-Regressor.html">2.3 Adaptive Model Rules Regressor</a></li> +<li><a href="Bagging-and-Boosting.html">2.4 Bagging and Boosting</a></li> +<li><a href="Distributed-Stream-Clustering.html">2.5 Distributed Stream Clustering</a></li> +<li><a href="Distributed-Stream-Frequent-Itemset-Mining.html">2.6 Distributed Stream Frequent Itemset Mining</a></li> +<li><a href="SAMOA-for-MOA-users.html">2.7 SAMOA for MOA users</a></li> +</ul></li> +</ul> + +<h2 id="developers">Developers</h2> + +<ul> +<li><a href="SAMOA-Topology.html">3 Understanding SAMOA Topologies</a> + +<ul> +<li><a href="Processor.html">3.1 Processor</a></li> +<li><a href="Content-Event.html">3.2 Content Event</a></li> +<li><a href="Stream.html">3.3 Stream</a></li> +<li><a href="Task.html">3.4 Task</a></li> 
+<li><a href="Topology-Builder.html">3.5 Topology Builder</a></li> +<li><a href="Learner.html">3.6 Learner</a></li> +<li><a href="Processing-Item.html">3.7 Processing Item</a></li> +</ul></li> +<li><a href="Developing-New-Tasks-in-SAMOA.html">4 Developing New Tasks in SAMOA</a></li> +</ul> + +<h3 id="getting-help">Getting help</h3> + +<h4 id="apache-samoa-users">Apache SAMOA Users</h4> + +<p>SAMOA users should send messages and subscribe to <a href="mailto:[email protected]">[email protected]</a>.</p> + +<p>You can subscribe to this list by sending an email to <a href="mailto:[email protected]">[email protected]</a>. Likewise, you can cancel a subscription by sending an email to <a href="mailto:[email protected]">[email protected]</a>.</p> + +<h4 id="apache-samoa-developers">Apache SAMOA Developers</h4> + +<p>SAMOA developers should send messages and subscribe to <a href="mailto:[email protected]">[email protected]</a>.</p> + +<p>You can subscribe to this list by sending an email to <a href="mailto:[email protected]">[email protected]</a>. Likewise, you can cancel a subscription by sending an email to <a href="mailto:[email protected]">[email protected]</a>.</p> + +<p><strong>NOTE:</strong> The Google Groups account <a href="mailto:[email protected]">[email protected]</a> is now officially deprecated in favor of the Apache-hosted user/dev mailing lists.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. 
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html> Added: incubator/samoa/site/documentation/Learner.html URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Learner.html?rev=1661475&view=auto ============================================================================== --- incubator/samoa/site/documentation/Learner.html (added) +++ incubator/samoa/site/documentation/Learner.html Sun Feb 22 13:41:20 2015 @@ -0,0 +1,116 @@ +<!DOCTYPE html> +<html> + + <head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="description" content=""> + <meta name="author" content=""> + <link rel="icon" href="/assets/favicon.ico"> + + <title>Learner</title> + + <!-- Bootstrap core CSS --> + <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> + <!-- Bootstrap theme --> + <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> + + <!-- Custom styles for this template --> + <link href="/assets/css/theme.css" rel="stylesheet"> + + <link href="/css/main.css" rel="stylesheet"> + + <!-- Just for debugging purposes. Don't actually copy these 2 lines! 
--> + <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> + <script src="/assets/js/ie-emulation-modes-warning.js"></script> + + <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> + <!--[if lt IE 9]> + <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> + <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> + <![endif]--> + </head> + + + + <body> + <div class="container"> + <!-- Fixed navbar --> + <nav class="navbar navbar-default navbar-fixed-top" role="navigation"> + <div class="container"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/index.html">Apache SAMOA</a> + </div> + <div id="navbar" class="navbar-collapse collapse"> + <ul class="nav navbar-nav"> + <li><a href="/index.html">Home</a></li> + <li><a href="/documentation/Home.html">Documentation</a></li> + <li><a href="/documentation/Team.html">Contributors</a></li> + <li><a href="/documentation/Bylaws.html">Bylaws</a></li> + </ul> + </div><!--/.nav-collapse --> + </div> + </nav> + + + + + + <!-- Documentation --> +<!-- <div class="container"> --> + + <header class="post-header"> + <h1 class="post-title">Learner</h1> + <p class="post-meta"></p> + </header> + + <article class="post-content"> + <p>Learners are implemented in SAMOA as sub-topologies.</p> +<div class="highlight"><pre><code class="language-text" data-lang="text">public interface Learner extends Serializable{ + + public void init(TopologyBuilder topologyBuilder, Instances dataset); + + public Processor getInputProcessor(); + + public Stream getResultStream(); +} +</code></pre></div> +<p>When a 
<code>Task</code> object is initiated via <code>init()</code>, the method <code>init(...)</code> of <code>Learner</code> is called, and the learner's topology is added to the global topology of the task.</p> + +<p>To create a new learner, you only need to add streams, processors and their connections to the topology in <code>init(...)</code>, return the processor that handles the learner's input stream from <code>getInputProcessor()</code>, and return the learner's output stream from <code>getResultStream()</code>.</p> + + </article> + +<!-- </div> --> + + + + <hr/> +<div id="footer" class="container text-center"> + + <p class="text-muted credit"><p> +Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + +</div> + + <!-- Bootstrap core JavaScript + ================================================== --> + <!-- Placed at the end of the document so the pages load faster --> + <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script> + <script src="/assets/js/bootstrap.min.js"></script> + <script src="/assets/js/docs.min.js"></script> + <!-- IE10 viewport hack for Surface/desktop Windows 8 bug --> + <script src="/assets/js/ie10-viewport-bug-workaround.js"></script> + + </div> + + </body> + +</html>
