Author: gdfm
Date: Sun Jan 31 12:33:14 2016
New Revision: 1727804

URL: http://svn.apache.org/viewvc?rev=1727804&view=rev
Log:
SAMOA-47: Avro documentation (missing file)

Added:
    incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html

Added: incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html
URL: http://svn.apache.org/viewvc/incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html?rev=1727804&view=auto
==============================================================================
--- incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html (added)
+++ incubator/samoa/site/documentation/Executing-SAMOA-with-Apache-Avro-Files.html Sun Jan 31 12:33:14 2016
@@ -0,0 +1,184 @@
+<!DOCTYPE html>
+<html>
+
+    <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="description" content="">
+    <meta name="author" content="">
+    <link rel="icon" href="/assets/favicon.ico">
+
+    <title>Executing Apache SAMOA with Apache Avro Files</title>
+
+    <!-- Bootstrap core CSS -->
+    <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
+    <!-- Bootstrap theme -->
+    <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
+
+    <!-- Custom styles for this template -->
+    <link href="/assets/css/theme.css" rel="stylesheet">
+       
+       <link href="/css/main.css" rel="stylesheet">
+
+    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
+    <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
+    <script src="/assets/js/ie-emulation-modes-warning.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+
+
+
+  <body>
+    <div class="container">
+        <!-- Fixed navbar -->
+    <nav class="navbar navbar-default navbar-fixed-top" role="navigation">
+      <div class="container">
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a class="navbar-brand" href="/index.html">Apache SAMOA</a>
+        </div>
+        <div id="navbar" class="navbar-collapse collapse">
+          <ul class="nav navbar-nav">
+            <li><a href="/index.html">Home</a></li>    
+                       <li><a href="Home.html">Documentation</a></li>
+                       <li><a href="api/current/index.html">API</a></li>
+                       <li><a href="Team.html">Contributors</a></li>
+                       <li><a href="Bylaws.html">Bylaws</a></li>
+          </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+
+
+
+
+      
+        <!-- Documentation -->
+<!-- <div class="container"> -->
+
+  <header class="post-header">
+    <h1 class="post-title">Executing Apache SAMOA with Apache Avro Files</h1>
+    <p class="post-meta"></p>
+  </header>
+
+  <article class="post-content">
+    <p>In this tutorial we describe how to execute SAMOA with data files in the Apache Avro format. Here is an outline of the tutorial:</p>
+
+<ol>
+<li>Overview of Apache Avro</li>
+<li>Avro Input Format for SAMOA</li>
+<li>SAMOA task execution with Avro</li>
+<li>Sample Avro Data for SAMOA</li>
+</ol>
+
+<h3 id="overview-of-apache-avro">Overview of Apache Avro</h3>
+
+<p>Users of Apache SAMOA can use binary- or JSON-encoded Avro data as the data source, as an alternative to the default ARFF file format. Avro is a remote procedure call and data serialization framework developed within Apache&#39;s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Avro specifies two serialization encodings for the data, Binary and JSON, with Binary being the default. The metadata (schema), however, is always in JSON. Avro data is always serialized with its schema, and files that store Avro data should also include the schema for that data in the same file.</p>
+
+<p>For more details, see the latest <a href="https://avro.apache.org/docs/current/">Apache Avro documentation</a>.</p>
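<p>To make the difference between the two encodings concrete, the following minimal sketch (Python, standard library only) binary-encodes the first Iris record from the samples further down this page and compares its size with the JSON encoding of the same record. It assumes the binary encoding rules of the Avro specification: a <code>double</code> is written as 8 bytes in little-endian order, an <code>int</code>/<code>long</code> (and therefore an enum's symbol index) as a zigzag variable-length integer, and a record as its field values concatenated in schema order, without field names. The helper names are hypothetical and belong neither to SAMOA nor to any Avro library; in practice an Avro library would write such data.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import json
import struct


def zigzag_varint(n):
    """Encode an Avro int/long: zigzag mapping, then a base-128 varint."""
    z = 2 * n if n == abs(n) else -2 * n - 1  # zigzag maps 0, -1, 1, -2 to 0, 1, 2, 3
    out = bytearray()
    while True:
        low7 = z % 128
        z //= 128
        if z:
            out.append(low7 + 128)  # more bytes follow: set the high bit
        else:
            out.append(low7)
            return bytes(out)


def encode_double(x):
    """Avro writes a double as 8 IEEE 754 bytes in little-endian order."""
    return struct.pack("!d", x)[::-1]  # big-endian ("!") bytes, reversed


def encode_iris_record(record, symbols):
    """Binary-encode one Iris record: four doubles, then the enum symbol index."""
    numeric = ("sepallength", "sepalwidth", "petallength", "petalwidth")
    body = b"".join(encode_double(record[name]) for name in numeric)
    return body + zigzag_varint(symbols.index(record["class"]))


symbols = ["setosa", "versicolor", "virginica"]
record = {"sepallength": 5.1, "sepalwidth": 3.5, "petallength": 1.4,
          "petalwidth": 0.2, "class": "setosa"}
binary = encode_iris_record(record, symbols)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(len(binary), len(as_json))  # prints: 33 88
</code></pre></div>
<p>Under these rules the binary record takes 33 bytes (four 8-byte doubles plus a single byte for the enum index), against 88 bytes for the compact JSON form, which illustrates why the binary encoding is the more compact of the two.</p>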
+
+<h3 id="avro-input-format-for-samoa">Avro Input Format for SAMOA</h3>
+
+<p>Input Avro files must follow the rules below so that SAMOA can map them onto its instances. Whether the data is binary- or JSON-encoded, the source file must begin with the metadata (schema). The records follow the schema, by default one record per line, and each record is mapped to one SAMOA instance.</p>
+
+<ol>
+<li>Avro primitive types and enums may be used for the data as is.</li>
+<li>Avro complex types (e.g. maps and arrays) may not be used, with the exception of enums and unions; i.e., no nested structures are allowed.</li>
+<li>The label (if any) must be the last attribute.</li>
+<li>Timestamps are currently not supported in SAMOA.</li>
+<li>Avro enums may be used to represent nominal attributes.</li>
+<li>Avro unions may be used to make a value nullable. However, a union may not combine different data types.</li>
+</ol>
+<div class="highlight"><pre><code class="language-" data-lang="">E.g. Enums
+{"name":"species","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}
+
+E.g. Unions
+{"name":"attribute1","type":["null","int"]}    - Allowed: denotes that the value of attribute1 is optional
+{"name":"attribute2","type":["string","int"]}  - Not allowed
+</code></pre></div>
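<p>The rules above lend themselves to a quick check of a schema before the file is handed to SAMOA. The sketch below (Python, standard library only) is our own illustration derived only from the rules listed on this page; it is not SAMOA's validation logic, and the names <code>check_type</code> and <code>check_samoa_schema</code> are hypothetical. It accepts primitive types, enums and two-branch unions with <code>"null"</code>, and rejects other complex types, multi-type unions and Avro timestamp logical types (<code>timestamp-millis</code>, <code>timestamp-micros</code>).</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_type(avro_type):
    """Return None if a field type follows the rules above, else a reason string."""
    if isinstance(avro_type, str):
        # Simplification: references to previously defined named types are not resolved.
        return None if avro_type in PRIMITIVES else "unsupported type: " + avro_type
    if isinstance(avro_type, list):  # a union
        branches = [b for b in avro_type if b != "null"]
        if len(avro_type) != 2 or len(branches) != 1:
            return "unions may only make a single type nullable"
        return check_type(branches[0])
    if isinstance(avro_type, dict):
        if str(avro_type.get("logicalType", "")).startswith("timestamp"):
            return "timestamps are not supported"
        kind = avro_type.get("type")
        if kind == "enum" or (isinstance(kind, str) and kind in PRIMITIVES):
            return None
        return "complex type not allowed: " + str(kind)
    return "unrecognized type declaration"


def check_samoa_schema(schema_json):
    """Return (field name, reason) pairs for every offending field; [] means OK."""
    schema = json.loads(schema_json)
    problems = []
    for field in schema["fields"]:
        reason = check_type(field["type"])
        if reason:
            problems.append((field["name"], reason))
    return problems


allowed = '{"type":"record","name":"R","fields":[{"name":"attribute1","type":["null","int"]}]}'
rejected = '{"type":"record","name":"R","fields":[{"name":"attribute2","type":["string","int"]}]}'
print(check_samoa_schema(allowed))   # []
print(check_samoa_schema(rejected))  # [('attribute2', 'unions may only make a single type nullable')]
</code></pre></div>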
+<h3 id="samoa-task-execution-with-avro">SAMOA task execution with Avro</h3>
+
+<p>You can execute a SAMOA task using the <code>bin/samoa</code> script with the following format: <code>bin/samoa &lt;platform&gt; &lt;jar&gt; &quot;&lt;task&gt;&quot;</code>.
+See <a href="Executing-SAMOA-with-Apache-S4">Executing SAMOA with Apache S4</a> and <a href="Executing-SAMOA-with-Apache-Storm">Executing SAMOA with Apache Storm</a> to learn more about deploying SAMOA on Apache S4 and Apache Storm, respectively. Avro files can be used as data sources on any of these platforms. The only change needed in the commands is to use <code>AvroFileStream -f &lt;file_name&gt; -e &lt;file_format&gt;</code> as the stream source (<code>-s</code>), where <code>&lt;file_format&gt;</code> is <code>json</code> or <code>binary</code>. Examples for the different modes are given below. Although they use the <a href="Prequential-Evaluation-Task">Prequential Evaluation task</a>, the same source option applies to all other tasks.</p>
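<p>When runs are scripted, it can help to assemble the command programmatically. The sketch below (Python, standard library only) builds the argument list for the Avro examples that follow; the function name <code>samoa_avro_command</code> is hypothetical, and the learner and the trailing <code>-f 100000</code> are simply copied from those examples. Keeping the whole task as a single argument matters: in a shell it must be quoted, because it contains spaces and parentheses.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import shlex


def samoa_avro_command(platform, jar, avro_file, encoding):
    """Build the bin/samoa argument list for a Prequential Evaluation over an Avro file."""
    if encoding not in ("json", "binary"):
        raise ValueError("encoding must be 'json' or 'binary'")
    task = ("PrequentialEvaluation -l classifiers.ensemble.Bagging "
            "-s (AvroFileStream -f " + avro_file + " -e " + encoding + ") -f 100000")
    # The task stays one argument, exactly like the quoted string on the command line.
    return ["bin/samoa", platform, jar, task]


cmd = samoa_avro_command("local", "target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar",
                         "covtypeNorm_json.avro", "json")
print(" ".join(shlex.quote(arg) for arg in cmd))
</code></pre></div>
<p>The returned list can also be passed unchanged to Python's <code>subprocess.run</code> from the SAMOA directory, which avoids shell quoting altogether.</p>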
+
+<h4 id="local-avro-json">Local - Avro JSON</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
+</code></pre></div>
+<h4 id="local-avro-binary">Local - Avro Binary</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
+</code></pre></div>
+<h4 id="storm-avro-json">Storm - Avro JSON</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
+</code></pre></div>
+<h4 id="storm-avro-binary">Storm - Avro Binary</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
+</code></pre></div>
+<h3 id="sample-avro-data-for-samoa">Sample Avro Data for SAMOA</h3>
+
+<p>The samples below show how data in the default ARFF format can be converted to JSON- or binary-encoded Avro.</p>
+
+<h4 id="iris-dataset-default-arff-format">Iris Dataset - Default ARFF Format</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">@RELATION iris
+@ATTRIBUTE sepallength  NUMERIC
+@ATTRIBUTE sepalwidth   NUMERIC
+@ATTRIBUTE petallength  NUMERIC
+@ATTRIBUTE petalwidth   NUMERIC
+@ATTRIBUTE class  {setosa,versicolor,virginica}
+@DATA
+5.1,3.5,1.4,0.2,setosa
+4.9,3.0,1.4,0.2,virginica
+4.7,3.2,1.3,0.2,virginica
+4.6,3.1,1.5,0.2,setosa
+</code></pre></div>
+<h4 id="iris-dataset-json-encoded-avro-format">Iris Dataset - JSON Encoded Avro Format</h4>
+<div class="highlight"><pre><code class="language-" data-lang=""><span class="p">{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"record"</span><span class="p">,</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Iris"</span><span class="p">,</span><span class="nt">"namespace"</span><span class="p">:</span><span class="s2">"com.yahoo.labs.samoa.avro.iris"</span><span class="p">,</span><span class="nt">"fields"</span><span class="p">:[{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepallength"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"sepalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"petallength"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"petalwidth"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"double"</span><span class="p">},{</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"class"</span><span class="p">,</span><span class="nt">"type"</span><span class="p">:{</span><span class="nt">"type"</span><span class="p">:</span><span class="s2">"enum"</span><span class="p">,</span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Labels"</span><span class="p">,</span><span class="nt">"symbols"</span><span class="p">:[</span><span class="s2">"setosa"</span><span class="p">,</span><span class="s2">"versicolor"</span><span class="p">,</span><span class="s2">"virginica"</span><span class="p">]}}]}</span><span class="w">  
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">5.1</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.5</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.4</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"setosa"</span><span class="p">}</span><span class="w">  
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">4.9</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.0</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.4</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"virginica"</span><span class="p">}</span><span class="w">  
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">4.7</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.2</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.3</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"virginica"</span><span class="p">}</span><span class="w">  
+</span><span class="p">{</span><span class="nt">"sepallength"</span><span class="p">:</span><span class="mf">4.6</span><span class="p">,</span><span class="nt">"sepalwidth"</span><span class="p">:</span><span class="mf">3.1</span><span class="p">,</span><span class="nt">"petallength"</span><span class="p">:</span><span class="mf">1.5</span><span class="p">,</span><span class="nt">"petalwidth"</span><span class="p">:</span><span class="mf">0.2</span><span class="p">,</span><span class="nt">"class"</span><span class="p">:</span><span class="s2">"setosa"</span><span class="p">}</span><span class="w">  
+</span></code></pre></div>
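<p>Because the conversion from ARFF is mechanical, it can be scripted. The following sketch (Python, standard library only) turns a simple dense ARFF file into the layout used in the JSON sample above: the schema on the first line, then one JSON record per line. It is a hypothetical helper, not a SAMOA tool, and it assumes unquoted names, only <code>NUMERIC</code>/<code>REAL</code>/<code>INTEGER</code> and nominal attributes, no missing (<code>?</code>) values and no sparse rows. Numeric attributes become <code>double</code> fields and nominal ones become enums; following the sample, the enum of the last attribute (the label) is named <code>Labels</code>, while any other enum is named after its attribute (an arbitrary choice of this sketch).</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import json


def arff_to_avro_json(arff_text, namespace):
    """Convert dense ARFF text into a schema line followed by one JSON record per line."""
    relation, fields, rows, in_data = None, [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        keyword = line.split()[0].upper()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword == "@RELATION":
            relation = line.split()[1]
        elif keyword == "@ATTRIBUTE":
            _, name, spec = line.split(None, 2)
            spec = spec.strip()
            if spec.startswith("{"):  # a nominal attribute becomes an Avro enum
                symbols = [s.strip() for s in spec.strip("{}").split(",")]
                enum = {"type": "enum", "name": name.capitalize() + "Values", "symbols": symbols}
                fields.append({"name": name, "type": enum})
            elif spec.upper() in ("NUMERIC", "REAL", "INTEGER"):
                fields.append({"name": name, "type": "double"})
            else:
                raise ValueError("unsupported ARFF attribute type: " + spec)
        elif keyword == "@DATA":
            in_data = True
    if fields and isinstance(fields[-1]["type"], dict):
        fields[-1]["type"]["name"] = "Labels"  # the label is the last attribute
    schema = {"type": "record", "name": relation.capitalize(),
              "namespace": namespace, "fields": fields}
    lines = [json.dumps(schema, separators=(",", ":"))]
    for row in rows:
        record = {}
        for field, value in zip(fields, row):
            is_enum = isinstance(field["type"], dict)
            record[field["name"]] = value if is_enum else float(value)
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines)


iris_arff = """@RELATION iris
@ATTRIBUTE sepallength  NUMERIC
@ATTRIBUTE sepalwidth   NUMERIC
@ATTRIBUTE petallength  NUMERIC
@ATTRIBUTE petalwidth   NUMERIC
@ATTRIBUTE class  {setosa,versicolor,virginica}
@DATA
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,virginica
"""
print(arff_to_avro_json(iris_arff, "com.yahoo.labs.samoa.avro.iris"))
</code></pre></div>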
+<h4 id="iris-dataset-binary-encoded-avro-format">Iris Dataset - Binary Encoded Avro Format</h4>
+<div class="highlight"><pre><code class="language-" data-lang="">Objavro.schema΅
{"type":"record","name":"Iris","namespace":"com.yahoo.labs.samoa.avro.iris","fields":[{"name":"sepallength","type":"double"},{"name":"sepalwidth","type":"double"},{"name":"petallength","type":"double"},{"name":"petalwidth","type":"double"},{"name":"class","type":{"type":"enum","name":"Labels","symbols":["setosa","versicolor","virginica"]}}]}
 !&lt;khCrֱS빧ީȂffffff@      @ffffffٙٙɿ       
@ffffffٙٙ@ڙٙٙɿΌ͌͌@ڙٙٙ  @Ό͌͌ٙٙɿΌ͌͌@      𿦦ffff@ڙٙٙɿ 
!&lt;khCrÖ±Së¹§Þ©
+</code></pre></div>
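<p>The readable <code>Objavro.schema</code> prefix in the dump above comes from the Avro object container file header. Per the Avro specification, the header consists of the magic bytes <code>Obj</code> followed by a version byte of 1, a metadata map whose <code>avro.schema</code> entry holds the schema as JSON (an optional <code>avro.codec</code> entry names the compression codec), and a 16-byte sync marker that also separates the data blocks that follow. The sketch below (Python, standard library only) reads just that header to recover the schema of a binary-encoded file such as <code>covtypeNorm_binary.avro</code>; the function names are hypothetical, and for simplicity it loads the whole file into memory.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import json
import sys


def read_long(buf, pos):
    """Decode one zigzag varint (Avro int/long) at buf[pos]; return (value, next position)."""
    z, scale = 0, 1
    while True:
        byte = buf[pos]
        pos += 1
        z += (byte % 128) * scale
        scale *= 128
        if byte // 128 == 0:  # continuation bit clear: this was the last byte
            break
    return (z // 2 if z % 2 == 0 else -(z // 2) - 1), pos


def read_bytes(buf, pos):
    """Decode a length-prefixed Avro bytes/string value."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length], pos + length


def read_container_header(buf):
    """Return (metadata dict, 16-byte sync marker, offset of the first data block)."""
    if buf[:4] != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    pos, meta = 4, {}
    while True:  # the metadata map is a series of blocks ending with a zero count
        count, pos = read_long(buf, pos)
        if count == 0:
            break
        if count != abs(count):  # a negative count is followed by the block size in bytes
            count = -count
            _, pos = read_long(buf, pos)
        for _ in range(count):
            key, pos = read_bytes(buf, pos)
            value, pos = read_bytes(buf, pos)
            meta[key.decode("utf-8")] = value
    return meta, buf[pos:pos + 16], pos + 16


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. covtypeNorm_binary.avro
        with open(path, "rb") as f:
            meta, sync, offset = read_container_header(f.read())
        schema = json.loads(meta["avro.schema"].decode("utf-8"))
        print(path, schema["name"], [field["name"] for field in schema["fields"]])
</code></pre></div>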
+<h4 id="forest-covertype-dataset">Forest CoverType Dataset</h4>
+
+<p>The JSON- and binary-encoded Avro files for the Forest CoverType dataset, covtypeNorm_json.avro and covtypeNorm_binary.avro, can be found on the <a href="https://cwiki.apache.org/confluence/display/SAMOA/SAMOA+Home">SAMOA Wiki</a>.</p>
+
+  </article>
+
+<!-- </div> -->
+
+      
+
+    <hr/>
+<div id="footer" class="container text-center">
+       
+            <p class="text-muted credit">
+Copyright © 2014 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. Apache SAMOA, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
+
+</div>
+
+    <!-- Bootstrap core JavaScript
+    ================================================== -->
+    <!-- Placed at the end of the document so the pages load faster -->
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
+    <script src="/assets/js/bootstrap.min.js"></script>
+    <script src="/assets/js/docs.min.js"></script>
+    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
+    <script src="/assets/js/ie10-viewport-bug-workaround.js"></script>
+
+    </div>
+       
+  </body>
+
+</html>

