Modified: 
websites/staging/mahout/trunk/content/users/environment/out-of-core-reference.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/environment/out-of-core-reference.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/environment/out-of-core-reference.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,11 +264,22 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 
id="mahout-samsaras-distributed-linear-algebra-dsl-reference">Mahout-Samsara's 
Distributed Linear Algebra DSL Reference</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 
id="mahout-samsaras-distributed-linear-algebra-dsl-reference">Mahout-Samsara's 
Distributed Linear Algebra DSL Reference<a class="headerlink" 
href="#mahout-samsaras-distributed-linear-algebra-dsl-reference" 
title="Permanent link">&para;</a></h1>
 <p><strong>Note: this page is meant only as a quick reference to 
Mahout-Samsara's R-Like DSL semantics.  For more information, including 
information on Mahout-Samsara's Algebraic Optimizer please see: <a 
href="http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf">Mahout
 Scala Bindings and Mahout Spark Bindings for Linear Algebra 
Subroutines</a>.</strong></p>
 <p>The subjects of this reference are solely applicable to Mahout-Samsara's 
<strong>DRM</strong> (distributed row matrix).</p>
 <p>In this reference, DRMs will be denoted as e.g. <code>A</code>, and in-core 
matrices as e.g. <code>inCoreA</code>.</p>
-<h4 id="imports">Imports</h4>
+<h4 id="imports">Imports<a class="headerlink" href="#imports" title="Permanent 
link">&para;</a></h4>
 <p>The following imports are used to enable seamless in-core and distributed 
algebraic DSL operations:</p>
 <div class="codehilite"><pre><span class="n">import</span> <span 
class="n">org</span><span class="p">.</span><span class="n">apache</span><span 
class="p">.</span><span class="n">mahout</span><span class="p">.</span><span 
class="n">math</span><span class="p">.</span><span class="n">_</span>
 <span class="n">import</span> <span class="n">scalabindings</span><span 
class="p">.</span><span class="n">_</span>
@@ -289,7 +301,7 @@
 
 
 <p>The Mahout shell does all of these imports automatically.</p>
-<h4 id="drm-persistence-operators">DRM Persistence operators</h4>
+<h4 id="drm-persistence-operators">DRM Persistence operators<a 
class="headerlink" href="#drm-persistence-operators" title="Permanent 
link">&para;</a></h4>
 <p><strong>Mahout-Samsara's DRM persistence to HDFS is compatible with all 
Mahout-MapReduce algorithms such as seq2sparse.</strong></p>
 <p>Loading a DRM from (HD)FS:</p>
 <div class="codehilite"><pre><span class="n">drmDfsRead</span><span 
class="p">(</span><span class="n">path</span> <span class="p">=</span> <span 
class="n">hdfsPath</span><span class="p">)</span>
@@ -325,7 +337,7 @@ val inCoreC: Matrix = inCoreA %*%: drmB
 </pre></div>
 
 
-<h4 id="logical-algebraic-operators-on-drm-matrices">Logical algebraic 
operators on DRM matrices:</h4>
+<h4 id="logical-algebraic-operators-on-drm-matrices">Logical algebraic 
operators on DRM matrices:<a class="headerlink" 
href="#logical-algebraic-operators-on-drm-matrices" title="Permanent 
link">&para;</a></h4>
 <p>A logical set of operators is defined for distributed matrices as a subset 
of those defined for in-core matrices.  In particular, since all distributed 
matrices are immutable, there are no assignment operators (e.g. <strong>A += 
B</strong>).
 <em>Note: please see: <a 
href="http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf">Mahout
 Scala Bindings and Mahout Spark Bindings for Linear Algebra Subroutines</a> 
for information on Mahout-Samsara's Algebraic Optimizer, and translation from 
logical operations to a physical plan for the back end.</em></p>
 <p>Cache a DRM and trigger an optimized physical plan: </p>
@@ -420,7 +432,7 @@ Elementwise operations of every matrix e
 
 
 <p>Note that <code>5.0 -: A</code> means <code>\(m_{ij} = 5 - a_{ij}\)</code> 
and <code>5.0 /: A</code> means <code>\(m_{ij} = \frac{5}{a_{ij}}\)</code> for 
all elements of the result.</p>
-<h4 id="slicing">Slicing</h4>
+<h4 id="slicing">Slicing<a class="headerlink" href="#slicing" title="Permanent 
link">&para;</a></h4>
 <p>General slice:</p>
 <div class="codehilite"><pre><span class="n">A</span><span 
class="p">(</span>100 <span class="n">to</span> 200<span class="p">,</span> 100 
<span class="n">to</span> 200<span class="p">)</span>
 </pre></div>
@@ -437,7 +449,7 @@ Elementwise operations of every matrix e
 
 
 <p><em>Note: if row range is not all-range (::) then the DRM must be 
<code>Int</code>-keyed.  General case row slicing is not supported by DRMs with 
key types other than <code>Int</code></em>.</p>
-<h4 id="stitching">Stitching</h4>
+<h4 id="stitching">Stitching<a class="headerlink" href="#stitching" 
title="Permanent link">&para;</a></h4>
 <p>Stitch side by side (cbind R semantics):</p>
 <div class="codehilite"><pre><span class="n">val</span> <span 
class="n">drmAnextToB</span> <span class="p">=</span> <span 
class="n">drmA</span> <span class="n">cbind</span> <span class="n">drmB</span>
 </pre></div>
@@ -449,7 +461,7 @@ Elementwise operations of every matrix e
 
 
 <p>Analogously, vertical concatenation is available via 
<strong>rbind</strong></p>
-<h4 id="custom-pipelines-on-blocks">Custom pipelines on blocks</h4>
+<h4 id="custom-pipelines-on-blocks">Custom pipelines on blocks<a 
class="headerlink" href="#custom-pipelines-on-blocks" title="Permanent 
link">&para;</a></h4>
 <p>Internally, Mahout-Samsara's DRM is represented as a distributed set of 
vertical (Key, Block) tuples.</p>
 <p><strong>drm.mapBlock(...)</strong>:</p>
 <p>The DRM operator <code>mapBlock</code> provides transformational access to 
the distributed vertical blockified tuples of a matrix (Row-Keys, 
Vertical-Matrix-Block).</p>
@@ -462,7 +474,7 @@ Elementwise operations of every matrix e
 </pre></div>
 
 
-<h4 id="broadcasting-vectors-and-matrices-to-closures">Broadcasting Vectors 
and matrices to closures</h4>
+<h4 id="broadcasting-vectors-and-matrices-to-closures">Broadcasting Vectors 
and matrices to closures<a class="headerlink" 
href="#broadcasting-vectors-and-matrices-to-closures" title="Permanent 
link">&para;</a></h4>
 <p>Generally we can create and use one-way closure attributes to be used on 
the back end.</p>
 <p>Scalar matrix multiplication:</p>
 <div class="codehilite"><pre>val factor: Int = 15
@@ -484,7 +496,7 @@ val drm2 <span class="o">=</span> drm1.m
 </pre></div>
 
 
-<h4 id="computations-providing-ad-hoc-summaries">Computations providing ad-hoc 
summaries</h4>
+<h4 id="computations-providing-ad-hoc-summaries">Computations providing ad-hoc 
summaries<a class="headerlink" href="#computations-providing-ad-hoc-summaries" 
title="Permanent link">&para;</a></h4>
 <p>Matrix cardinality:</p>
 <div class="codehilite"><pre><span class="n">drmA</span><span 
class="p">.</span><span class="n">nrow</span>
 <span class="n">drmA</span><span class="p">.</span><span class="n">ncol</span>
@@ -501,7 +513,7 @@ val drm2 <span class="o">=</span> drm1.m
 
 
 <p><em>Note: These will always trigger a computational action.  I.e. if one 
calls <code>colSums()</code> n times, then the back end will actually recompute 
<code>colSums</code> n times.</em></p>
-<h4 id="distributed-matrix-decompositions">Distributed Matrix 
Decompositions</h4>
+<h4 id="distributed-matrix-decompositions">Distributed Matrix Decompositions<a 
class="headerlink" href="#distributed-matrix-decompositions" title="Permanent 
link">&para;</a></h4>
 <p>To import the decomposition package:</p>
 <div class="codehilite"><pre><span class="n">import</span> <span 
class="n">org</span><span class="p">.</span><span class="n">apache</span><span 
class="p">.</span><span class="n">mahout</span><span class="p">.</span><span 
class="n">math</span><span class="p">.</span><span class="n">_</span>
 <span class="n">import</span> <span class="n">decompositions</span><span 
class="p">.</span><span class="n">_</span>
@@ -532,7 +544,7 @@ val drm2 <span class="o">=</span> drm1.m
 </pre></div>
 
 
-<h4 id="adjusting-parallelism-of-computations">Adjusting parallelism of 
computations</h4>
+<h4 id="adjusting-parallelism-of-computations">Adjusting parallelism of 
computations<a class="headerlink" href="#adjusting-parallelism-of-computations" 
title="Permanent link">&para;</a></h4>
 <p>Set the minimum parallelism to 100 for computations on 
<code>drmA</code>:</p>
 <div class="codehilite"><pre><span class="n">drmA</span><span 
class="p">.</span><span class="n">par</span><span class="p">(</span><span 
class="n">min</span> <span class="p">=</span> 100<span class="p">)</span>
 </pre></div>
@@ -548,7 +560,7 @@ val drm2 <span class="o">=</span> drm1.m
 </pre></div>
 
 
-<h4 
id="retrieving-the-engine-specific-data-structure-backing-the-drm">Retrieving 
the engine specific data structure backing the DRM:</h4>
+<h4 
id="retrieving-the-engine-specific-data-structure-backing-the-drm">Retrieving 
the engine specific data structure backing the DRM:<a class="headerlink" 
href="#retrieving-the-engine-specific-data-structure-backing-the-drm" 
title="Permanent link">&para;</a></h4>
 <p><strong>A Spark RDD:</strong></p>
 <div class="codehilite"><pre><span class="n">val</span> <span 
class="n">myRDD</span> <span class="p">=</span> <span 
class="n">drmA</span><span class="p">.</span><span 
class="n">checkpoint</span><span class="p">().</span><span class="n">rdd</span>
 </pre></div>

Modified: 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,14 +264,25 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="introduction">Introduction</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="introduction">Introduction<a class="headerlink" href="#introduction" 
title="Permanent link">&para;</a></h1>
 <p>This document provides an overview of how the Mahout Scala DSL (distributed 
algebraic operators) is implemented over the Spark back end engine. The 
document is aimed at Mahout developers, to give a high level description of the 
design. </p>
-<h2 id="spark-overview">Spark Overview</h2>
-<h2 id="spark-data-model">Spark Data Model</h2>
-<h2 id="mahout-drm">Mahout DRM</h2>
+<h2 id="spark-overview">Spark Overview<a class="headerlink" 
href="#spark-overview" title="Permanent link">&para;</a></h2>
+<h2 id="spark-data-model">Spark Data Model<a class="headerlink" 
href="#spark-data-model" title="Permanent link">&para;</a></h2>
+<h2 id="mahout-drm">Mahout DRM<a class="headerlink" href="#mahout-drm" 
title="Permanent link">&para;</a></h2>
 <p>Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a 
large matrix of numbers in-memory in a cluster by distributing logical rows 
among servers. The DSL provides an abstract API on DRMs for backend engines to 
provide implementations of this API. Examples are Spark and H2O backend 
engines. Each engine has its own design of mapping the abstract API onto its 
data model and provides implementations for algebraic operators over that 
mapping.</p>
-<h2 id="spark-dsl-engine">Spark DSL Engine</h2>
-<h2 id="source-layout">Source Layout</h2>
+<h2 id="spark-dsl-engine">Spark DSL Engine<a class="headerlink" 
href="#spark-dsl-engine" title="Permanent link">&para;</a></h2>
+<h2 id="source-layout">Source Layout<a class="headerlink" 
href="#source-layout" title="Permanent link">&para;</a></h2>
    </div>
   </div>     
 </div> 

Modified: 
websites/staging/mahout/trunk/content/users/flinkbindings/flink-internals.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/flinkbindings/flink-internals.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/flinkbindings/flink-internals.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>

Modified: websites/staging/mahout/trunk/content/users/misc/mr---map-reduce.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/misc/mr---map-reduce.html 
(original)
+++ websites/staging/mahout/trunk/content/users/misc/mr---map-reduce.html Fri 
Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p>{excerpt}MapReduce is a framework for processing huge datasets on 
certain
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p>{excerpt}MapReduce is a framework for processing huge datasets on certain
 kinds of distributable problems using a large number of computers (nodes),
 collectively referred to as a cluster.{excerpt} Computational processing
 can occur on data stored either in a filesystem (unstructured) or within a

Modified: 
websites/staging/mahout/trunk/content/users/misc/parallel-frequent-pattern-mining.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/misc/parallel-frequent-pattern-mining.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/misc/parallel-frequent-pattern-mining.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p>Mahout has a Top K Parallel FPGrowth Implementation. It's based on the 
paper <a 
href="http://infolab.stanford.edu/~echang/recsys08-69.pdf">http://infolab.stanford.edu/~echang/recsys08-69.pdf</a>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p>Mahout has a Top K Parallel FPGrowth Implementation. It's based on the paper 
<a 
href="http://infolab.stanford.edu/~echang/recsys08-69.pdf">http://infolab.stanford.edu/~echang/recsys08-69.pdf</a>
  with some optimisations in mining the data.</p>
 <p>Given a huge transaction list, the algorithm finds all unique features (sets
 of field values) and eliminates those features whose frequency in the whole
@@ -311,7 +323,7 @@ class which takes care of storing the ob
 File Output format</li>
 </ul>
 <p><a 
name="ParallelFrequentPatternMining-RunningFrequentPatternGrowthviacommandline"></a></p>
-<h2 id="running-frequent-pattern-growth-via-command-line">Running Frequent 
Pattern Growth via command line</h2>
+<h2 id="running-frequent-pattern-growth-via-command-line">Running Frequent 
Pattern Growth via command line<a class="headerlink" 
href="#running-frequent-pattern-growth-via-command-line" title="Permanent 
link">&para;</a></h2>
 <p>The command line launcher for string transaction data
 org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver has other features including
 specifying the regex pattern for splitting a string line of a transaction
@@ -319,7 +331,7 @@ into the constituent features.</p>
 <p>Input files have to be in the following format.</p>
 <p><optional document id>TAB<TOKEN1>SPACE<TOKEN2>SPACE....</p>
 <p>instead of tab you could use , or \| as the default tokenization is done 
using a java Regex pattern {code}<a href=",\t.html">,\t</a>
-<em>[,|\t][ ,\t]</em>{code}
+<em code="code">[,|\t][ ,\t]</em>
 You can override this parameter to parse your log files or transaction
 files (each line is a transaction.) The FPGrowth algorithm mines the top K
 frequently occurring sets of items and their counts from the given input
@@ -350,7 +362,7 @@ gz file or even a directory containing a
 We modified the regex to use space to split the token. Note that input
 regex string is escaped.</p>
 <p><a name="ParallelFrequentPatternMining-RunningParallelFPGrowth"></a></p>
-<h2 id="running-parallel-fpgrowth">Running Parallel FPGrowth</h2>
+<h2 id="running-parallel-fpgrowth">Running Parallel FPGrowth<a 
class="headerlink" href="#running-parallel-fpgrowth" title="Permanent 
link">&para;</a></h2>
 <p>Running parallel FPGrowth is as easy as changing the flag to -method
 mapreduce and adding the number of groups parameter e.g. -g 20 for 20
 groups. First, let's run the above sample test in map-reduce mode:</p>
@@ -417,7 +429,7 @@ consumption but might improve speed unti
 entirely on the dataset in question. A value of 5-10 is recommended for
 mining up to top 100 patterns for each feature.</p>
 <p><a name="ParallelFrequentPatternMining-Viewingtheresults"></a></p>
-<h2 id="viewing-the-results">Viewing the results</h2>
+<h2 id="viewing-the-results">Viewing the results<a class="headerlink" 
href="#viewing-the-results" title="Permanent link">&para;</a></h2>
 <p>The output will be dumped to a SequenceFile in the frequentpatterns
 directory in Text=&gt;TopKStringPatterns format. Run this command to see a few
 of the Frequent Patterns:</p>

Modified: 
websites/staging/mahout/trunk/content/users/misc/perceptron-and-winnow.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/misc/perceptron-and-winnow.html 
(original)
+++ websites/staging/mahout/trunk/content/users/misc/perceptron-and-winnow.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a 
name="PerceptronandWinnow-ClassificationwithPerceptronorWinnow"></a></p>
-<h1 id="classification-with-perceptron-or-winnow">Classification with 
Perceptron or Winnow</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="PerceptronandWinnow-ClassificationwithPerceptronorWinnow"></a></p>
+<h1 id="classification-with-perceptron-or-winnow">Classification with 
Perceptron or Winnow<a class="headerlink" 
href="#classification-with-perceptron-or-winnow" title="Permanent 
link">&para;</a></h1>
 <p>Both algorithms are comparatively simple linear classifiers. Given training
 data in some n-dimensional vector space that is annotated with binary
 labels the algorithms are guaranteed to find a linear separating hyperplane
@@ -280,12 +292,12 @@ In contrast to Naive Bayes they are not
 features (in the domain of text classification: all terms in a document)
 are independent.</p>
 <p><a name="PerceptronandWinnow-Strategyforparallelisation"></a></p>
-<h2 id="strategy-for-parallelisation">Strategy for parallelisation</h2>
+<h2 id="strategy-for-parallelisation">Strategy for parallelisation<a 
class="headerlink" href="#strategy-for-parallelisation" title="Permanent 
link">&para;</a></h2>
 <p>Currently the strategy for parallelisation is simple: Given there is enough
 training data, split the training data. Train the classifier on each split.
 The resulting hyperplanes are then averaged.</p>
 <p><a name="PerceptronandWinnow-Roadmap"></a></p>
-<h2 id="roadmap">Roadmap</h2>
+<h2 id="roadmap">Roadmap<a class="headerlink" href="#roadmap" title="Permanent 
link">&para;</a></h2>
 <p>Currently the patch only contains the code for the classifier itself. It is
 planned to provide unit tests and at least one example based on the WebKB
 dataset by the end of November for the serial version. After that the

Modified: websites/staging/mahout/trunk/content/users/misc/testing.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/misc/testing.html (original)
+++ websites/staging/mahout/trunk/content/users/misc/testing.html Fri Apr  8 
18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,12 +264,23 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="Testing-Intro"></a></p>
-<h1 id="intro">Intro</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="Testing-Intro"></a></p>
+<h1 id="intro">Intro<a class="headerlink" href="#intro" title="Permanent 
link">&para;</a></h1>
 <p>As Mahout matures, solid testing procedures are needed.  This page and its
 children capture test plans along with ideas for improving our testing.</p>
 <p><a name="Testing-TestPlans"></a></p>
-<h1 id="test-plans">Test Plans</h1>
+<h1 id="test-plans">Test Plans<a class="headerlink" href="#test-plans" 
title="Permanent link">&para;</a></h1>
 <ul>
 <li><a href="0.6.html">0.6</a></li>
 <li>Test Plans for the 0.6 release
@@ -276,9 +288,9 @@ There are no special plans except for un
 Hadoop jobs.</li>
 </ul>
 <p><a name="Testing-TestIdeas"></a></p>
-<h1 id="test-ideas">Test Ideas</h1>
+<h1 id="test-ideas">Test Ideas<a class="headerlink" href="#test-ideas" 
title="Permanent link">&para;</a></h1>
 <p><a name="Testing-Regressions/Benchmarks/Integrations"></a></p>
-<h2 
id="regressionsbenchmarksintegrations">Regressions/Benchmarks/Integrations</h2>
+<h2 
id="regressionsbenchmarksintegrations">Regressions/Benchmarks/Integrations<a 
class="headerlink" href="#regressionsbenchmarksintegrations" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>Algorithmic quality and speed are not tested, except in a few instances.
 Such tests often require much longer run times (minutes to hours), a
@@ -290,14 +302,14 @@ S3, JDBC, Cassandra, etc. </li>
 <p>Apache Jenkins is not able to support these environments. Commercial
 donations would help. </p>
 <p><a name="Testing-UnitTests"></a></p>
-<h2 id="unit-tests">Unit Tests</h2>
+<h2 id="unit-tests">Unit Tests<a class="headerlink" href="#unit-tests" 
title="Permanent link">&para;</a></h2>
 <p>Mahout's current tests are almost entirely unit tests. Algorithm tests
 generally supply a few numbers to code paths and verify that expected
 numbers come out. 'mvn test' runs these tests. There is "positive" coverage
 of a great many utilities and algorithms. A much smaller percentage includes
 "negative" coverage (bogus setups, inputs, combinations).</p>
 <p><a name="Testing-Other"></a></p>
-<h2 id="other">Other</h2>
+<h2 id="other">Other<a class="headerlink" href="#other" title="Permanent 
link">&para;</a></h2>
    </div>
   </div>     
 </div> 

Modified: 
websites/staging/mahout/trunk/content/users/misc/using-mahout-with-python-via-jpype.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/misc/using-mahout-with-python-via-jpype.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/misc/using-mahout-with-python-via-jpype.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="UsingMahoutwithPythonviaJPype-overview"></a></p>
-<h1 id="mahout-over-jython-some-examples">Mahout over Jython - some 
examples</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="UsingMahoutwithPythonviaJPype-overview"></a></p>
+<h1 id="mahout-over-jython-some-examples">Mahout over Jython - some examples<a 
class="headerlink" href="#mahout-over-jython-some-examples" title="Permanent 
link">&para;</a></h1>
 <p>This tutorial provides some sample code illustrating how we can read and
 write sequence files containing Mahout vectors from Python using JPype.
 This tutorial is intended for people who want to use Python for analyzing
@@ -299,7 +311,7 @@ python script. The result for me looks l
 
 
 <p><a 
name="UsingMahoutwithPythonviaJPype-WritingNamedVectorstoSequenceFilesfromPython"></a></p>
-<h1 id="writing-named-vectors-to-sequence-files-from-python">Writing Named 
Vectors to Sequence Files from Python</h1>
+<h1 id="writing-named-vectors-to-sequence-files-from-python">Writing Named 
Vectors to Sequence Files from Python<a class="headerlink" 
href="#writing-named-vectors-to-sequence-files-from-python" title="Permanent 
link">&para;</a></h1>
 <p>We can now use JPype to create sequence files which will contain vectors to
 be used by Mahout for kmeans. The example below is a function which creates
 vectors from two Gaussian distributions with unit variance.</p>
@@ -370,7 +382,7 @@ vectors from two Gaussian distributions
 
 
 <p><a 
name="UsingMahoutwithPythonviaJPype-ReadingtheKMeansClusteredPointsfromPython"></a></p>
-<h1 id="reading-the-kmeans-clustered-points-from-python">Reading the KMeans 
Clustered Points from Python</h1>
+<h1 id="reading-the-kmeans-clustered-points-from-python">Reading the KMeans 
Clustered Points from Python<a class="headerlink" 
href="#reading-the-kmeans-clustered-points-from-python" title="Permanent 
link">&para;</a></h1>
 <p>Similarly we can use JPype to easily read the clustered points outputted by
 mahout.</p>
 <div class="codehilite"><pre><span class="n">def</span> <span 
class="n">read_clustered_pts</span><span class="p">(</span><span 
class="n">ifile</span><span class="p">,</span><span class="o">*</span><span 
class="n">args</span><span class="p">,</span><span class="o">**</span><span 
class="n">param</span><span class="p">):</span>
@@ -420,7 +432,7 @@ mahout.</p>
 
 
 <p><a name="UsingMahoutwithPythonviaJPype-ReadingtheKMeansCentroids"></a></p>
-<h1 id="reading-the-kmeans-centroids">Reading the KMeans Centroids</h1>
+<h1 id="reading-the-kmeans-centroids">Reading the KMeans Centroids<a 
class="headerlink" href="#reading-the-kmeans-centroids" title="Permanent 
link">&para;</a></h1>
 <p>Finally we can create a function to print out the actual cluster centers
 found by mahout,</p>
 <div class="codehilite"><pre><span class="n">def</span> <span 
class="n">getClusters</span><span class="p">(</span><span 
class="n">ifile</span><span class="p">,</span><span class="o">*</span><span 
class="n">args</span><span class="p">,</span><span class="o">**</span><span 
class="n">param</span><span class="p">):</span>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/intro-als-hadoop.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/intro-als-hadoop.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/intro-als-hadoop.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/intro-itembased-hadoop.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/intro-itembased-hadoop.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/intro-itembased-hadoop.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 
id="introduction-to-item-based-recommendations-with-hadoop">Introduction to 
Item-Based Recommendations with Hadoop</h1>
-<h2 id="overview">Overview</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="introduction-to-item-based-recommendations-with-hadoop">Introduction 
to Item-Based Recommendations with Hadoop<a class="headerlink" 
href="#introduction-to-item-based-recommendations-with-hadoop" title="Permanent 
link">&para;</a></h1>
+<h2 id="overview">Overview<a class="headerlink" href="#overview" 
title="Permanent link">&para;</a></h2>
 <p>Mahout’s item based recommender is a flexible and easily implemented 
algorithm with a diverse range of applications. The minimalism of the primary 
input file’s structure and availability of ancillary filtering controls can 
make sourcing required data and shaping a desired output both efficient and 
straightforward.</p>
 <p>Typical use cases include:</p>
 <ul>
@@ -282,7 +294,7 @@
 <li>Map product substitutions into the Mahout input (i.e. if WidgetA is a 
recommended item replace it with WidgetX)</li>
 </ul>
 <p>The item based recommender output can be easily consumed by downstream 
applications (e.g. websites, ERP systems or salesforce automation tools) and is 
configurable so users can determine the number of item recommendations 
generated by the algorithm.</p>
-<h2 id="example">Example</h2>
+<h2 id="example">Example<a class="headerlink" href="#example" title="Permanent 
link">&para;</a></h2>
 <p>Testing the item based recommender can be a simple and potentially quite 
rewarding endeavor. Whereas the typical sample use case for collaborative 
filtering focuses on utilization of, and integration with, eCommerce platforms 
we can instead look at a potential use case applicable to most businesses (even 
those without a web presence). Let’s look at how a company might use 
Mahout’s item based recommender to identify new sales opportunities for an 
existing customer base. First, you’ll need to get Mahout up and running, the 
instructions for which can be found <a 
href="https://mahout.apache.org/users/basics/quickstart.html">here</a>. After 
you've ensured Mahout is properly installed, we’re ready to run a quick 
example.</p>
 <p><strong>Step 1: Gather some test data</strong></p>
 <p>Mahout’s item based recommender relies on three key pieces of data: 
<em>userID</em>, <em>itemID</em> and <em>preference</em>. The “users” could 
be website visitors or simply customers that purchase products from your 
business. Similarly, items could be products, product groups or even pages on 
your website – really anything you would want to recommend to a group of 
users or customers. For our example let’s use customer orders as a proxy for 
preference. A simple count of distinct orders by customer, by product will work 
for this example. You’ll find as you explore ways to manipulate the item 
based recommender the preference value can be many things (page clicks, 
explicit ratings, order counts, etc.). Once your test data is gathered put it 
in a <em>.txt</em> file separated by commas with no column headers included.</p>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/matrix-factorization.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/matrix-factorization.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/matrix-factorization.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="MatrixFactorization-Intro"></a></p>
-<h1 
id="introduction-to-matrix-factorization-for-recommendation-mining">Introduction
 to Matrix Factorization for Recommendation Mining</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="MatrixFactorization-Intro"></a></p>
+<h1 
id="introduction-to-matrix-factorization-for-recommendation-mining">Introduction
 to Matrix Factorization for Recommendation Mining<a class="headerlink" 
href="#introduction-to-matrix-factorization-for-recommendation-mining" 
title="Permanent link">&para;</a></h1>
 <p>In the mathematical discipline of linear algebra, a matrix decomposition 
 or matrix factorization is a dimensionality reduction technique that 
factorizes a matrix into a product of matrices, usually two. 
 There are many different matrix decompositions; each finds use among a 
particular class of problems.</p>
@@ -297,7 +309,7 @@ So our matrix factorization target could
 </pre></div>
 
 
-<h2 id="sgd">SGD</h2>
+<h2 id="sgd">SGD<a class="headerlink" href="#sgd" title="Permanent 
link">&para;</a></h2>
 <p>Stochastic gradient descent is a gradient descent optimization method for 
minimizing an objective function that is written as a sum of differentiable 
functions.</p>
 <div class="codehilite"><pre>   <span class="n">Q</span><span 
class="p">(</span><span class="n">w</span><span class="p">)</span> <span 
class="p">=</span> <span class="n">sum</span><span class="p">(</span><span 
class="n">Q_i</span><span class="p">(</span><span class="n">w</span><span 
class="p">)),</span>
 </pre></div>
@@ -348,7 +360,7 @@ So our matrix factorization target could
 </pre></div>
 
 
-<h2 id="svd">SVD++</h2>
+<h2 id="svd">SVD++<a class="headerlink" href="#svd" title="Permanent 
link">&para;</a></h2>
 <p>SVD++ is an enhancement of the SGD matrix factorization. </p>
 <p>It could be considered as an integration of latent factor model and 
neighborhood based model, considering not only how users rate, but also who has 
rated what. </p>
 <p>The complete model is a sum of 3 sub-models with complete prediction 
formula as follows: </p>
@@ -393,13 +405,13 @@ please refer to the paper <a href="http:
 
 
 <p>where alpha is the learning rate of gradient descent, and N(u) is the set of 
items for which user u has expressed a preference.</p>
-<h2 id="parallel-sgd">Parallel SGD</h2>
+<h2 id="parallel-sgd">Parallel SGD<a class="headerlink" href="#parallel-sgd" 
title="Permanent link">&para;</a></h2>
 <p>Mahout has a parallel SGD implementation in the ParallelSGDFactorizer class. It 
shuffles the user ratings in every iteration and 
 generates splits on the shuffled ratings. Each split is handled by a thread to 
update the user features and item features using 
 vanilla SGD. </p>
 <p>The implementation can be traced back to a lock-free version of SGD based 
on the paper 
 <a href="http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf">Hogwild!: 
A Lock-Free Approach to Parallelizing Stochastic Gradient Descent</a>.</p>
-<h2 id="alswr">ALSWR</h2>
+<h2 id="alswr">ALSWR<a class="headerlink" href="#alswr" title="Permanent 
link">&para;</a></h2>
 <p>ALSWR is an iterative algorithm to solve the low rank factorization of user 
feature matrix U and item feature matrix M.<br />
 The loss function to be minimized is formulated as the sum of squared errors 
plus <a href="http://en.wikipedia.org/wiki/Tikhonov_regularization">Tikhonov 
regularization</a>:</p>
 <div class="codehilite"><pre> <span class="n">L</span><span 
class="p">(</span><span class="n">R</span><span class="p">,</span> <span 
class="n">U</span><span class="p">,</span> <span class="n">M</span><span 
class="p">)</span> <span class="p">=</span> <span class="n">sum</span><span 
class="p">(</span><span class="n">pow</span><span class="p">((</span><span 
class="n">R</span><span class="p">[</span><span class="n">u</span><span 
class="p">,</span><span class="nb">i</span><span class="p">]</span> <span 
class="o">-</span> <span class="n">U</span><span class="p">[</span><span 
class="n">u</span><span class="p">,]</span><span class="o">*</span> <span 
class="p">(</span><span class="n">M</span><span class="p">[</span><span 
class="nb">i</span><span class="p">,]</span>^<span class="n">t</span><span 
class="p">)),</span> 2<span class="p">))</span> <span class="o">+</span> <span 
class="n">lambda</span> <span class="o">*</span> <span class="p">(</span><span 
class="n">sum</span><span class="p">(</span>
<span class="n">n</span><span class="p">(</span><span 
class="n">u</span><span class="p">)</span> <span class="o">*</span> <span 
class="o">||</span><span class="n">U</span><span class="p">[</span><span 
class="n">u</span><span class="p">,]</span><span class="o">||</span>^2<span 
class="p">)</span> <span class="o">+</span> <span class="n">sum</span><span 
class="p">(</span><span class="n">n</span><span class="p">(</span><span 
class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span 
class="o">||</span><span class="n">M</span><span class="p">[</span><span 
class="nb">i</span><span class="p">,]</span><span class="o">||</span>^2<span 
class="p">))</span>
@@ -424,7 +436,7 @@ item and their feature vectors:</p>
 <p>The ALSWRFactorizer class is a non-distributed implementation of ALSWR 
using multi-threading to dispatch the computation among several threads.
 Mahout also offers a <a 
href="https://mahout.apache.org/users/recommender/intro-als-hadoop.html">parallel
 map-reduce implementation</a>.</p>
 <p><a name="MatrixFactorization-Reference"></a></p>
-<h1 id="reference">Reference:</h1>
+<h1 id="reference">Reference:<a class="headerlink" href="#reference" 
title="Permanent link">&para;</a></h1>
 <p><a 
href="http://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic 
gradient descent</a></p>
 <p><a 
href="http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08%28submitted%29.pdf">ALSWR</a></p>
    </div>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/quickstart.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/recommender/quickstart.html 
(original)
+++ websites/staging/mahout/trunk/content/users/recommender/quickstart.html Fri 
Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,14 +264,25 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="recommender-overview">Recommender Overview</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="recommender-overview">Recommender Overview<a class="headerlink" 
href="#recommender-overview" title="Permanent link">&para;</a></h1>
 <p>Recommenders have changed over the years. Mahout contains a long list of 
them, which you can still use. But to get the best out of our more modern 
approach we'll need to think of the Recommender as a "model creation" 
component&mdash;supplied by Mahout's new spark-itemsimilarity job, and a 
"serving" component&mdash;supplied by a modern scalable search engine, like 
Solr.</p>
 <p><img alt="image" src="http://i.imgur.com/fliHMBo.png" /></p>
 <p>To integrate with your application you will collect user interactions 
storing them in a DB and also in a form usable by Mahout. The simplest way to 
do this is to log user interactions to csv files (user-id, item-id). The DB 
should be set up to contain the last n user interactions, which will form part 
of the query for recommendations.</p>
 <p>Mahout's spark-itemsimilarity will create a table of (item-id, 
list-of-similar-items) in csv form. Think of this as an item collection with 
one field containing the item-ids of similar items. Index this with your search 
engine. </p>
 <p>When your application needs recommendations for a specific person, get the 
latest user history of interactions from the DB and query the indicator 
collection with this history. You will get back an ordered list of item-ids. 
These are your recommendations. You may wish to filter out any that the user 
has already seen but that will depend on your use case.</p>
 <p>All ids for users and items are preserved as string tokens and so work as 
external keys in DBs or as doc ids for search engines; they also work as 
tokens for search queries.</p>
-<h2 id="references">References</h2>
+<h2 id="references">References<a class="headerlink" href="#references" 
title="Permanent link">&para;</a></h2>
 <ol>
 <li>A free ebook, which talks about the general idea: <a 
href="https://www.mapr.com/practical-machine-learning">Practical Machine 
Learning</a></li>
 <li>A slide deck, which talks about mixing actions or other indicators: <a 
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Creating
 a Multimodal Recommender with Mahout and a Search Engine</a></li>
@@ -278,7 +290,7 @@
 and  <a 
href="http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/">What's
 New in Recommenders: part #2</a></li>
 <li>A post describing the loglikelihood ratio:  <a 
href="http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html">Surprise
 and Coincidence</a>. LLR is used to reduce noise in the data while keeping the 
calculations at O(n) complexity.</li>
 </ol>
-<h2 id="mahout-model-creation">Mahout Model Creation</h2>
+<h2 id="mahout-model-creation">Mahout Model Creation<a class="headerlink" 
href="#mahout-model-creation" title="Permanent link">&para;</a></h2>
 <p>See the page describing <a 
href="http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html"><em>spark-itemsimilarity</em></a>
 for more details.</p>
    </div>
   </div>     

Modified: 
websites/staging/mahout/trunk/content/users/recommender/recommender-documentation.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/recommender-documentation.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/recommender-documentation.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="RecommenderDocumentation-Overview"></a></p>
-<h2 id="overview">Overview</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="RecommenderDocumentation-Overview"></a></p>
+<h2 id="overview">Overview<a class="headerlink" href="#overview" 
title="Permanent link">&para;</a></h2>
 <p><em>This documentation concerns the non-distributed, non-Hadoop-based
 recommender engine / collaborative filtering code inside Mahout. It was
 formerly a separate project called "Taste" and has continued development
@@ -294,19 +306,19 @@ and flexibility.</p>
 these interfaces. These are the pieces from which you will build your own
 recommendation engine. That's it! </p>
 <p><a name="RecommenderDocumentation-Architecture"></a></p>
-<h2 id="architecture">Architecture</h2>
+<h2 id="architecture">Architecture<a class="headerlink" href="#architecture" 
title="Permanent link">&para;</a></h2>
 <p><img alt="doc" src="../../images/taste-architecture.png" /></p>
 <p>This diagram shows the relationship between various Mahout components in a
 user-based recommender. An item-based recommender system is similar except
 that there are no Neighborhood algorithms involved.</p>
 <p><a name="RecommenderDocumentation-Recommender"></a></p>
-<h3 id="recommender">Recommender</h3>
+<h3 id="recommender">Recommender<a class="headerlink" href="#recommender" 
title="Permanent link">&para;</a></h3>
 <p>A Recommender is the core abstraction in Mahout. Given a DataModel, it can
 produce recommendations. Applications will most likely use the
 <strong>GenericUserBasedRecommender</strong> or 
<strong>GenericItemBasedRecommender</strong>,
 possibly decorated by <strong>CachingRecommender</strong>.</p>
 <p><a name="RecommenderDocumentation-DataModel"></a></p>
-<h3 id="datamodel">DataModel</h3>
+<h3 id="datamodel">DataModel<a class="headerlink" href="#datamodel" 
title="Permanent link">&para;</a></h3>
 <p>A <strong>DataModel</strong> is the interface to information about user 
preferences. An
 implementation might draw this data from any source, but a database is the
 most likely source. Be sure to wrap this with a 
<strong>ReloadFromJDBCDataModel</strong> to get good performance! Mahout 
provides <strong>MySQLJDBCDataModel</strong>, for example, to access preference 
data from a database via JDBC and MySQL. Another exists for PostgreSQL. Mahout 
also provides a <strong>FileDataModel</strong>, which is fine for small 
applications.</p>
@@ -324,22 +336,22 @@ users and pages in the context of recomm
 is only a notion of an association, or none, between a user and pages that
 have been visited.</p>
 <p><a name="RecommenderDocumentation-UserSimilarity"></a></p>
-<h3 id="usersimilarity">UserSimilarity</h3>
+<h3 id="usersimilarity">UserSimilarity<a class="headerlink" 
href="#usersimilarity" title="Permanent link">&para;</a></h3>
 <p>A <strong>UserSimilarity</strong> defines a notion of similarity between 
two users. This is
 a crucial part of a recommendation engine. These are attached to a
 <strong>Neighborhood</strong> implementation. <strong>ItemSimilarity</strong> 
is analogous, but finds
 similarity between items.</p>
 <p><a name="RecommenderDocumentation-UserNeighborhood"></a></p>
-<h3 id="userneighborhood">UserNeighborhood</h3>
+<h3 id="userneighborhood">UserNeighborhood<a class="headerlink" 
href="#userneighborhood" title="Permanent link">&para;</a></h3>
 <p>In a user-based recommender, recommendations are produced by finding a
 "neighborhood" of similar users near a given user. A 
<strong>UserNeighborhood</strong>
 defines a means of determining that neighborhood &mdash; for example,
 nearest 10 users. Implementations typically need a 
<strong>UserSimilarity</strong> to
 operate.</p>
 <p><a name="RecommenderDocumentation-Examples"></a></p>
-<h2 id="examples">Examples</h2>
+<h2 id="examples">Examples<a class="headerlink" href="#examples" 
title="Permanent link">&para;</a></h2>
 <p><a name="RecommenderDocumentation-User-basedRecommender"></a></p>
-<h3 id="user-based-recommender">User-based Recommender</h3>
+<h3 id="user-based-recommender">User-based Recommender<a class="headerlink" 
href="#user-based-recommender" title="Permanent link">&para;</a></h3>
 <p>User-based recommenders are the "original", conventional style of
 recommender systems. They can produce good recommendations when tweaked
 properly; they are not necessarily the fastest recommender systems and are
@@ -378,7 +390,7 @@ algorithm:</p>
 </pre></div>
 
 
-<h2 id="item-based-recommender">Item-based Recommender</h2>
+<h2 id="item-based-recommender">Item-based Recommender<a class="headerlink" 
href="#item-based-recommender" title="Permanent link">&para;</a></h2>
 <p>We could have created an item-based recommender instead. Item-based
 recommenders base recommendation not on user similarity, but on item
 similarity. In theory these are about the same approach to the problem,
@@ -416,14 +428,14 @@ application, you would feed a list of pr
 
 
 <p><a name="RecommenderDocumentation-Integrationwithyourapplication"></a></p>
-<h2 id="integration-with-your-application">Integration with your 
application</h2>
+<h2 id="integration-with-your-application">Integration with your application<a 
class="headerlink" href="#integration-with-your-application" title="Permanent 
link">&para;</a></h2>
 <p>You can create a Recommender, as shown above, wherever you like in your
 Java application, and use it. This includes simple Java applications or GUI
 applications, server applications, and J2EE web applications.</p>
 <p><a name="RecommenderDocumentation-Performance"></a></p>
-<h2 id="performance">Performance</h2>
+<h2 id="performance">Performance<a class="headerlink" href="#performance" 
title="Permanent link">&para;</a></h2>
 <p><a name="RecommenderDocumentation-RuntimePerformance"></a></p>
-<h3 id="runtime-performance">Runtime Performance</h3>
+<h3 id="runtime-performance">Runtime Performance<a class="headerlink" 
href="#runtime-performance" title="Permanent link">&para;</a></h3>
 <p>The more data you give, the better. Though Mahout is designed for
 performance, you will undoubtedly run into performance issues at some
 point. For best results, consider using the following command-line flags to
@@ -454,7 +466,7 @@ code and third-party code you use doesn'
 <li>When using <strong>JDBCDataModel</strong>, make sure you wrap it with the 
<strong>ReloadFromJDBCDataModel</strong> to load data into memory!</li>
 </ul>
 <p><a 
name="RecommenderDocumentation-AlgorithmPerformance:WhichOneIsBest?"></a></p>
-<h3 id="algorithm-performance-which-one-is-best">Algorithm Performance: Which 
One Is Best?</h3>
+<h3 id="algorithm-performance-which-one-is-best">Algorithm Performance: Which 
One Is Best?<a class="headerlink" 
href="#algorithm-performance-which-one-is-best" title="Permanent 
link">&para;</a></h3>
 <p>There is no right answer; it depends on your data, your application,
 environment, and performance needs. Mahout provides the building blocks
 from which you can construct the best Recommender for your application. The
@@ -481,7 +493,7 @@ not make sense. In this case, try a <em>
 traditional information retrieval figures like precision and recall, which
 are more meaningful.</p>
 <p><a name="RecommenderDocumentation-UsefulLinks"></a></p>
-<h2 id="useful-links">Useful Links</h2>
+<h2 id="useful-links">Useful Links<a class="headerlink" href="#useful-links" 
title="Permanent link">&para;</a></h2>
 <p>Here's a handful of research papers that I've read and found particularly
 useful:</p>
 <p>J.S. Breese, D. Heckerman and C. Kadie, "<a 
href="http://research.microsoft.com/research/pubs/view.aspx?tr_id=166">Empirical
 Analysis of Predictive Algorithms for Collaborative Filtering</a>

Modified: 
websites/staging/mahout/trunk/content/users/recommender/recommender-first-timer-faq.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/recommender-first-timer-faq.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/recommender-first-timer-faq.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="recommender-first-timer-dos-and-donts">Recommender First Timer Dos 
and Don'ts</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="recommender-first-timer-dos-and-donts">Recommender First Timer Dos and 
Don'ts<a class="headerlink" href="#recommender-first-timer-dos-and-donts" 
title="Permanent link">&para;</a></h1>
 <p>Many people with an interest in recommenders arrive at Mahout since they're
 building a first recommender system. Some starting questions have been
 asked enough times to warrant a FAQ collecting advice and rules-of-thumb to

Modified: 
websites/staging/mahout/trunk/content/users/recommender/userbased-5-minutes.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/userbased-5-minutes.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/userbased-5-minutes.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,10 +264,21 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="creating-a-user-based-recommender-in-5-minutes">Creating a 
User-Based Recommender in 5 minutes</h1>
-<h2 id="prerequisites">Prerequisites</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="creating-a-user-based-recommender-in-5-minutes">Creating a User-Based 
Recommender in 5 minutes<a class="headerlink" 
href="#creating-a-user-based-recommender-in-5-minutes" title="Permanent 
link">&para;</a></h1>
+<h2 id="prerequisites">Prerequisites<a class="headerlink" 
href="#prerequisites" title="Permanent link">&para;</a></h2>
 <p>Create a java project in your favorite IDE and make sure mahout is on the 
classpath. The easiest way to accomplish this is by importing it via maven as 
described on the <a href="/users/basics/quickstart.html">Quickstart</a> 
page.</p>
-<h2 id="dataset">Dataset</h2>
+<h2 id="dataset">Dataset<a class="headerlink" href="#dataset" title="Permanent 
link">&para;</a></h2>
 <p>Mahout's recommenders expect interactions between users and items as input. 
The easiest way to supply such data to Mahout is in the form of a textfile, 
where every line has the format <em>userID,itemID,value</em>. Here 
<em>userID</em> and <em>itemID</em> refer to a particular user and a particular 
item, and <em>value</em> denotes the strength of the interaction (e.g. the 
rating given to a movie).</p>
 <p>In this example, we'll use some made up data for simplicity. Create a file 
called "dataset.csv" and copy the following example interactions into the file. 
</p>
 <pre>
@@ -304,7 +316,7 @@
 4,18,1.0
 </pre>
 
-<h2 id="creating-a-user-based-recommender">Creating a user-based 
recommender</h2>
+<h2 id="creating-a-user-based-recommender">Creating a user-based recommender<a 
class="headerlink" href="#creating-a-user-based-recommender" title="Permanent 
link">&para;</a></h2>
 <p>Create a class called <em>SampleRecommender</em> with a main method.</p>
 <p>The first thing we have to do is load the data from the file. Mahout's 
recommenders use an interface called <em>DataModel</em> to handle interaction 
data. You can load our made up interactions like this:</p>
 <pre>
@@ -333,7 +345,7 @@ for (RecommendedItem recommendation : re
 </pre>
 
 <p>Congratulations, you have built your first recommender!</p>
-<h2 id="evaluation">Evaluation</h2>
+<h2 id="evaluation">Evaluation<a class="headerlink" href="#evaluation" 
title="Permanent link">&para;</a></h2>
 <p>You might ask yourself, how to make sure that your recommender returns good 
results. Unfortunately, the only way to be really sure about the quality is by 
doing an A/B test with real users in a live system.</p>
 <p>We can however try to get a feel of the quality, by statistical offline 
evaluation. Just keep in mind that this does not replace a test with real 
users!</p>
 <p>One way to check whether the recommender returns good results is by doing a 
<strong>hold-out</strong> test. We partition our dataset into two sets: a 
trainingset consisting of 90% of the data and a testset consisting of 10%. Then 
we train our recommender using the training set and look how well it predicts 
the unknown interactions in the testset.</p>

Modified: websites/staging/mahout/trunk/content/users/sparkbindings/faq.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/sparkbindings/faq.html 
(original)
+++ websites/staging/mahout/trunk/content/users/sparkbindings/faq.html Fri Apr  
8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="faq-for-using-mahout-with-spark">FAQ for using Mahout with 
Spark</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="faq-for-using-mahout-with-spark">FAQ for using Mahout with Spark<a 
class="headerlink" href="#faq-for-using-mahout-with-spark" title="Permanent 
link">&para;</a></h1>
 <p><strong>Q: Mahout Spark shell doesn't start; "ClassNotFound" problems or 
various classpath problems.</strong></p>
 <p><strong>A:</strong> So far as of the time of this writing all reported 
problems starting the Spark shell in Mahout were revolving 
 around classpath issues one way or another. </p>

Modified: websites/staging/mahout/trunk/content/users/sparkbindings/home.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/sparkbindings/home.html 
(original)
+++ websites/staging/mahout/trunk/content/users/sparkbindings/home.html Fri Apr 
 8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>

Modified: 
websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,12 +264,23 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="playing-with-mahouts-spark-shell">Playing with Mahout's Spark 
Shell</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="playing-with-mahouts-spark-shell">Playing with Mahout's Spark Shell<a 
class="headerlink" href="#playing-with-mahouts-spark-shell" title="Permanent 
link">&para;</a></h1>
 <p>This tutorial will show you how to play with Mahout's scala DSL for linear 
algebra and its Spark shell. <strong>Please keep in mind that this code is 
still in a very early experimental stage</strong>.</p>
 <p><em>(Edited for 0.10.2)</em></p>
-<h2 id="intro">Intro</h2>
+<h2 id="intro">Intro<a class="headerlink" href="#intro" title="Permanent 
link">&para;</a></h2>
 <p>We'll use an excerpt of a publicly available <a 
href="http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html">dataset about 
cereals</a>. The dataset tells the protein, fat, carbohydrate and sugars (in 
milligrams) contained in a set of cereals, as well as a customer rating for the 
cereals. Our aim for this example is to fit a linear model which infers the 
customer rating from the ingredients.</p>
-<table>
+<table class="table">
 <thead>
 <tr>
 <th align="left">Name</th>
@@ -354,7 +366,7 @@
 </tr>
 </tbody>
 </table>
-<h2 id="installing-mahout-spark-on-your-local-machine">Installing Mahout &amp; 
Spark on your local machine</h2>
+<h2 id="installing-mahout-spark-on-your-local-machine">Installing Mahout &amp; 
Spark on your local machine<a class="headerlink" 
href="#installing-mahout-spark-on-your-local-machine" title="Permanent 
link">&para;</a></h2>
 <p>We describe how to do a quick toy setup of Spark &amp; Mahout on your local 
machine, so that you can run this example and play with the shell. </p>
 <ol>
 <li>Download <a 
href="http://www.apache.org/dyn/closer.cgi/spark/spark-1.1.1/spark-1.1.1.tgz">Apache
 Spark 1.1.1</a> and unpack the archive file</li>
@@ -362,7 +374,7 @@
 <li>Create a directory for Mahout somewhere on your machine, change to there 
and checkout the master branch of Apache Mahout from GitHub <code>git clone 
https://github.com/apache/mahout mahout</code></li>
 <li>Change to the <code>mahout</code> directory and build mahout using 
<code>mvn -DskipTests clean install</code></li>
 </ol>
-<h2 id="starting-mahouts-spark-shell">Starting Mahout's Spark shell</h2>
+<h2 id="starting-mahouts-spark-shell">Starting Mahout's Spark shell<a 
class="headerlink" href="#starting-mahouts-spark-shell" title="Permanent 
link">&para;</a></h2>
 <ol>
 <li>Go to the directory where you unpacked Spark and type 
<code>sbin/start-all.sh</code> to locally start Spark</li>
 <li>Open a browser, point it to <a 
href="http://localhost:8080/">http://localhost:8080/</a> to check whether Spark 
successfully started. Copy the url of the spark master at the top of the page 
(it starts with <strong>spark://</strong>)</li>
@@ -374,7 +386,7 @@ export MASTER=[url of the Spark master]
 you should see the shell starting and get the prompt <code>mahout&gt;</code>. 
Check 
 <a href="http://mahout.apache.org/users/sparkbindings/faq.html">FAQ</a> for 
further troubleshooting.</li>
 </ol>
-<h2 id="implementation">Implementation</h2>
+<h2 id="implementation">Implementation<a class="headerlink" 
href="#implementation" title="Permanent link">&para;</a></h2>
 <p>We'll use the shell to interactively play with the data and incrementally 
implement a simple <a 
href="https://en.wikipedia.org/wiki/Linear_regression">linear regression</a> 
algorithm. Let's first load the dataset. Usually, we wouldn't need Mahout 
unless we processed a large dataset stored in a distributed filesystem. But for 
the sake of this example, we'll use our tiny toy dataset and "pretend" it was 
too big to fit onto a single machine.</p>
 <p><em>Note: You can incrementally follow the example by copy-and-pasting the 
code into your running Mahout shell.</em></p>
 <p>Mahout's linear algebra DSL has an abstraction called 
<em>DistributedRowMatrix (DRM)</em> which models a matrix that is partitioned 
by rows and stored in the memory of a cluster of machines. We use 
<code>dense()</code> to create a dense in-memory matrix from our toy dataset 
and use <code>drmParallelize</code> to load it into the cluster, "mimicking" a 
large, partitioned dataset.</p>

