user...

buildbot Fri, 08 Apr 2016 11:41:54 -0700

Modified: 
websites/staging/mahout/trunk/content/users/classification/support-vector-machines.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/classification/support-vector-machines.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/classification/support-vector-machines.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="SupportVectorMachines-SupportVectorMachines"></a></p>
-<h1 id="support-vector-machines">Support Vector Machines</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="SupportVectorMachines-SupportVectorMachines"></a></p>
+<h1 id="support-vector-machines">Support Vector Machines<a class="headerlink" 
href="#support-vector-machines" title="Permanent link">&para;</a></h1>
 <p>As with Naive Bayes, Support Vector Machines (SVMs for short) can be used
 to solve the task of assigning objects to classes. However, the way this
 task is solved is completely different from the approach taken in Naive Bayes.</p>
@@ -291,9 +303,9 @@ solutions. Each separating hyperplane ne
 training examples. In addition, that way, the solution may be based on the
 information encoded in only very few examples.</p>
 <p><a name="SupportVectorMachines-Strategyforparallelization"></a></p>
-<h2 id="strategy-for-parallelization">Strategy for parallelization</h2>
+<h2 id="strategy-for-parallelization">Strategy for parallelization<a 
class="headerlink" href="#strategy-for-parallelization" title="Permanent 
link">&para;</a></h2>
 <p><a name="SupportVectorMachines-Designofpackages"></a></p>
-<h2 id="design-of-packages">Design of packages</h2>
+<h2 id="design-of-packages">Design of packages<a class="headerlink" 
href="#design-of-packages" title="Permanent link">&para;</a></h2>
    </div>
   </div>     
 </div>


Modified: 
websites/staging/mahout/trunk/content/users/classification/twenty-newsgroups.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/classification/twenty-newsgroups.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/classification/twenty-newsgroups.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,10 +264,21 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a 
name="TwentyNewsgroups-TwentyNewsgroupsClassificationExample"></a></p>
-<h2 id="twenty-newsgroups-classification-example">Twenty Newsgroups 
Classification Example</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="TwentyNewsgroups-TwentyNewsgroupsClassificationExample"></a></p>
+<h2 id="twenty-newsgroups-classification-example">Twenty Newsgroups 
Classification Example<a class="headerlink" 
href="#twenty-newsgroups-classification-example" title="Permanent 
link">&para;</a></h2>
 <p><a name="TwentyNewsgroups-Introduction"></a></p>
-<h2 id="introduction">Introduction</h2>
+<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" 
title="Permanent link">&para;</a></h2>
 <p>The 20 newsgroups dataset is a collection of approximately 20,000
 newsgroup documents, partitioned (nearly) evenly across 20 different
 newsgroups. The 20 newsgroups collection has become a popular data set for
@@ -275,7 +287,7 @@ text classification and text clustering.
 classifier to create a model that would classify a new document into one of
 the 20 newsgroups.</p>
 <p><a name="TwentyNewsgroups-Prerequisites"></a></p>
-<h3 id="prerequisites">Prerequisites</h3>
+<h3 id="prerequisites">Prerequisites<a class="headerlink" 
href="#prerequisites" title="Permanent link">&para;</a></h3>
 <ul>
 <li>Mahout has been downloaded (<a 
href="https://mahout.apache.org/general/downloads.html">instructions 
here</a>)</li>
 <li>Maven is available</li>
@@ -286,7 +298,7 @@ the 20 newsgroups.</p>
 </li>
 </ul>
 <p><a name="TwentyNewsgroups-Instructionsforrunningtheexample"></a></p>
-<h3 id="instructions-for-running-the-example">Instructions for running the 
example</h3>
+<h3 id="instructions-for-running-the-example">Instructions for running the 
example<a class="headerlink" href="#instructions-for-running-the-example" 
title="Permanent link">&para;</a></h3>
 <ol>
 <li>
 <p>If running Hadoop in cluster mode, start the hadoop daemons by executing 
the following commands:</p>
@@ -372,7 +384,7 @@ Reliability <span class="p">(</span>stan
 
 
 <p><a name="TwentyNewsgroups-ComplementaryNaiveBayes"></a></p>
-<h2 id="end-to-end-commands-to-build-a-cbayes-model-for-20-newsgroups">End to 
end commands to build a CBayes model for 20 newsgroups</h2>
+<h2 id="end-to-end-commands-to-build-a-cbayes-model-for-20-newsgroups">End to 
end commands to build a CBayes model for 20 newsgroups<a class="headerlink" 
href="#end-to-end-commands-to-build-a-cbayes-model-for-20-newsgroups" 
title="Permanent link">&para;</a></h2>
 <p>The <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/classify-20newsgroups.sh">20
 newsgroups example script</a> issues the following commands as outlined above. 
We can build a CBayes classifier from the command line by following the process 
in the script: </p>
 <p><em>Be sure that <strong>MAHOUT_HOME</strong>/bin and 
<strong>HADOOP_HOME</strong>/bin are in your <strong>$PATH</strong></em></p>
 <ol>
@@ -396,9 +408,7 @@ Reliability <span class="p">(</span>stan
 
 
 <ul>
-<li>
-<p>If you're running on a Hadoop cluster:</p>
-<div class="codehilite"><pre>$ hadoop dfs -put <span class="cp">${</span><span 
class="n">WORK_DIR</span><span class="cp">}</span>/20news-all <span 
class="cp">${</span><span class="n">WORK_DIR</span><span 
class="cp">}</span>/20news-all
+<li>If you're running on a Hadoop cluster:<div class="codehilite"><pre>$ 
hadoop dfs -put <span class="cp">${</span><span class="n">WORK_DIR</span><span 
class="cp">}</span>/20news-all <span class="cp">${</span><span 
class="n">WORK_DIR</span><span class="cp">}</span>/20news-all
 </pre></div>
 
 

Modified: 
websites/staging/mahout/trunk/content/users/classification/wikipedia-classifier-example.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/classification/wikipedia-classifier-example.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/classification/wikipedia-classifier-example.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,11 +264,22 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="wikipedia-xml-parser-and-naive-bayes-classifier-example">Wikipedia 
XML parser and Naive Bayes Classifier Example</h1>
-<h2 id="introduction">Introduction</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="wikipedia-xml-parser-and-naive-bayes-classifier-example">Wikipedia XML 
parser and Naive Bayes Classifier Example<a class="headerlink" 
href="#wikipedia-xml-parser-and-naive-bayes-classifier-example" 
title="Permanent link">&para;</a></h1>
+<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" 
title="Permanent link">&para;</a></h2>
 <p>Mahout has an <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/classify-wikipedia.sh">example
 script</a> [1] which will download a recent XML dump of the (entire if 
desired) <a href="http://dumps.wikimedia.org/enwiki/latest/">English Wikipedia 
database</a>. After running the classification script, you can use the <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/spark-document-classifier.mscala">document
 classification script</a> from the Mahout <a 
href="http://mahout.apache.org/users/sparkbindings/play-with-shell.html">spark-shell</a>
 to vectorize and classify text from outside of the training and testing corpus 
using a model built on the Wikipedia dataset.  </p>
 <p>You can run this script to build and test a Naive Bayes classifier for 
option (1) 10 arbitrary countries or option (2) 2 countries (United States and 
United Kingdom).</p>
-<h2 id="oververview">Overview</h2>
+<h2 id="oververview">Overview<a class="headerlink" href="#oververview" 
title="Permanent link">&para;</a></h2>
 <p>To run the example, simply execute the 
<code>$MAHOUT_HOME/examples/bin/classify-wikipedia.sh</code> script.</p>
 <p>By default the script is set to run on a medium-sized Wikipedia XML dump.  
To run on the full set (the entire English Wikipedia) you can change the 
download by commenting out line 78, and uncommenting line 80  of <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/classify-wikipedia.sh">classify-wikipedia.sh</a>
 [1]. However, this is not recommended unless you have the resources to do so. 
<em>Be sure to clean your work directory when changing datasets- option 
(3).</em></p>
 <p>The step-by-step process for creating a Naive Bayes classifier for the 
Wikipedia XML dump is very similar to that for <a 
href="http://mahout.apache.org/users/classification/twenty-newsgroups.html">creating
 a 20 Newsgroups Classifier</a> [4].  The only difference is that instead of 
running <code>$mahout seqdirectory</code> on the unzipped 20 Newsgroups file, 
you'll run <code>$mahout seqwiki</code> on the unzipped Wikipedia XML dump.</p>
@@ -290,7 +302,7 @@ directory:  country.txt, country10.txt a
 
 
 <p>After <code>seqwiki</code>, the script runs <code>seq2sparse</code>, 
<code>split</code>, <code>trainnb</code> and <code>testnb</code> as in the <a 
href="http://mahout.apache.org/users/classification/twenty-newsgroups.html">step
 by step 20newsgroups example</a>.  When all of the jobs have finished, a 
confusion matrix will be displayed.</p>
-<h1 id="resourcese">Resources</h1>
+<h1 id="resourcese">Resources<a class="headerlink" href="#resourcese" 
title="Permanent link">&para;</a></h1>
 <p>[1] <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/classify-wikipedia.sh">classify-wikipedia.sh</a></p>
 <p>[2] <a 
href="https://github.com/apache/mahout/blob/master/examples/bin/spark-document-classifier.mscala">Document
 classification script for the Mahout Spark Shell</a></p>
 <p>[3] <a 
href="https://github.com/apache/mahout/blob/master/examples/src/test/resources/country10.txt">Example
 category file</a></p>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/20newsgroups.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/20newsgroups.html 
(original)
+++ websites/staging/mahout/trunk/content/users/clustering/20newsgroups.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="20Newsgroups-NaiveBayesusing20NewsgroupsData"></a></p>
-<h1 id="naive-bayes-using-20-newsgroups-data">Naive Bayes using 20 Newsgroups 
Data</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="20Newsgroups-NaiveBayesusing20NewsgroupsData"></a></p>
+<h1 id="naive-bayes-using-20-newsgroups-data">Naive Bayes using 20 Newsgroups 
Data<a class="headerlink" href="#naive-bayes-using-20-newsgroups-data" 
title="Permanent link">&para;</a></h1>
 <p>See <a 
href="https://issues.apache.org/jira/browse/MAHOUT-9">https://issues.apache.org/jira/browse/MAHOUT-9</a></p>
    </div>
   </div>     

Modified: 
websites/staging/mahout/trunk/content/users/clustering/canopy-clustering.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/canopy-clustering.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/canopy-clustering.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="CanopyClustering-CanopyClustering"></a></p>
-<h1 id="canopy-clustering">Canopy Clustering</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="CanopyClustering-CanopyClustering"></a></p>
+<h1 id="canopy-clustering">Canopy Clustering<a class="headerlink" 
href="#canopy-clustering" title="Permanent link">&para;</a></h1>
 <p><a href="http://www.kamalnigam.com/papers/canopy-kdd00.pdf">Canopy 
Clustering</a>
  is a very simple, fast and surprisingly accurate method for grouping
 objects into clusters. All objects are represented as a point in a
@@ -285,7 +297,7 @@ distance measurements can be significant
 outside of the initial canopies.</p>
 <p><strong>WARNING</strong>: Canopy is deprecated in the latest release and 
will be removed once streaming k-means becomes stable enough.</p>
 <p><a name="CanopyClustering-Strategyforparallelization"></a></p>
-<h2 id="strategy-for-parallelization">Strategy for parallelization</h2>
+<h2 id="strategy-for-parallelization">Strategy for parallelization<a 
class="headerlink" href="#strategy-for-parallelization" title="Permanent 
link">&para;</a></h2>
 <p>Looking at the sample Hadoop implementation in <a 
href="http://code.google.com/p/canopy-clustering/">http://code.google.com/p/canopy-clustering/</a>
  the processing is done in 3 M/R steps:
 1. The data is massaged into suitable input format
@@ -299,13 +311,13 @@ centers
 . Finally here is the <a 
href="http://en.wikipedia.org/wiki/Canopy_clustering_algorithm">Wikipedia 
page</a>
 .</p>
 <p><a name="CanopyClustering-Designofimplementation"></a></p>
-<h2 id="design-of-implementation">Design of implementation</h2>
+<h2 id="design-of-implementation">Design of implementation<a 
class="headerlink" href="#design-of-implementation" title="Permanent 
link">&para;</a></h2>
 <p>The implementation accepts as input Hadoop SequenceFiles containing
 multidimensional points (VectorWritable). Points may be expressed either as
 dense or sparse Vectors and processing is done in two phases: Canopy
 generation and, optionally, Clustering.</p>
 <p><a name="CanopyClustering-Canopygenerationphase"></a></p>
-<h3 id="canopy-generation-phase">Canopy generation phase</h3>
+<h3 id="canopy-generation-phase">Canopy generation phase<a class="headerlink" 
href="#canopy-generation-phase" title="Permanent link">&para;</a></h3>
 <p>During the map step, each mapper processes a subset of the total points and
 applies the chosen distance measure and thresholds to generate canopies. In
 the mapper, each point which is found to be within an existing canopy will
@@ -318,7 +330,7 @@ final set of canopy centroids which is o
 centroids). The reducer output format is: SequenceFile(Text, Canopy) with
 the <em>key</em> encoding the canopy identifier. </p>
 <p><a name="CanopyClustering-Clusteringphase"></a></p>
-<h3 id="clustering-phase">Clustering phase</h3>
+<h3 id="clustering-phase">Clustering phase<a class="headerlink" 
href="#clustering-phase" title="Permanent link">&para;</a></h3>
 <p>During the clustering phase, each mapper reads the Canopies produced by the
 first phase. Since all mappers have the same canopy definitions, their
 outputs will be combined during the shuffle so that each reducer (many are
@@ -329,7 +341,7 @@ WeightedVectorWritable has two fields: a
 vector. Together they encode the probability that each vector is a member
 of the given canopy.</p>
 <p><a name="CanopyClustering-RunningCanopyClustering"></a></p>
-<h2 id="running-canopy-clustering">Running Canopy Clustering</h2>
+<h2 id="running-canopy-clustering">Running Canopy Clustering<a 
class="headerlink" href="#running-canopy-clustering" title="Permanent 
link">&para;</a></h2>
 <p>The canopy clustering algorithm may be run using a command-line invocation
 on CanopyDriver.main or by making a Java call to CanopyDriver.run(...).
 Both require several arguments:</p>
@@ -390,7 +402,7 @@ clustering, the weights are computed as
 is between the cluster center and the vector using the chosen
 DistanceMeasure.</p>
 <p><a name="CanopyClustering-Examples"></a></p>
-<h1 id="examples">Examples</h1>
+<h1 id="examples">Examples<a class="headerlink" href="#examples" 
title="Permanent link">&para;</a></h1>
 <p>The following images illustrate Canopy clustering applied to a set of
 randomly-generated 2-d data points. The points are generated using a normal
 distribution centered at a mean location and with a constant standard

Modified: 
websites/staging/mahout/trunk/content/users/clustering/canopy-commandline.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/canopy-commandline.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/canopy-commandline.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a 
name="canopy-commandline-RunningCanopyClusteringfromtheCommandLine"></a></p>
-<h1 id="running-canopy-clustering-from-the-command-line">Running Canopy 
Clustering from the Command Line</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a 
name="canopy-commandline-RunningCanopyClusteringfromtheCommandLine"></a></p>
+<h1 id="running-canopy-clustering-from-the-command-line">Running Canopy 
Clustering from the Command Line<a class="headerlink" 
href="#running-canopy-clustering-from-the-command-line" title="Permanent 
link">&para;</a></h1>
 <p>Mahout's Canopy clustering can be launched from the same command line
 invocation whether you are running on a single machine in stand-alone mode
 or on a larger Hadoop cluster. The difference is determined by the
@@ -283,7 +295,7 @@ the Mahout version number. For example,
 job will be mahout-core-0.3.job</li>
 </ul>
 <p><a name="canopy-commandline-Testingitononesinglemachinew/ocluster"></a></p>
-<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster</h2>
+<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster<a class="headerlink" 
href="#testing-it-on-one-single-machine-wo-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>Put the data: cp <PATH TO DATA> testdata</li>
 <li>
@@ -293,7 +305,7 @@ org.apache.mahout.common.distance.Cosine
 </li>
 </ul>
 <p><a name="canopy-commandline-Runningitonthecluster"></a></p>
-<h2 id="running-it-on-the-cluster">Running it on the cluster</h2>
+<h2 id="running-it-on-the-cluster">Running it on the cluster<a 
class="headerlink" href="#running-it-on-the-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh</li>
 <li>Put the data: $HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata</li>
@@ -310,7 +322,7 @@ to view all outputs.</p>
 </li>
 </ul>
 <p><a name="canopy-commandline-Commandlineoptions"></a></p>
-<h1 id="command-line-options">Command line options</h1>
+<h1 id="command-line-options">Command line options<a class="headerlink" 
href="#command-line-options" title="Permanent link">&para;</a></h1>
 <div class="codehilite"><pre>  <span class="o">--</span><span 
class="n">input</span> <span class="p">(</span><span class="o">-</span><span 
class="nb">i</span><span class="p">)</span> <span class="n">input</span>        
         <span class="n">Path</span> <span class="n">to</span> <span 
class="n">job</span> <span class="n">input</span> <span 
class="n">directory</span><span class="p">.</span><span class="n">Must</span>  
                          <span class="n">be</span> <span class="n">a</span> 
<span class="n">SequenceFile</span> <span class="n">of</span>       
                          <span class="n">VectorWritable</span>         

Modified: 
websites/staging/mahout/trunk/content/users/clustering/cluster-dumper.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/cluster-dumper.html 
(original)
+++ websites/staging/mahout/trunk/content/users/clustering/cluster-dumper.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,14 +264,25 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="ClusterDumper-Introduction"></a></p>
-<h2 id="cluster-dumper-introduction">Cluster Dumper - Introduction</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="ClusterDumper-Introduction"></a></p>
+<h2 id="cluster-dumper-introduction">Cluster Dumper - Introduction<a 
class="headerlink" href="#cluster-dumper-introduction" title="Permanent 
link">&para;</a></h2>
 <p>Clustering tasks in Mahout will output data in the format of a SequenceFile
 (Text, Cluster) and the Text is a cluster identifier string. To analyze
 this output we need to convert the sequence files to a human readable
 format and this is achieved using the clusterdump utility.</p>
 <p><a 
name="ClusterDumper-Stepsforanalyzingclusteroutputusingclusterdumputility"></a></p>
-<h2 id="steps-for-analyzing-cluster-output-using-clusterdump-utility">Steps 
for analyzing cluster output using clusterdump utility</h2>
+<h2 id="steps-for-analyzing-cluster-output-using-clusterdump-utility">Steps 
for analyzing cluster output using clusterdump utility<a class="headerlink" 
href="#steps-for-analyzing-cluster-output-using-clusterdump-utility" 
title="Permanent link">&para;</a></h2>
 <p>After you've executed a clustering task (either examples or real-world),
 you can run clusterdumper in 2 modes:</p>
 <ol>
@@ -278,7 +290,7 @@ you can run clusterdumper in 2 modes:</p
 <li>Standalone Java Program </li>
 </ol>
 <p><a name="ClusterDumper-HadoopEnvironment{anchor:HadoopEnvironment}"></a></p>
-<h3 id="hadoop-environment">Hadoop Environment</h3>
+<h3 id="hadoop-environment">Hadoop Environment<a class="headerlink" 
href="#hadoop-environment" title="Permanent link">&para;</a></h3>
 <p>If you have set up your HADOOP_HOME environment variable, you can use the
 command line utility <code>mahout</code> to execute the ClusterDumper on 
Hadoop. In
 this case we won't need to copy the output clusters to our local machines.
@@ -286,7 +298,7 @@ The utility will read the output cluster
 human-readable cluster values into our local file system. Say you've just
 executed the <a href="clustering-of-synthetic-control-data.html">synthetic 
control example </a>
  and want to analyze the output, you can execute the <code>mahout 
clusterdumper</code> utility from the command line.</p>
-<h4 id="cli-options">CLI options:</h4>
+<h4 id="cli-options">CLI options:<a class="headerlink" href="#cli-options" 
title="Permanent link">&para;</a></h4>
 <div class="codehilite"><pre><span class="o">--</span><span 
class="n">help</span>                               <span 
class="n">Print</span> <span class="n">out</span> <span class="n">help</span> 
 <span class="o">--</span><span class="n">input</span> <span 
class="p">(</span><span class="o">-</span><span class="nb">i</span><span 
class="p">)</span> <span class="n">input</span>                   <span 
class="n">The</span> <span class="n">directory</span> <span 
class="n">containing</span> <span class="n">Sequence</span>
                                        <span class="n">Files</span> <span 
class="k">for</span> <span class="n">the</span> <span class="n">Clusters</span> 
      
@@ -316,7 +328,7 @@ executed the <a href="clustering-of-synt
 </pre></div>
 
 
-<h3 id="standalone-java-program">Standalone Java Program</h3>
+<h3 id="standalone-java-program">Standalone Java Program<a class="headerlink" 
href="#standalone-java-program" title="Permanent link">&para;</a></h3>
 <p>Run the clusterdump utility as follows as a standalone Java Program through 
Eclipse. <!-- - if you are using eclipse, setup mahout-utils as a project as 
specified in <a href="../../developers/buildingmahout.html">Working with Maven 
in Eclipse</a>. -->
     To execute ClusterDumper.java,</p>
 <ul>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,12 +264,23 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="clustering-synthetic-control-data">Clustering synthetic control 
data</h1>
-<h2 id="introduction">Introduction</h2>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="clustering-synthetic-control-data">Clustering synthetic control data<a 
class="headerlink" href="#clustering-synthetic-control-data" title="Permanent 
link">&para;</a></h1>
+<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" 
title="Permanent link">&para;</a></h2>
 <p>This example will demonstrate clustering of time series data, specifically 
control charts. <a href="http://en.wikipedia.org/wiki/Control_chart">Control 
charts</a> are tools used to determine whether a manufacturing or business 
process is in a state of statistical control. Such control charts are generated 
/ simulated repeatedly at equal time intervals. A <a 
href="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html">simulated
 dataset</a> is available for use in the UCI Machine Learning Repository.</p>
 <p>A time series of control charts needs to be clustered into their close knit 
groups. The data set we use is synthetic and is meant to resemble real world 
information in an anonymized format. It contains six different classes: Normal, 
Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift. In 
this example we will use Mahout to cluster the data into corresponding class 
buckets. </p>
 <p><em>For the sake of simplicity, we won't use a cluster in this example, but 
instead show you the commands to run the clustering examples locally with 
Hadoop</em>.</p>
-<h2 id="setup">Setup</h2>
+<h2 id="setup">Setup<a class="headerlink" href="#setup" title="Permanent 
link">&para;</a></h2>
 <p>We need to do some initial setup before we are able to run the example. </p>
 <ol>
 <li>
@@ -287,7 +299,7 @@
 <p>Create a folder called <em>testdata</em> in the current directory and copy 
the dataset into this folder.</p>
 </li>
 </ol>
-<h2 id="clustering-examples">Clustering Examples</h2>
+<h2 id="clustering-examples">Clustering Examples<a class="headerlink" 
href="#clustering-examples" title="Permanent link">&para;</a></h2>
 <p>Depending on the clustering algorithm you want to run, the following 
commands can be used:</p>
 <ul>
 <li>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/clustering-seinfeld-episodes.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/clustering-seinfeld-episodes.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/clustering-seinfeld-episodes.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p>Below is a short tutorial on how to cluster Seinfeld episode transcripts 
with
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p>Below is a short tutorial on how to cluster Seinfeld episode transcripts with
 Mahout.</p>
 
<p>http://blog.jteam.nl/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/</p>
    </div>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/clusteringyourdata.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/clusteringyourdata.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/clusteringyourdata.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,16 +264,27 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="clustering-your-data">Clustering your data</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="clustering-your-data">Clustering your data<a class="headerlink" 
href="#clustering-your-data" title="Permanent link">&para;</a></h1>
 <p>After you've done the <a href="quickstart.html">Quickstart</a> and are 
familiar with the basics of Mahout, it is time to cluster your own
 data. See also <a href="http://en.wikipedia.org/wiki/Cluster_analysis">Wikipedia on 
cluster analysis</a> for more background.</p>
 <p>The following pieces <em>may</em> be useful in getting started:</p>
 <p><a name="ClusteringYourData-Input"></a></p>
-<h1 id="input">Input</h1>
+<h1 id="input">Input<a class="headerlink" href="#input" title="Permanent 
link">&para;</a></h1>
 <p>For starters, you will need your data in an appropriate Vector format, see 
<a href="../basics/creating-vectors.html">Creating Vectors</a>.
 In particular for text preparation check out <a 
href="../basics/creating-vectors-from-text.html">Creating Vectors from 
Text</a>.</p>
 <p><a name="ClusteringYourData-RunningtheProcess"></a></p>
-<h1 id="running-the-process">Running the Process</h1>
+<h1 id="running-the-process">Running the Process<a class="headerlink" 
href="#running-the-process" title="Permanent link">&para;</a></h1>
 <ul>
 <li>
 <p><a href="canopy-clustering.html">Canopy background</a> and <a 
href="canopy-commandline.html">canopy-commandline</a>.</p>
@@ -295,14 +307,14 @@ In particular for text preparation check
 </li>
 </ul>
 <p><a name="ClusteringYourData-RetrievingtheOutput"></a></p>
-<h1 id="retrieving-the-output">Retrieving the Output</h1>
+<h1 id="retrieving-the-output">Retrieving the Output<a class="headerlink" 
href="#retrieving-the-output" title="Permanent link">&para;</a></h1>
 <p>Mahout has a cluster dumper utility that can be used to retrieve and 
evaluate your clustering data.</p>
 <div class="codehilite"><pre><span class="o">./</span><span 
class="n">bin</span><span class="o">/</span><span class="n">mahout</span> <span 
class="n">clusterdump</span> <span class="o">&lt;</span><span 
class="n">OPTIONS</span><span class="o">&gt;</span>
 </pre></div>
 
 
 <p><a name="ClusteringYourData-Theclusterdumperoptionsare:"></a></p>
-<h2 id="the-cluster-dumper-options-are">The cluster dumper options are:</h2>
+<h2 id="the-cluster-dumper-options-are">The cluster dumper options are:<a 
class="headerlink" href="#the-cluster-dumper-options-are" title="Permanent 
link">&para;</a></h2>
 <div class="codehilite"><pre>  <span class="o">--</span><span 
class="n">help</span> <span class="p">(</span><span class="o">-</span><span 
class="n">h</span><span class="p">)</span>                  <span 
class="n">Print</span> <span class="n">out</span> <span class="n">help</span>
 
   <span class="o">--</span><span class="n">input</span> <span 
class="p">(</span><span class="o">-</span><span class="nb">i</span><span 
class="p">)</span> <span class="n">input</span>               <span 
class="n">The</span> <span class="n">directory</span> <span 
class="n">containing</span> <span class="n">Sequence</span>    
@@ -346,7 +358,7 @@ In particular for text preparation check
 
 <p>More information on using clusterdump utility can be found <a 
href="cluster-dumper.html">here</a></p>
 <p><a name="ClusteringYourData-ValidatingtheOutput"></a></p>
-<h1 id="validating-the-output">Validating the Output</h1>
+<h1 id="validating-the-output">Validating the Output<a class="headerlink" 
href="#validating-the-output" title="Permanent link">&para;</a></h1>
 <p>
 Ted Dunning: A principled approach to cluster evaluation is to measure how 
well the
 cluster membership captures the structure of unseen data.  A natural
@@ -369,12 +381,11 @@ data.</p>
 <p>For text, you can actually compute perplexity which measures how well
 cluster membership predicts what words are used.  This is nice because you
 don't have to worry about the entropy of real valued numbers.</p>
-<p>Manual inspection and the so-called laugh test is also important.  The idea
+<p quote="quote">Manual inspection and the so-called laugh test is also 
important.  The idea
 is that the results should not be so ludicrous as to make you laugh.
 Unfortunately, it is pretty easy to kid yourself into thinking your system
 is working using this kind of inspection.  The problem is that we are too
-good at seeing (making up) patterns.
-{quote}</p>
+good at seeing (making up) patterns.</p>
    </div>
   </div>     
 </div> 

Modified: 
websites/staging/mahout/trunk/content/users/clustering/expectation-maximization.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/expectation-maximization.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/expectation-maximization.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="ExpectationMaximization-ExpectationMaximization"></a></p>
-<h1 id="expectation-maximization">Expectation Maximization</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="ExpectationMaximization-ExpectationMaximization"></a></p>
+<h1 id="expectation-maximization">Expectation Maximization<a 
class="headerlink" href="#expectation-maximization" title="Permanent 
link">&para;</a></h1>
 <p>The principle of EM can be applied to several learning settings, but is
 most commonly associated with clustering. The main principle of the
 algorithm is comparable to k-Means. Yet in contrast to hard cluster
@@ -272,7 +284,7 @@ assignments, each object is given some p
 Accordingly cluster centers are recomputed based on the average of all
 objects weighted by their probability of belonging to the cluster at hand.</p>
 <p><a name="ExpectationMaximization-Canopy-modifiedEM"></a></p>
-<h2 id="canopy-modified-em">Canopy-modified EM</h2>
+<h2 id="canopy-modified-em">Canopy-modified EM<a class="headerlink" 
href="#canopy-modified-em" title="Permanent link">&para;</a></h2>
 <p>One can also use the canopies idea to speed up prototype-based clustering
 methods like K-means and Expectation-Maximization (EM). In general, neither
 K-means nor EM specifies how many clusters to use. The canopies technique does
@@ -306,9 +318,9 @@ iterative step (apart from the enormous
 fewer terms) will be negligible since points outside the canopy will have
 exponentially small influence.</p>
 <p><a name="ExpectationMaximization-StrategyforParallelization"></a></p>
-<h2 id="strategy-for-parallelization">Strategy for Parallelization</h2>
+<h2 id="strategy-for-parallelization">Strategy for Parallelization<a 
class="headerlink" href="#strategy-for-parallelization" title="Permanent 
link">&para;</a></h2>
 <p><a name="ExpectationMaximization-Map/ReduceImplementation"></a></p>
-<h2 id="mapreduce-implementation">Map/Reduce Implementation</h2>
+<h2 id="mapreduce-implementation">Map/Reduce Implementation<a 
class="headerlink" href="#mapreduce-implementation" title="Permanent 
link">&para;</a></h2>
    </div>
   </div>     
 </div> 
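
The Expectation Maximization page above describes recomputing each cluster
center as the average of all objects, weighted by their probability of
belonging to that cluster. A minimal sketch of that single step follows. It is
illustrative only and is not Mahout code: the function name weighted_center and
the plain-list data layout are assumptions made for this example.

```python
def weighted_center(points, probs):
    """Return the probability-weighted mean of `points`.

    points: list of equal-length numeric lists (one per object)
    probs:  probability that each object belongs to this cluster
    Both names are hypothetical; they do not come from the Mahout API.
    """
    total = sum(probs)
    if total == 0:
        raise ValueError("at least one object needs non-zero probability")
    dim = len(points[0])
    # Each coordinate is sum(p_i * x_i) / sum(p_i).
    return [
        sum(p * pt[d] for pt, p in zip(points, probs)) / total
        for d in range(dim)
    ]
```

With hard assignments (every probability either 0 or 1) this reduces to the
ordinary k-Means centroid, which is the sense in which the two algorithms are
comparable.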

Modified: 
websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means-commandline.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means-commandline.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means-commandline.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a 
name="fuzzy-k-means-commandline-RunningFuzzyk-MeansClusteringfromtheCommandLine"></a></p>
-<h1 id="running-fuzzy-k-means-clustering-from-the-command-line">Running Fuzzy 
k-Means Clustering from the Command Line</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a 
name="fuzzy-k-means-commandline-RunningFuzzyk-MeansClusteringfromtheCommandLine"></a></p>
+<h1 id="running-fuzzy-k-means-clustering-from-the-command-line">Running Fuzzy 
k-Means Clustering from the Command Line<a class="headerlink" 
href="#running-fuzzy-k-means-clustering-from-the-command-line" title="Permanent 
link">&para;</a></h1>
 <p>Mahout's Fuzzy k-Means clustering can be launched from the same command
 line invocation whether you are running on a single machine in stand-alone
 mode or on a larger Hadoop cluster. The difference is determined by the
@@ -283,7 +295,7 @@ the Mahout version number. For example,
 job will be mahout-core-0.3.job</li>
 </ul>
 <p><a 
name="fuzzy-k-means-commandline-Testingitononesinglemachinew/ocluster"></a></p>
-<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster</h2>
+<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster<a class="headerlink" 
href="#testing-it-on-one-single-machine-wo-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>Put the data: cp <PATH TO DATA> testdata</li>
 <li>
@@ -292,7 +304,7 @@ job will be mahout-core-0.3.job</li>
 </li>
 </ul>
 <p><a name="fuzzy-k-means-commandline-Runningitonthecluster"></a></p>
-<h2 id="running-it-on-the-cluster">Running it on the cluster</h2>
+<h2 id="running-it-on-the-cluster">Running it on the cluster<a 
class="headerlink" href="#running-it-on-the-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh</li>
 <li>Put the data: $HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata</li>
@@ -308,7 +320,7 @@ to view all outputs.</p>
 </li>
 </ul>
 <p><a name="fuzzy-k-means-commandline-Commandlineoptions"></a></p>
-<h1 id="command-line-options">Command line options</h1>
+<h1 id="command-line-options">Command line options<a class="headerlink" 
href="#command-line-options" title="Permanent link">&para;</a></h1>
 <div class="codehilite"><pre>  <span class="o">--</span><span 
class="n">input</span> <span class="p">(</span><span class="o">-</span><span 
class="nb">i</span><span class="p">)</span> <span class="n">input</span>        
           <span class="n">Path</span> <span class="n">to</span> <span 
class="n">job</span> <span class="n">input</span> <span 
class="n">directory</span><span class="p">.</span> 
                            <span class="n">Must</span> <span 
class="n">be</span> <span class="n">a</span> <span 
class="n">SequenceFile</span> <span class="n">of</span>    
                            <span class="n">VectorWritable</span>           

Modified: 
websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means.html 
(original)
+++ websites/staging/mahout/trunk/content/users/clustering/fuzzy-k-means.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="fuzzy-k-means">Fuzzy K-Means</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="fuzzy-k-means">Fuzzy K-Means<a class="headerlink" 
href="#fuzzy-k-means" title="Permanent link">&para;</a></h1>
 <p>Fuzzy K-Means (also called Fuzzy C-Means) is an extension of <a 
href="http://mahout.apache.org/users/clustering/k-means-clustering.html">K-Means</a>,
 the popular simple clustering technique. While K-Means discovers hard
 clusters (a point belongs to only one cluster), Fuzzy K-Means is a more
@@ -271,7 +283,7 @@ statistically formalized method and disc
 particular point can belong to more than one cluster with certain
 probability.</p>
 <p><a name="FuzzyK-Means-Algorithm"></a></p>
-<h4 id="algorithm">Algorithm</h4>
+<h4 id="algorithm">Algorithm<a class="headerlink" href="#algorithm" 
title="Permanent link">&para;</a></h4>
 <p>Like K-Means, Fuzzy K-Means works on objects that can be represented
 in an n-dimensional vector space and for which a distance measure is defined.
 The algorithm is similar to k-means.</p>
@@ -284,7 +296,7 @@ The algorithm is similar to k-means.</p>
 </li>
 </ul>
 <p><a name="FuzzyK-Means-DesignImplementation"></a></p>
-<h4 id="design-implementation">Design Implementation</h4>
+<h4 id="design-implementation">Design Implementation<a class="headerlink" 
href="#design-implementation" title="Permanent link">&para;</a></h4>
 <p>The design is similar to K-Means present in Mahout. It accepts an input
 file containing vector points. The user can either provide the cluster centers
 as input or allow the canopy algorithm to run and create initial clusters.</p>
@@ -320,7 +332,7 @@ identifier (e.g. "C14". Output value is:
 "C14"). The reducer encodes unconverged clusters with a 'Cn' cluster Id and
 converged clusters with 'Vn' clusterId.</p>
 <p><a name="FuzzyK-Means-RunningFuzzyk-MeansClustering"></a></p>
-<h2 id="running-fuzzy-k-means-clustering">Running Fuzzy k-Means Clustering</h2>
+<h2 id="running-fuzzy-k-means-clustering">Running Fuzzy k-Means Clustering<a 
class="headerlink" href="#running-fuzzy-k-means-clustering" title="Permanent 
link">&para;</a></h2>
 <p>The Fuzzy k-Means clustering algorithm may be run using a command-line
 invocation on FuzzyKMeansDriver.main or by making a Java call to
 FuzzyKMeansDriver.run(). </p>
@@ -389,7 +401,7 @@ double <em>weight</em> and a VectorWrita
 computed as 1/(1+distance) where the distance is between the cluster center
 and the vector using the chosen DistanceMeasure. </p>
 <p><a name="FuzzyK-Means-Examples"></a></p>
-<h1 id="examples">Examples</h1>
+<h1 id="examples">Examples<a class="headerlink" href="#examples" 
title="Permanent link">&para;</a></h1>
 <p>The following images illustrate Fuzzy k-Means clustering applied to a set
 of randomly-generated 2-d data points. The points are generated using a
 normal distribution centered at a mean location and with a constant
@@ -416,7 +428,7 @@ data set which is generated using asymme
 Fuzzy k-Means does a fair job handling this data set as well.</p>
 <p><img alt="fuzzy" src="../../images/2dFuzzyKMeans.png" /></p>
 <p><a name="FuzzyK-Means-References&nbsp;"></a></p>
-<h4 id="referenceswzxhzdk15">References&nbsp;</h4>
+<h4 id="references">References&nbsp;<a class="headerlink" href="#references" 
title="Permanent link">&para;</a></h4>
 <ul>
 <li><a 
href="http://en.wikipedia.org/wiki/Fuzzy_clustering">http://en.wikipedia.org/wiki/Fuzzy_clustering</a></li>
 </ul>
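
The clustering pages in this commit mention a point weight computed as
1/(1+distance), where the distance is measured between the cluster center and
the vector using the chosen DistanceMeasure. A rough sketch of that formula is
shown here. It is not Mahout code: Euclidean distance stands in for the
DistanceMeasure, and the function names are assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as a stand-in distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_weight(center, vector, distance=euclidean):
    """Weight of `vector` relative to `center`: 1 / (1 + distance).

    The weight is 1.0 when the vector sits on the center and decays
    toward 0 as the vector moves away.
    """
    return 1.0 / (1.0 + distance(center, vector))
```

Any other distance function with the same two-argument shape can be passed as
the distance parameter to see how the choice of measure changes the weights.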

Modified: 
websites/staging/mahout/trunk/content/users/clustering/hierarchical-clustering.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/hierarchical-clustering.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/hierarchical-clustering.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p>Hierarchical clustering is the process of finding bigger clusters, and 
also
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p>Hierarchical clustering is the process of finding bigger clusters, and also
 the smaller clusters inside the bigger clusters.</p>
 <p>In Apache Mahout, separate algorithms can be used for finding clusters at
 different levels. </p>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/k-means-clustering.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/k-means-clustering.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/k-means-clustering.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="k-means-clustering-basics">k-Means clustering - basics</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="k-means-clustering-basics">k-Means clustering - basics<a 
class="headerlink" href="#k-means-clustering-basics" title="Permanent 
link">&para;</a></h1>
 <p><a href="http://en.wikipedia.org/wiki/Kmeans">k-Means</a> is a simple but 
well-known algorithm for grouping objects, clustering. All objects need to be 
represented
 as a set of numerical features. In addition, the user has to specify the
 number of groups (referred to as <em>k</em>) she wishes to identify.</p>
@@ -284,7 +296,7 @@ computation of new average centers have
 estimation of the number of clusters <em>k</em>. Yet the main principle always
 remains the same.</p>
 <p><a name="K-MeansClustering-Quickstart"></a></p>
-<h2 id="quickstart">Quickstart</h2>
+<h2 id="quickstart">Quickstart<a class="headerlink" href="#quickstart" 
title="Permanent link">&para;</a></h2>
 <p><a 
href="https://github.com/apache/mahout/blob/master/examples/bin/cluster-reuters.sh">Here</a>
  is a short shell script outline that will get you started quickly with
 k-means. This does the following:</p>
@@ -301,7 +313,7 @@ reuters-out from reuters-sgm (the downlo
 <p>After following through the output that scrolls past, reading the code will
 offer you a better understanding.</p>
 <p><a name="K-MeansClustering-Designofimplementation"></a></p>
-<h2 id="implementation">Implementation</h2>
+<h2 id="implementation">Implementation<a class="headerlink" 
href="#implementation" title="Permanent link">&para;</a></h2>
 <p>The implementation accepts two input directories: one for the data points
 and one for the initial clusters. The data directory contains multiple
 input files of SequenceFile(Key, VectorWritable), while the clusters
@@ -330,7 +342,7 @@ iteration and 'clusteredPoints' will con
 implementation provided by Mahout:
 <img src="../../images/Example implementation of k-Means provided with 
Mahout.png"></p>
 <p><a name="K-MeansClustering-Runningk-MeansClustering"></a></p>
-<h2 id="running-k-means-clustering">Running k-Means Clustering</h2>
+<h2 id="running-k-means-clustering">Running k-Means Clustering<a 
class="headerlink" href="#running-k-means-clustering" title="Permanent 
link">&para;</a></h2>
 <p>The k-Means clustering algorithm may be run using a command-line invocation
 on KMeansDriver.main or by making a Java call to KMeansDriver.runJob().</p>
 <p>Invocation using the command line takes the form:</p>
@@ -386,7 +398,7 @@ clustering, the weights are computed as
 is between the cluster center and the vector using the chosen
 DistanceMeasure.</p>
 <p><a name="K-MeansClustering-Examples"></a></p>
-<h1 id="examples">Examples</h1>
+<h1 id="examples">Examples<a class="headerlink" href="#examples" 
title="Permanent link">&para;</a></h1>
 <p>The following images illustrate k-Means clustering applied to a set of
 randomly-generated 2-d data points. The points are generated using a normal
 distribution centered at a mean location and with a constant standard

Modified: 
websites/staging/mahout/trunk/content/users/clustering/k-means-commandline.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/k-means-commandline.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/k-means-commandline.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,12 +264,23 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="k-means-commandline-Introduction"></a></p>
-<h1 id="kmeans-commandline-introduction">kMeans commandline introduction</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="k-means-commandline-Introduction"></a></p>
+<h1 id="kmeans-commandline-introduction">kMeans commandline introduction<a 
class="headerlink" href="#kmeans-commandline-introduction" title="Permanent 
link">&para;</a></h1>
 <p>This quick start page describes how to run the kMeans clustering algorithm
 on a Hadoop cluster. </p>
 <p><a name="k-means-commandline-Steps"></a></p>
-<h1 id="steps">Steps</h1>
+<h1 id="steps">Steps<a class="headerlink" href="#steps" title="Permanent 
link">&para;</a></h1>
 <p>Mahout's k-Means clustering can be launched from the same command line
 invocation whether you are running on a single machine in stand-alone mode
 or on a larger Hadoop cluster. The difference is determined by the
@@ -285,7 +297,7 @@ will be generated in $MAHOUT_HOME/core/t
 the Mahout version number. For example, when using Mahout 0.3 release, the
 job will be mahout-core-0.3.job</p>
 <p><a name="k-means-commandline-Testingitononesinglemachinew/ocluster"></a></p>
-<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster</h2>
+<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster<a class="headerlink" 
href="#testing-it-on-one-single-machine-wo-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>Put the data: cp <PATH TO DATA> testdata</li>
 <li>
@@ -296,7 +308,7 @@ org.apache.mahout.common.distance.Cosine
 </li>
 </ul>
 <p><a name="k-means-commandline-Runningitonthecluster"></a></p>
-<h2 id="running-it-on-the-cluster">Running it on the cluster</h2>
+<h2 id="running-it-on-the-cluster">Running it on the cluster<a 
class="headerlink" href="#running-it-on-the-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh</li>
 <li>Put the data: $HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata</li>
@@ -312,7 +324,7 @@ to view all outputs.</p>
 </li>
 </ul>
 <p><a name="k-means-commandline-Commandlineoptions"></a></p>
-<h1 id="command-line-options">Command line options</h1>
+<h1 id="command-line-options">Command line options<a class="headerlink" 
href="#command-line-options" title="Permanent link">&para;</a></h1>
 <div class="codehilite"><pre>  <span class="o">--</span><span 
class="n">input</span> <span class="p">(</span><span class="o">-</span><span 
class="nb">i</span><span class="p">)</span> <span class="n">input</span>        
           <span class="n">Path</span> <span class="n">to</span> <span 
class="n">job</span> <span class="n">input</span> <span 
class="n">directory</span><span class="p">.</span> 
                            <span class="n">Must</span> <span 
class="n">be</span> <span class="n">a</span> <span 
class="n">SequenceFile</span> <span class="n">of</span>    
                            <span class="n">VectorWritable</span>           

Modified: 
websites/staging/mahout/trunk/content/users/clustering/latent-dirichlet-allocation.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/latent-dirichlet-allocation.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/latent-dirichlet-allocation.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="LatentDirichletAllocation-Overview"></a></p>
-<h1 id="overview">Overview</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="LatentDirichletAllocation-Overview"></a></p>
+<h1 id="overview">Overview<a class="headerlink" href="#overview" 
title="Permanent link">&para;</a></h1>
 <p>Latent Dirichlet Allocation (Blei et al, 2003) is a powerful learning
 algorithm for automatically and jointly clustering words into "topics" and
 documents into mixtures of topics. It has been successfully applied to
@@ -295,7 +307,7 @@ to have come from one of the models in t
 which.  The way we deal with that is to use a so-called latent parameter
 which specifies which model each data point came from.</p>
 <p><a name="LatentDirichletAllocation-CollapsedVariationalBayes"></a></p>
-<h1 id="collapsed-variational-bayes">Collapsed Variational Bayes</h1>
+<h1 id="collapsed-variational-bayes">Collapsed Variational Bayes<a 
class="headerlink" href="#collapsed-variational-bayes" title="Permanent 
link">&para;</a></h1>
 <p>The CVB algorithm which is implemented in Mahout for LDA combines
 advantages of both regular Variational Bayes and Gibbs Sampling.  The
 algorithm relies on modeling dependence of parameters on latest variables
@@ -315,7 +327,7 @@ the order of O(K) with each update to q(
 document/word pair only 1 copy of the variational posterior is required
 over the latent variable.</p>
 <p><a name="LatentDirichletAllocation-InvocationandUsage"></a></p>
-<h1 id="invocation-and-usage">Invocation and Usage</h1>
+<h1 id="invocation-and-usage">Invocation and Usage<a class="headerlink" 
href="#invocation-and-usage" title="Permanent link">&para;</a></h1>
 <p>Mahout's implementation of LDA operates on a collection of SparseVectors of
 word counts. These word counts should be non-negative integers, though
 things will-- probably --work fine if you use non-negative reals. (Note
@@ -360,7 +372,7 @@ LDAPrintTopics utility:</p>
 
 
 <p><a name="LatentDirichletAllocation-Example"></a></p>
-<h1 id="example">Example</h1>
+<h1 id="example">Example<a class="headerlink" href="#example" title="Permanent 
link">&para;</a></h1>
 <p>An example is located in mahout/examples/bin/build-reuters.sh. The script
 automatically downloads the Reuters-21578 corpus, builds a Lucene index and
 converts the Lucene index to vectors. By uncommenting the last two lines
@@ -370,7 +382,7 @@ resultant topics to the console. </p>
 support for Reuters, and that building your own index will require some
 adaptation. The rest should hopefully not differ too much.</p>
 <p><a name="LatentDirichletAllocation-ParameterEstimation"></a></p>
-<h1 id="parameter-estimation">Parameter Estimation</h1>
+<h1 id="parameter-estimation">Parameter Estimation<a class="headerlink" 
href="#parameter-estimation" title="Permanent link">&para;</a></h1>
 <p>We use mean field variational inference to estimate the models. Variational
 inference can be thought of as a generalization of <a 
href="expectation-maximization.html">EM</a>
  for hierarchical Bayesian models. The E-Step takes the form of, for each
@@ -383,7 +395,7 @@ distribution over the entire vocabulary
 executed in the reduce step, with the final normalization happening as a
 post-processing step.</p>
 <p><a name="LatentDirichletAllocation-References"></a></p>
-<h1 id="references">References</h1>
+<h1 id="references">References<a class="headerlink" href="#references" 
title="Permanent link">&para;</a></h1>
 <p><a 
href="-http://machinelearning.wustl.edu/mlpapers/paper_files/BleiNJ03.pdf">David
 M. Blei, Andrew Y. Ng, Michael I. Jordan, John Lafferty. 2003. Latent 
Dirichlet Allocation. JMLR.</a></p>
 <p><a href="http://psiexp.ss.uci.edu/research/papers/sciencetopics.pdf">Thomas 
L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. PNAS.  </a></p>
 <p><a href="-http://aclweb.org/anthology//D/D08/D08-1038.pdf">David Hall, Dan 
Jurafsky, and Christopher D. Manning. 2008. Studying the History of Ideas Using 
Topic Models </a></p>

Modified: 
websites/staging/mahout/trunk/content/users/clustering/lda-commandline.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/lda-commandline.html 
(original)
+++ websites/staging/mahout/trunk/content/users/clustering/lda-commandline.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a 
name="lda-commandline-RunningLatentDirichletAllocation(algorithm)fromtheCommandLine"></a></p>
-<h1 
id="running-latent-dirichlet-allocation-algorithm-from-the-command-line">Running
 Latent Dirichlet Allocation (algorithm) from the Command Line</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a 
name="lda-commandline-RunningLatentDirichletAllocation(algorithm)fromtheCommandLine"></a></p>
+<h1 
id="running-latent-dirichlet-allocation-algorithm-from-the-command-line">Running
 Latent Dirichlet Allocation (algorithm) from the Command Line<a 
class="headerlink" 
href="#running-latent-dirichlet-allocation-algorithm-from-the-command-line" 
title="Permanent link">&para;</a></h1>
 <p><a href="https://issues.apache.org/jira/browse/MAHOUT-897">Since Mahout 
v0.6</a>
  lda has been implemented as Collapsed Variable Bayes (cvb). </p>
 <p>Mahout's LDA can be launched from the same command line invocation whether
@@ -285,7 +297,7 @@ the Mahout version number. For example,
 job will be mahout-core-0.3.job</li>
 </ul>
 <p><a name="lda-commandline-Testingitononesinglemachinew/ocluster"></a></p>
-<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster</h2>
+<h2 id="testing-it-on-one-single-machine-wo-cluster">Testing it on one single 
machine w/o cluster<a class="headerlink" 
href="#testing-it-on-one-single-machine-wo-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>Put the data: cp <PATH TO DATA> testdata</li>
 <li>
@@ -294,7 +306,7 @@ job will be mahout-core-0.3.job</li>
 </li>
 </ul>
 <p><a name="lda-commandline-Runningitonthecluster"></a></p>
-<h2 id="running-it-on-the-cluster">Running it on the cluster</h2>
+<h2 id="running-it-on-the-cluster">Running it on the cluster<a 
class="headerlink" href="#running-it-on-the-cluster" title="Permanent 
link">&para;</a></h2>
 <ul>
 <li>(As needed) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh</li>
 <li>Put the data: $HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata</li>
@@ -310,7 +322,7 @@ to view all outputs.</p>
 </li>
 </ul>
 <p><a name="lda-commandline-CommandlineoptionsfromMahoutcvbversion0.8"></a></p>
-<h1 id="command-line-options-from-mahout-cvb-version-08">Command line options 
from Mahout cvb version 0.8</h1>
+<h1 id="command-line-options-from-mahout-cvb-version-08">Command line options 
from Mahout cvb version 0.8<a class="headerlink" 
href="#command-line-options-from-mahout-cvb-version-08" title="Permanent 
link">&para;</a></h1>
 <div class="codehilite"><pre><span class="n">mahout</span> <span 
class="n">cvb</span> <span class="o">-</span><span class="n">h</span> 
   <span class="o">--</span><span class="n">input</span> <span 
class="p">(</span><span class="o">-</span><span class="nb">i</span><span 
class="p">)</span> <span class="n">input</span>                      <span 
class="n">Path</span> <span class="n">to</span> <span class="n">job</span> 
<span class="n">input</span> <span class="n">directory</span><span 
class="p">.</span>        
   <span class="o">--</span><span class="n">output</span> <span 
class="p">(</span><span class="o">-</span><span class="n">o</span><span 
class="p">)</span> <span class="n">output</span>                    <span 
class="n">The</span> <span class="n">directory</span> <span 
class="n">pathname</span> <span class="k">for</span> <span 
class="n">output</span><span class="p">.</span>  

Modified: 
websites/staging/mahout/trunk/content/users/clustering/llr---log-likelihood-ratio.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/llr---log-likelihood-ratio.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/llr---log-likelihood-ratio.html
 Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="likelihood-ratio-test">Likelihood ratio test</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="likelihood-ratio-test">Likelihood ratio test<a class="headerlink" 
href="#likelihood-ratio-test" title="Permanent link">&para;</a></h1>
 <p><em>Likelihood ratio test is used to compare the fit of two models one
 of which is nested within the other.</em></p>
 <p>In the context of machine learning and the Mahout project in particular,

Modified: 
websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html 
(original)
+++ 
websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html 
Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a 
href="/users/environment/in-core-reference.html">In-Core Algebraic DSL 
Reference</a></li>
                   <li><a 
href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL 
Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="spectral-clustering-overview">Spectral Clustering Overview</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, 
h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, 
dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="spectral-clustering-overview">Spectral Clustering Overview<a 
class="headerlink" href="#spectral-clustering-overview" title="Permanent 
link">&para;</a></h1>
 <p>Spectral clustering, as its name implies, makes use of the spectrum (or 
eigenvalues) of the similarity matrix of the data. It examines the 
<em>connectedness</em> of the data, whereas other clustering algorithms such as 
k-means use the <em>compactness</em> to assign clusters. Consequently, in 
situations where k-means performs well, spectral clustering will also perform 
well. Additionally, there are situations in which k-means will underperform 
(e.g. concentric circles), but spectral clustering will be able to segment the 
underlying clusters. Spectral clustering is also very useful for image 
segmentation.</p>
 <p>At its simplest, spectral clustering relies on the following four steps:</p>
 <ol>
@@ -281,16 +293,16 @@
 </li>
 </ol>
 <p>For more theoretical background on spectral clustering, such as how 
affinity matrices are computed, the different types of graph Laplacians, and 
whether the top or bottom eigenvectors and eigenvalues are computed, please 
read <a 
href="http://link.springer.com/article/10.1007/s11222-007-9033-z">Ulrike von 
Luxburg's article in <em>Statistics and Computing</em> from December 2007</a>. 
It provides an excellent description of the linear algebra operations behind 
spectral clustering, and imbues a thorough understanding of the types of 
situations in which it can be used.</p>
-<h1 id="mahout-spectral-clustering">Mahout Spectral Clustering</h1>
+<h1 id="mahout-spectral-clustering">Mahout Spectral Clustering<a 
class="headerlink" href="#mahout-spectral-clustering" title="Permanent 
link">&para;</a></h1>
 <p>As of Mahout 0.3, spectral clustering has been implemented to take 
advantage of the MapReduce framework. It uses <a 
href="http://mahout.apache.org/users/dim-reduction/ssvd.html">SSVD</a> for 
dimensionality reduction of the input data set, and <a 
href="http://mahout.apache.org/users/clustering/k-means-clustering.html">k-means</a>
 to perform the final clustering.</p>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1538">MAHOUT-1538</a> will 
port the existing Hadoop MapReduce implementation to Mahout DSL, allowing for 
one of several distinct distributed back-ends to conduct the 
computation)</strong></p>
-<h2 id="input">Input</h2>
+<h2 id="input">Input<a class="headerlink" href="#input" title="Permanent 
link">&para;</a></h2>
 <p>The input format for the algorithm currently takes the form of a 
Hadoop-backed affinity matrix in the form of text files. Each line of the text 
file specifies a single element of the affinity matrix: the row index 
<code>\(i\)</code>, the column index <code>\(j\)</code>, and the value:</p>
 <p><code>i, j, value</code></p>
 <p>The affinity matrix is symmetric, and any unspecified <code>\(i, j\)</code> 
pairs are assumed to be 0 for sparsity. The row and column indices are 
0-indexed. Thus, only the non-zero entries of either the upper or lower 
triangular need be specified.</p>
 <p>The matrix elements specified in the text files are collected into a Mahout 
<code>DistributedRowMatrix</code>.</p>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1539">MAHOUT-1539</a> will 
allow for the creation of the affinity matrix to occur as part of the core 
spectral clustering algorithm, as opposed to the current requirement that the 
user create this matrix themselves and provide it, rather than the original 
data, to the algorithm)</strong></p>
-<h2 id="running-spectral-clustering">Running spectral clustering</h2>
+<h2 id="running-spectral-clustering">Running spectral clustering<a 
class="headerlink" href="#running-spectral-clustering" title="Permanent 
link">&para;</a></h2>
 <p><strong>(<a 
href="https://issues.apache.org/jira/browse/MAHOUT-1540">MAHOUT-1540</a> will 
provide a running example of this algorithm and this section will be updated to 
show how to run the example and what the expected output should be; until then, 
this section provides a how-to for simply running the algorithm on arbitrary 
input)</strong></p>
 <p>Spectral clustering can be invoked with the following arguments.</p>
 <div class="codehilite"><pre><span class="n">bin</span><span 
class="o">/</span><span class="n">mahout</span> <span 
class="n">spectralkmeans</span> <span class="o">\</span>
@@ -303,7 +315,7 @@
 
 
 <p>The affinity matrix can be contained in a single text file (using the 
aforementioned one-line-per-entry format) or span many text files <a 
href="https://issues.apache.org/jira/browse/MAHOUT-978">per (MAHOUT-978</a>, do 
not prefix text files with a leading underscore '_' or period '.'). The 
<code>-d</code> flag is required for the algorithm to know the dimensions of 
the affinity matrix. <code>-k</code> is the number of top eigenvectors from the 
normalized graph Laplacian in the SSVD step, and also the number of clusters 
given to k-means after the SSVD step.</p>
-<h2 id="example">Example</h2>
+<h2 id="example">Example<a class="headerlink" href="#example" title="Permanent 
link">&para;</a></h2>
 <p>To provide a simple example, take the following affinity matrix, contained 
in a text file called <code>affinity.txt</code>:</p>
 <div class="codehilite"><pre>0<span class="p">,</span> 0<span 
class="p">,</span> 0
 0<span class="p">,</span> 1<span class="p">,</span> 0<span class="p">.</span>8

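Note on the spectral-clustering.html hunk above: the page describes the input as text lines of the form `i, j, value`, with 0-based indices, a symmetric matrix, and unspecified pairs treated as 0. A minimal sketch of that format follows, in plain Python. It is not Mahout's reader; the function name `parse_affinity` and the `dim` parameter are hypothetical and chosen only for illustration. The two sample lines are the ones visible in the `affinity.txt` example, and `dim=2` is an assumption made so the sketch is self-contained.

```python
def parse_affinity(lines, dim):
    """Build a dense, symmetric dim x dim matrix from 'i, j, value' lines.

    Illustrative only: entries not listed stay 0, and each listed entry
    is mirrored across the diagonal, as the page describes.
    """
    matrix = [[0.0] * dim for _ in range(dim)]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        i_str, j_str, v_str = (part.strip() for part in line.split(","))
        i, j, value = int(i_str), int(j_str), float(v_str)
        matrix[i][j] = value
        matrix[j][i] = value  # symmetry: only one triangle need be given
    return matrix


# The first two lines of the affinity.txt example quoted in the diff.
sample = ["0, 0, 0", "0, 1, 0.8"]
affinity = parse_affinity(sample, dim=2)
```

Because the matrix is mirrored on read, supplying only the upper or only the lower triangle yields the same result.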
svn commit: r985117 [4/6] - in /websites/staging/mahout/trunk/content: ./ developers/ general/ images/ users/algorithms/ users/basics/ users/classification/ users/clustering/ users/dim-reduction/ users/environment/ users/flinkbindings/ users/misc/ user...
