Added: 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-of-synthetic-control-data.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-of-synthetic-control-data.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-of-synthetic-control-data.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,318 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="clustering-synthetic-control-data">Clustering synthetic control 
data</h1>
+<h2 id="introduction">Introduction</h2>
+<p>This example will demonstrate clustering of time series data, specifically 
control charts. <a href="http://en.wikipedia.org/wiki/Control_chart";>Control 
charts</a> are tools used to determine whether a manufacturing or business 
process is in a state of statistical control. Such control charts are generated 
/ simulated repeatedly at equal time intervals. A <a 
href="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html";>simulated
 dataset</a> is available for use in UCI machine learning repository.</p>
+<p>A time series of control charts needs to be clustered into their close knit 
groups. The data set we use is synthetic and is meant to resemble real world 
information in an anonymized format. It contains six different classes: Normal, 
Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift. In 
this example we will use Mahout to cluster the data into corresponding class 
buckets. </p>
+<p><em>For the sake of simplicity, we won't use a cluster in this example, but 
instead show you the commands to run the clustering examples locally with 
Hadoop</em>.</p>
+<h2 id="setup">Setup</h2>
+<p>We need to do some initial setup before we are able to run the example. </p>
+<ol>
+<li>
+<p>Start out by downloading the dataset to be clustered from the UCI Machine 
Learning Repository: <a 
href="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data";>http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data</a>.</p>
+</li>
+<li>
+<p>Download the <a href="/general/downloads.html">latest release of 
Mahout</a>.</p>
+</li>
+<li>
+<p>Unpack the release binary and switch to the 
<em>mahout-distribution-0.x</em> folder</p>
+</li>
+<li>
+<p>Make sure that the <em>JAVA_HOME</em> environment variable points to your 
local java installation</p>
+</li>
+<li>
+<p>Create a folder called <em>testdata</em> in the current directory and copy 
the dataset into this folder.</p>
+</li>
+</ol>
+<h2 id="clustering-examples">Clustering Examples</h2>
+<p>Depending on the clustering algorithm you want to run, the following 
commands can be used:</p>
+<ul>
+<li>
+<p><a href="/users/clustering/canopy-clustering.html">Canopy Clustering</a></p>
+<p>bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job</p>
+</li>
+<li>
+<p><a href="/users/clustering/k-means-clustering.html">k-Means 
Clustering</a></p>
+<p>bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job</p>
+</li>
+<li>
+<p><a href="/users/clustering/fuzzy-k-means.html">Fuzzy k-Means 
Clustering</a></p>
+<p>bin/mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job</p>
+</li>
+</ul>
+<p>The clustering output will be produced in the <em>output</em> directory. 
The output data points are in vector format. In order to read/analyze the 
output, you can use the <a 
href="/users/clustering/cluster-dumper.html">clusterdump</a> utility provided 
by Mahout.</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

Added: 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-seinfeld-episodes.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-seinfeld-episodes.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clustering-seinfeld-episodes.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,280 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <p>Below is short tutorial on how to cluster Seinfeld episode transcripts 
with
+Mahout.</p>
+<p>http://blog.jteam.nl/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

Added: 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clusteringyourdata.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clusteringyourdata.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/clusteringyourdata.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,389 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="clustering-your-data">Clustering your data</h1>
+<p>After you've done the <a href="quickstart.html">Quickstart</a> and are 
familiar with the basics of Mahout, it is time to cluster your own
+data. See also <a href="en.wikipedia.org/wiki/Cluster_analysis">Wikipedia on 
cluster analysis</a> for more background.</p>
+<p>The following pieces <em>may</em> be useful for in getting started:</p>
+<p><a name="ClusteringYourData-Input"></a></p>
+<h1 id="input">Input</h1>
+<p>For starters, you will need your data in an appropriate Vector format, see 
<a href="../basics/creating-vectors.html">Creating Vectors</a>.
+In particular for text preparation check out <a 
href="../basics/creating-vectors-from-text.html">Creating Vectors from 
Text</a>.</p>
+<p><a name="ClusteringYourData-RunningtheProcess"></a></p>
+<h1 id="running-the-process">Running the Process</h1>
+<ul>
+<li>
+<p><a href="canopy-clustering.html">Canopy background</a> and <a 
href="canopy-commandline.html">canopy-commandline</a>.</p>
+</li>
+<li>
+<p><a href="k-means-clustering.html">K-Means background</a>, <a 
href="k-means-commandline.html">k-means-commandline</a>, and
+<a href="fuzzy-k-means-commandline.html">fuzzy-k-means-commandline</a>.</p>
+</li>
+<li>
+<p><a href="dirichlet-process-clustering.html">Dirichlet background</a> and <a 
href="dirichlet-commandline.html">dirichlet-commandline</a>.</p>
+</li>
+<li>
+<p><a href="mean-shift-clustering.html">Meanshift background</a> and <a 
href="mean-shift-commandline.html">mean-shift-commandline</a>.</p>
+</li>
+<li>
+<p><a href="-latent-dirichlet-allocation.html">LDA (Latent Dirichlet 
Allocation) background</a> and <a 
href="lda-commandline.html">lda-commandline</a>.</p>
+</li>
+<li>
+<p>TODO: kmeans++/ streaming kMeans documentation</p>
+</li>
+</ul>
+<p><a name="ClusteringYourData-RetrievingtheOutput"></a></p>
+<h1 id="retrieving-the-output">Retrieving the Output</h1>
+<p>Mahout has a cluster dumper utility that can be used to retrieve and 
evaluate your clustering data.</p>
+<div class="codehilite"><pre><span class="o">./</span><span 
class="n">bin</span><span class="o">/</span><span class="n">mahout</span> <span 
class="n">clusterdump</span> <span class="o">&lt;</span><span 
class="n">OPTIONS</span><span class="o">&gt;</span>
+</pre></div>
+
+
+<p><a name="ClusteringYourData-Theclusterdumperoptionsare:"></a></p>
+<h2 id="the-cluster-dumper-options-are">The cluster dumper options are:</h2>
+<div class="codehilite"><pre>  <span class="o">--</span><span 
class="n">help</span> <span class="p">(</span><span class="o">-</span><span 
class="n">h</span><span class="p">)</span>                  <span 
class="n">Print</span> <span class="n">out</span> <span class="n">help</span>
+
+  <span class="o">--</span><span class="n">input</span> <span 
class="p">(</span><span class="o">-</span><span class="nb">i</span><span 
class="p">)</span> <span class="n">input</span>               <span 
class="n">The</span> <span class="n">directory</span> <span 
class="n">containing</span> <span class="n">Sequence</span>    
+                       <span class="n">Files</span> <span class="k">for</span> 
<span class="n">the</span> <span class="n">Clusters</span>
+
+  <span class="o">--</span><span class="n">output</span> <span 
class="p">(</span><span class="o">-</span><span class="n">o</span><span 
class="p">)</span> <span class="n">output</span>             <span 
class="n">The</span> <span class="n">output</span> <span 
class="n">file</span><span class="p">.</span>  <span class="n">If</span> <span 
class="n">not</span> <span class="n">specified</span><span class="p">,</span>  
+                       <span class="n">dumps</span> <span class="n">to</span> 
<span class="n">the</span> <span class="n">console</span><span 
class="p">.</span>
+
+  <span class="o">--</span><span class="n">outputFormat</span> <span 
class="p">(</span><span class="o">-</span><span class="n">of</span><span 
class="p">)</span> <span class="n">outputFormat</span>    <span 
class="n">The</span> <span class="n">optional</span> <span 
class="n">output</span> <span class="n">format</span> <span class="n">to</span> 
<span class="n">write</span>
+                       <span class="n">the</span> <span 
class="n">results</span> <span class="n">as</span><span class="p">.</span> 
<span class="n">Options</span><span class="p">:</span> <span 
class="n">TEXT</span><span class="p">,</span> <span class="n">CSV</span><span 
class="p">,</span> <span class="n">or</span> <span class="n">GRAPH_ML</span>
+
+  <span class="o">--</span><span class="n">substring</span> <span 
class="p">(</span><span class="o">-</span><span class="n">b</span><span 
class="p">)</span> <span class="n">substring</span>           <span 
class="n">The</span> <span class="n">number</span> <span class="n">of</span> 
<span class="n">chars</span> <span class="n">of</span> <span 
class="n">the</span>       
+                       <span class="n">asFormatString</span><span 
class="p">()</span> <span class="n">to</span> <span class="n">print</span>
+
+  <span class="o">--</span><span class="n">pointsDir</span> <span 
class="p">(</span><span class="o">-</span><span class="n">p</span><span 
class="p">)</span> <span class="n">pointsDir</span>           <span 
class="n">The</span> <span class="n">directory</span> <span 
class="n">containing</span> <span class="n">points</span>  
+                   <span class="n">sequence</span> <span 
class="n">files</span> <span class="n">mapping</span> <span 
class="n">input</span> <span class="n">vectors</span>                           
 <span class="n">to</span> <span class="n">their</span> <span 
class="n">cluster</span><span class="p">.</span>  <span class="n">If</span> 
<span class="n">specified</span><span class="p">,</span> 
+                       <span class="n">then</span> <span class="n">the</span> 
<span class="n">program</span> <span class="n">will</span> <span 
class="n">output</span> <span class="n">the</span> 
+                       <span class="n">points</span> <span 
class="n">associated</span> <span class="n">with</span> <span 
class="n">a</span> <span class="n">cluster</span>
+
+  <span class="o">--</span><span class="n">dictionary</span> <span 
class="p">(</span><span class="o">-</span><span class="n">d</span><span 
class="p">)</span> <span class="n">dictionary</span>         <span 
class="n">The</span> <span class="n">dictionary</span> <span 
class="n">file</span><span class="p">.</span>
+
+  <span class="o">--</span><span class="n">dictionaryType</span> <span 
class="p">(</span><span class="o">-</span><span class="n">dt</span><span 
class="p">)</span> <span class="n">dictionaryType</span>    <span 
class="n">The</span> <span class="n">dictionary</span> <span 
class="n">file</span> <span class="n">type</span>     
+                       <span class="p">(</span><span 
class="n">text</span><span class="o">|</span><span 
class="n">sequencefile</span><span class="p">)</span>
+
+  <span class="o">--</span><span class="n">distanceMeasure</span> <span 
class="p">(</span><span class="o">-</span><span class="n">dm</span><span 
class="p">)</span> <span class="n">distanceMeasure</span>  <span 
class="n">The</span> <span class="n">classname</span> <span class="n">of</span> 
<span class="n">the</span> <span class="n">DistanceMeasure</span><span 
class="p">.</span>
+                       <span class="n">Default</span> <span 
class="n">is</span> <span class="n">SquaredEuclidean</span><span 
class="p">.</span>
+
+  <span class="o">--</span><span class="n">numWords</span> <span 
class="p">(</span><span class="o">-</span><span class="n">n</span><span 
class="p">)</span> <span class="n">numWords</span>         <span 
class="n">The</span> <span class="n">number</span> <span class="n">of</span> 
<span class="n">top</span> <span class="n">terms</span> <span 
class="n">to</span> <span class="n">print</span>
+
+  <span class="o">--</span><span class="n">tempDir</span> <span 
class="n">tempDir</span>            <span class="n">Intermediate</span> <span 
class="n">output</span> <span class="n">directory</span>
+
+  <span class="o">--</span><span class="n">startPhase</span> <span 
class="n">startPhase</span>          <span class="n">First</span> <span 
class="n">phase</span> <span class="n">to</span> <span class="n">run</span>
+
+  <span class="o">--</span><span class="n">endPhase</span> <span 
class="n">endPhase</span>              <span class="n">Last</span> <span 
class="n">phase</span> <span class="n">to</span> <span class="n">run</span>
+
+  <span class="o">--</span><span class="n">evaluate</span> <span 
class="p">(</span><span class="o">-</span><span class="n">e</span><span 
class="p">)</span>              <span class="n">Run</span> <span 
class="n">ClusterEvaluator</span> <span class="n">and</span> <span 
class="n">CDbwEvaluator</span> <span class="n">over</span> <span 
class="n">the</span>
+                       <span class="n">input</span><span class="p">.</span> 
<span class="n">The</span> <span class="n">output</span> <span 
class="n">will</span> <span class="n">be</span> <span class="n">appended</span> 
<span class="n">to</span> <span class="n">the</span> <span 
class="n">rest</span> <span class="n">of</span>
+                       <span class="n">the</span> <span 
class="n">output</span> <span class="n">at</span> <span class="n">the</span> 
<span class="k">end</span><span class="p">.</span>
+</pre></div>
+
+
+<p>More information on using clusterdump utility can be found <a 
href="cluster-dumper.html">here</a></p>
+<p><a name="ClusteringYourData-ValidatingtheOutput"></a></p>
+<h1 id="validating-the-output">Validating the Output</h1>
+<p>{quote}
+Ted Dunning: A principled approach to cluster evaluation is to measure how 
well the
+cluster membership captures the structure of unseen data.  A natural
+measure for this is to measure how much of the entropy of the data is
+captured by cluster membership.  For k-means and its natural L_2 metric,
+the natural cluster quality metric is the squared distance from the nearest
+centroid adjusted by the log_2 of the number of clusters.  This can be
+compared to the squared magnitude of the original data or the squared
+deviation from the centroid for all of the data.  The idea is that you are
+changing the representation of the data by allocating some of the bits in
+your original representation to represent which cluster each point is in. 
+If those bits aren't made up by the residue being small then your
+clustering is making a bad trade-off.</p>
+<p>In the past, I have used other more heuristic measures as well.  One of the
+key characteristics that I would like to see out of a clustering is a
+degree of stability.  Thus, I look at the fractions of points that are
+assigned to each cluster or the distribution of distances from the cluster
+centroid. These values should be relatively stable when applied to held-out
+data.</p>
+<p>For text, you can actually compute perplexity which measures how well
+cluster membership predicts what words are used.  This is nice because you
+don't have to worry about the entropy of real valued numbers.</p>
+<p>Manual inspection and the so-called laugh test is also important.  The idea
+is that the results should not be so ludicrous as to make you laugh.
+Unfortunately, it is pretty easy to kid yourself into thinking your system
+is working using this kind of inspection.  The problem is that we are too
+good at seeing (making up) patterns.
+{quote}</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

Added: 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/expectation-maximization.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/expectation-maximization.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/expectation-maximization.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,323 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <p><a name="ExpectationMaximization-ExpectationMaximization"></a></p>
+<h1 id="expectation-maximization">Expectation Maximization</h1>
+<p>The principle of EM can be applied to several learning settings, but is
+most commonly associated with clustering. The main principle of the
+algorithm is comparable to k-Means. Yet in contrast to hard cluster
+assignments, each object is given some probability to belong to a cluster.
+Accordingly cluster centers are recomputed based on the average of all
+objects weighted by their probability of belonging to the cluster at hand.</p>
+<p><a name="ExpectationMaximization-Canopy-modifiedEM"></a></p>
+<h2 id="canopy-modified-em">Canopy-modified EM</h2>
+<p>One can also use the canopies idea to speed up prototypebased clustering
+methods like K-means and Expectation-Maximization (EM). In general, neither
+K-means nor EMspecify how many clusters to use. The canopies technique does
+not help this choice.</p>
+<p>Prototypes (our estimates of the cluster centroids) are associated with the
+canopies that contain them, and the prototypes are only influenced by data
+that are inside their associated canopies. After creating the canopies, we
+decide how many prototypes will be created for each canopy. This could be
+done, for example, using the number of data points in a canopy and AIC or
+BIC where points that occur in more than one canopy are counted
+fractionally. Then we place prototypesinto each canopy. This initial
+placement can be random, as long as it is within the canopy in question, as
+determined by the inexpensive distance metric.</p>
+<p>Then, instead of calculating the distance from each prototype to every
+point (as is traditional, a O(nk) operation), theE-step instead calculates
+the distance from each prototype to a much smaller number of points. For
+each prototype, we find the canopies that contain it (using the cheap
+distance metric), and only calculate distances (using the expensive
+distance metric) from that prototype to points within those canopies.</p>
+<p>Note that by this procedure prototypes may move across canopy boundaries
+when canopies overlap. Prototypes may move to cover the data in the
+overlapping region, and then move entirely into another canopy in order to
+cover data there.</p>
+<p>The canopy-modified EM algorithm behaves very similarly to traditional EM,
+with the slight difference that points outside the canopy have no influence
+on points in the canopy, rather than a minute influence. If the canopy
+property holds, and points in the same cluster fall in the same canopy,
+then the canopy-modified EM will almost always converge to the same maximum
+in likelihood as the traditional EM. In fact, the difference in each
+iterative step (apart from the enormous computational savings of computing
+fewer terms) will be negligible since points outside the canopy will have
+exponentially small influence.</p>
+<p><a name="ExpectationMaximization-StrategyforParallelization"></a></p>
+<h2 id="strategy-for-parallelization">Strategy for Parallelization</h2>
+<p><a name="ExpectationMaximization-Map/ReduceImplementation"></a></p>
+<h2 id="mapreduce-implementation">Map/Reduce Implementation</h2>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>


Reply via email to