Added: 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/visualizing-sample-clusters.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/visualizing-sample-clusters.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/clustering/visualizing-sample-clusters.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,315 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico">
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search" method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org" name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/">License</a></li>
+                  <li><a 
href="http://www.apache.org/security/">Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/" 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html">How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html">Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/">Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/">Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/">Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <p><a name="VisualizingSampleClusters-Introduction"></a></p>
+<h1 id="introduction">Introduction</h1>
+<p>Mahout provides examples that visualize the sample clusters created by
+our clustering algorithms. Note that the visualization is done by Swing
+programs, so you must run them from a windowing system on the same machine,
+or while logged in via a remote desktop.</p>
+<p>To visualize the clusters, run the Java classes in the
+<em>org.apache.mahout.clustering.display</em> package of the
+mahout-examples module. The easiest way to do this is to <a 
href="users/basics/quickstart.html">set up Mahout</a> in your IDE.</p>
+<p><a name="VisualizingSampleClusters-Visualizingclusters"></a></p>
+<h1 id="visualizing-clusters">Visualizing clusters</h1>
+<p>The following classes in <em>org.apache.mahout.clustering.display</em> can 
be run
+without parameters to generate a sample data set and run the reference
+clustering implementations over it:</p>
+<ol>
+<li><strong>DisplayClustering</strong> - generates 1000 samples from three 
symmetric
+distributions. This is the same data set that is used by the following
+clustering programs. It displays the points on a screen and superimposes
+the model parameters that were used to generate the points. You can edit
+the <em>generateSamples()</em> method to change the sample points used by these
+programs.</li>
+<li><strong>DisplayClustering</strong> - displays initial areas of generated 
points</li>
+<li><strong>DisplayCanopy</strong> - uses Canopy clustering</li>
+<li><strong>DisplayKMeans</strong> - uses k-Means clustering</li>
+<li><strong>DisplayFuzzyKMeans</strong> - uses Fuzzy k-Means clustering</li>
+<li><strong>DisplaySpectralKMeans</strong> - uses Spectral k-Means via the 
map-reduce algorithm</li>
+</ol>
+<p>If you are using Eclipse, right-click each of the classes listed 
above and choose "Run As &gt; Java Application". To run them directly from the 
command line:</p>
+<div class="codehilite"><pre><span class="n">cd</span> $<span 
class="n">MAHOUT_HOME</span><span class="o">/</span><span 
class="n">examples</span>
+<span class="n">mvn</span> <span class="o">-</span><span class="n">q</span> 
<span class="n">exec</span><span class="p">:</span><span class="n">java</span> 
<span class="o">-</span><span class="n">Dexec</span><span 
class="p">.</span><span class="n">mainClass</span><span class="p">=</span><span 
class="n">org</span><span class="p">.</span><span class="n">apache</span><span 
class="p">.</span><span class="n">mahout</span><span class="p">.</span><span 
class="n">clustering</span><span class="p">.</span><span 
class="n">display</span><span class="p">.</span><span 
class="n">DisplayClustering</span>
+</pre></div>
+
+
+<p>You can substitute any of the other class names listed above for <em>DisplayClustering</em>.</p>
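+<p>As a rough sketch of that substitution, the script below prints the Maven command for each display class named on this page. The helper <em>display_cmd</em> is hypothetical and not part of Mahout; the script only prints the commands and does not start Maven, so each line still has to be run by hand from $MAHOUT_HOME/examples.</p>

```shell
#!/bin/sh
# Package that holds the display classes (as named on this page).
PKG=org.apache.mahout.clustering.display

# display_cmd is a hypothetical helper, not part of Mahout:
# it prints the Maven command line for one display class.
display_cmd() {
  printf 'mvn -q exec:java -Dexec.mainClass=%s.%s\n' "$PKG" "$1"
}

# Print one command per class; nothing is executed here.
for cls in DisplayClustering DisplayCanopy DisplayKMeans \
           DisplayFuzzyKMeans DisplaySpectralKMeans; do
  display_cmd "$cls"
done
```

+<p>Each display program opens its own Swing window, so it is probably simplest to run the printed commands one at a time.</p>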
+<p>Note that some of these programs display the sample points and then 
superimpose all of the clusters from each iteration. The last iteration's 
clusters are drawn in
+bold red, and the previous several are colored (orange, yellow, green, blue,
+magenta) in that order, after which all earlier clusters are in light grey. This
+helps to visualize how the clusters converge upon a solution over multiple
+iterations.</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

Added: 
websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-als-hadoop.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-als-hadoop.html
 (added)
+++ 
websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-als-hadoop.html
 Thu Mar 19 21:21:45 2015
@@ -0,0 +1,350 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico">
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search" method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org" name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/">License</a></li>
+                  <li><a 
href="http://www.apache.org/security/">Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/" 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                             <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html">How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html">Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/">Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/">Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/">Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="introduction-to-als-recommendations-with-hadoop">Introduction to 
ALS Recommendations with Hadoop</h1>
+<h2 id="overview">Overview</h2>
+<p>Mahout’s ALS recommender is a matrix factorization algorithm that uses 
Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It 
factors the user-to-item matrix <em>A</em> into the user-to-feature matrix 
<em>U</em> and the item-to-feature matrix <em>M</em>, and it runs the ALS 
algorithm in parallel. The algorithm is described in detail in the 
following papers: </p>
+<ul>
+<li><a 
href="http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08%28submitted%29.pdf">Large-scale
 Parallel Collaborative Filtering for
+the Netflix Prize</a></li>
+<li><a href="http://research.yahoo.com/pub/2433">Collaborative Filtering for 
Implicit Feedback Datasets</a> </li>
+</ul>
+<p>This recommendation algorithm can be used on an eCommerce platform to 
recommend products to customers. Unlike the user-based or item-based recommenders, 
which compute the similarity of users or items to make recommendations, the ALS 
algorithm uncovers the latent factors that explain the observed user-to-item 
ratings and tries to find the factor weights that minimize the squared error 
between predicted and actual ratings.</p>
+<p>Mahout's ALS recommendation algorithm takes user preferences by item as 
input and outputs recommended items for each user. The input preference can be 
either explicit user ratings or implicit feedback, such as a user's clicks on a 
web page.</p>
+<p>Compared to the user-based or item-based recommenders, the strengths of the 
ALS-based recommender are its ability to handle large sparse data sets and its 
better prediction performance. It can also give an intuitive rationale for 
the factors that influence recommendations.</p>
+<h2 id="implementation">Implementation</h2>
+<p>At present Mahout has a MapReduce implementation of ALS, which is composed 
of two jobs: a parallel matrix factorization job and a recommendation job.
+The matrix factorization job computes the user-to-feature matrix and 
item-to-feature matrix from the user-to-item ratings. Its input includes: 
+<pre>
+    --input: directory containing files of explicit user to item rating or 
implicit feedback;
+    --output: output path of the user-feature matrix and feature-item matrix;
+    --lambda: regularization parameter to avoid overfitting;
+    --alpha: confidence parameter, only used for implicit feedback;
+    --implicitFeedback: boolean flag to indicate whether the input dataset 
contains implicit feedback;
+    --numFeatures: dimensions of feature space;
+    --numThreadsPerSolver: number of threads per solver mapper for concurrent 
execution;
+    --numIterations: number of iterations;
+    --usesLongIDs: boolean flag to indicate whether the input contains long 
IDs that need to be translated
+</pre>
+and it outputs the matrices in sequence file format. </p>
+<p>The recommendation job uses the user feature matrix and item feature matrix 
calculated from the factorization job to compute the top-N recommendations per 
user. Its input includes:
+<pre>
+    --input: directory containing files of user ids;
+    --output: output path of the recommended items for each input user id;
+    --userFeatures: path to the user feature matrix;
+    --itemFeatures: path to the item feature matrix;
+    --numRecommendations: maximum number of recommendations per user, default 
is 10;
+    --maxRating: maximum rating available;
+    --numThreads: number of threads per mapper;
+    --usesLongIDs: boolean flag to indicate whether the input contains long 
IDs that need to be translated;
+    --userIDIndex: index for user long IDs (necessary if usesLongIDs is true);
+    --itemIDIndex: index for item long IDs (necessary if usesLongIDs is true) 
+</pre>
+and it outputs a list of recommended item IDs for each user. The predicted 
rating for a user and an item is the dot product of the user's feature vector and 
the item's feature vector.  </p>
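+<p>As a rough illustration of the dot-product prediction described above, the 
following Python sketch scores items for a user and picks the top-N unrated 
ones. The feature values and the <code>predict</code> and <code>top_n</code> 
helper names are made up for this example; this is not Mahout's implementation, 
which reads the real matrices from the factorization output.</p>

```python
# Hypothetical 2-feature factor vectors; the numbers are invented for illustration.
user_features = {1: [0.9, 0.2], 2: [0.1, 0.7]}
item_features = {
    100: [1.0, 0.0],
    200: [0.5, 0.5],
    300: [0.0, 1.0],
    400: [0.3, 0.9],
}


def predict(user_id, item_id):
    """Predicted preference: dot product of the user and item feature vectors."""
    u = user_features[user_id]
    m = item_features[item_id]
    return sum(a * b for a, b in zip(u, m))


def top_n(user_id, already_rated, n=1):
    """Rank the items the user has not rated by predicted preference."""
    candidates = [i for i in item_features if i not in already_rated]
    ranked = sorted(candidates, key=lambda i: predict(user_id, i), reverse=True)
    return ranked[:n]


print(predict(1, 200))
print(top_n(2, already_rated={200, 300}))
```

+<p>With these made-up factors, user 2's unrated candidates are items 100 and 
400, and item 400 scores higher, so it is the single recommendation returned.</p>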
+<h2 id="example">Example</h2>
+<p>Let’s look at a simple example of how we could use Mahout’s ALS 
recommender to recommend items for users. First, you’ll need to get Mahout up 
and running; the instructions can be found <a 
href="https://mahout.apache.org/users/basics/quickstart.html">here</a>. Once 
you've ensured Mahout is properly installed, you’re ready to run the 
example.</p>
+<p><strong>Step 1: Prepare test data</strong></p>
+<p>Similar to Mahout's item-based recommender, the ALS recommender relies on 
user-to-item preference data: <em>userID</em>, <em>itemID</em> and 
<em>preference</em>. The preference can be an explicit numeric rating or a count 
of actions such as clicks (implicit feedback). Each line of the test data file 
is a tab-delimited string: the first field is the user ID, the second field is 
the item ID, and the third field is the preference. All three fields must be 
numeric.</p>
+<p><strong>Note:</strong> You must create IDs that are ordinal positive 
integers for all user and item IDs. Often this will require you to keep a 
dictionary
+to map into and out of Mahout IDs. For instance, if the first user has ID "xyz" 
in your application, that user would get the Mahout ID 1, and so on. 
The same
+applies to item IDs. After the recommendations are calculated, you will have to 
translate the Mahout user and item IDs back into your application IDs.</p>
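+<p>One possible way to keep such a dictionary is sketched below in Python. The 
<code>build_index</code> helper, the sample events and the "sku-" item names are 
all hypothetical; the sketch only shows assigning integers in order of first 
appearance, writing tab-delimited lines, and keeping reverse maps for 
translating results back.</p>

```python
def build_index(raw_ids):
    """Assign consecutive positive integers (1, 2, ...) in order of first appearance."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index) + 1
    return index


# Hypothetical application-level events: (user, item, preference).
events = [("xyz", "sku-a", 1), ("xyz", "sku-b", 5), ("abc", "sku-b", 2)]

user_index = build_index(u for u, _, _ in events)
item_index = build_index(i for _, i, _ in events)

# Tab-delimited lines in the userID, itemID, preference layout.
lines = [f"{user_index[u]}\t{item_index[i]}\t{p}" for u, i, p in events]

# Reverse maps translate Mahout IDs back into application IDs afterwards.
user_lookup = {v: k for k, v in user_index.items()}
item_lookup = {v: k for k, v in item_index.items()}

for line in lines:
    print(line)
```

+<p>The reverse maps need to be stored alongside the input so that the integer 
IDs in the recommendation output can be translated back later.</p>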
+<p>To get started quickly, you could use a text file like the following as the input:
+<pre>
+1   100 1
+1   200 5
+1   400 1
+2   200 2
+2   300 1
+</pre></p>
+<p><strong>Step 2: Determine parameters</strong></p>
+<p>In addition, users need to determine the dimension of the feature space and 
the number of iterations to run the alternating least squares algorithm. Using 
10 features and 15 iterations is a reasonable default to try first. Optionally, 
a confidence parameter can be set if the input preference is implicit user 
feedback.  </p>
+<p><strong>Step 3: Run ALS</strong></p>
+<p>Assuming your <em>JAVA_HOME</em> is appropriately set and Mahout was 
installed properly, you’re ready to run the factorization job. Enter the 
following command:</p>
+<div class="codehilite"><pre>$ <span class="n">mahout</span> <span 
class="n">parallelALS</span> <span class="o">--</span><span 
class="n">input</span> $<span class="n">als_input</span> <span 
class="o">--</span><span class="n">output</span> $<span 
class="n">als_output</span> <span class="o">--</span><span 
class="n">lambda</span> 0<span class="p">.</span>1 <span 
class="o">--</span><span class="n">implicitFeedback</span> <span 
class="n">true</span> <span class="o">--</span><span class="n">alpha</span> 
0<span class="p">.</span>8 <span class="o">--</span><span 
class="n">numFeatures</span> 2 <span class="o">--</span><span 
class="n">numIterations</span> 5  <span class="o">--</span><span 
class="n">numThreadsPerSolver</span> 1 <span class="o">--</span><span 
class="n">tempDir</span> <span class="n">tmp</span>
+</pre></div>
+
+
+<p>Running the command will execute a series of jobs, the final product of 
which is written to the output directory specified in the command. The output 
directory contains 3 sub-directories: <em>M</em> 
stores the item-to-feature matrix, <em>U</em> stores the user-to-feature matrix 
and userRatings stores the users' ratings on the items. The <em>tempDir</em> 
parameter specifies the directory in which to store the intermediate output of 
the job, such as the matrix output of each iteration and each item's average 
rating. Keeping the <em>tempDir</em> output helps with debugging.</p>
+<p><strong>Step 4: Make Recommendations</strong></p>
+<p>Based on the output feature matrices from step 3, we could make 
recommendations for users. Enter the following command:</p>
+<div class="codehilite"><pre> $ <span class="n">mahout</span> <span 
class="n">recommendfactorized</span> <span class="o">--</span><span 
class="n">input</span> $<span class="n">als_input</span> <span 
class="o">--</span><span class="n">userFeatures</span> $<span 
class="n">als_output</span><span class="o">/</span><span 
class="n">U</span><span class="o">/</span> <span class="o">--</span><span 
class="n">itemFeatures</span> $<span class="n">als_output</span><span 
class="o">/</span><span class="n">M</span><span class="o">/</span> <span 
class="o">--</span><span class="n">numRecommendations</span> 1 <span 
class="o">--</span><span class="n">output</span> <span 
class="n">recommendations</span> <span class="o">--</span><span 
class="n">maxRating</span> 1
+</pre></div>
+
+
+<p>The input user file is a sequence file: each record's key is a user ID and 
its value is the IDs of the items the user has rated, which are excluded from 
the recommendations. The output generated in our simple example is a text 
file giving the recommended item IDs for each user. 
+Remember to translate the Mahout IDs back into your application-specific IDs. 
</p>
+<p>There exist a variety of parameters for Mahout’s ALS recommender to 
accommodate custom business requirements; exploring and testing various 
configurations to suit your needs will doubtless lead to additional questions. 
Feel free to ask such questions on the <a 
href="https://mahout.apache.org/general/mailing-lists,-irc-and-archives.html">mailing
 list</a>.</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

