Author: buildbot
Date: Fri Apr  3 22:41:53 2015
New Revision: 946250

Log:
Staging update by buildbot for mahout

Added:
    websites/staging/mahout/trunk/content/users/environment/
    websites/staging/mahout/trunk/content/users/environment/h2o-internals.html
    websites/staging/mahout/trunk/content/users/environment/spark-internals.html
Modified:
    websites/staging/mahout/trunk/content/   (props changed)

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Apr  3 22:41:53 2015
@@ -1 +1 @@
-1671195
+1671197

Added: 
websites/staging/mahout/trunk/content/users/environment/h2o-internals.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/h2o-internals.html 
(added)
+++ websites/staging/mahout/trunk/content/users/environment/h2o-internals.html 
Fri Apr  3 22:41:53 2015
@@ -0,0 +1,307 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+             <!-- <li><a href="/">Home</a></li> --> 
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Mahout Environment<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                  <li class="nav-header">Engines</li>
+                  <li><a href="/users/sparkbindings/home.html">Spark</a></li>
+                  <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li class="nav-header">Tutorials</li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Algorithms<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li class="nav-header">Distributed Matrix Decomposition</li>
+                  <li><a href="/users/algorithms/d-qr.html">Cholesky 
QR</a></li>
+                  <li><a href="/users/algorithms/d-ssvd.html">SSVD</a></li>
+                  <li class="nav-header">Recommendations</li>
+                  <li><a 
href="/users/algorithms/recommender-overview.html">Recommender Overview</a></li>
+                  <li><a 
href="/users/algorithms/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+                  <li class="nav-header">Classification</li>
+                  <li><a href="/users/algorithms/spark-naive-bayes.html">Spark 
Naive Bayes</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">MapReduce Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Overview</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Mahout MapReduce<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li class="nav-header">Classification</li>
+                  <li><a href="/users/classification/bayesian.html">Naive 
Bayes</a></li>
+                  <li><a 
href="/users/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/classification/partial-implementation.html">Random Forest</a></li>
+                  <li class="nav-header">Classification Examples</li>
+                  <li><a 
href="/users/classification/breiman-example.html">Breiman example</a></li>
+                  <li><a 
href="/users/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                  <li><a 
href="/users/classification/bankmarketing-example.html">SGD classifier bank 
marketing</a></li>
+                  <li class="nav-header">Clustering</li>
+                  <li><a 
href="/users/clustering/k-means-clustering.html">k-Means</a></li>
+                  <li><a 
href="/users/clustering/canopy-clustering.html">Canopy</a></li>
+                  <li><a href="/users/clustering/fuzzy-k-means.html">Fuzzy 
k-Means</a></li>
+                  <li><a 
href="/users/clustering/streaming-k-means.html">Streaming KMeans</a></li>
+                  <li><a 
href="/users/clustering/spectral-clustering.html">Spectral Clustering</a></li>
+                  <li class="nav-header">Clustering Commandline usage</li>
+                  <li><a 
href="/users/clustering/k-means-commandline.html">Options for k-Means</a></li>
+                  <li><a 
href="/users/clustering/canopy-commandline.html">Options for Canopy</a></li>
+                  <li><a 
href="/users/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy 
k-Means</a></li>
+                  <li class="nav-header">Clustering Examples</li>
+                  <li><a 
href="/users/clustering/clustering-of-synthetic-control-data.html">Synthetic 
data</a></li>
+                  <li class="nav-header">Cluster Post processing</li>
+                  <li><a href="/users/clustering/cluster-dumper.html">Cluster 
Dumper tool</a></li>
+                  <li><a 
href="/users/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                  <li class="nav-header">Recommendations</li>
+                  <li><a 
href="/users/recommender/recommender-first-timer-faq.html">First Timer 
FAQ</a></li>
+                  <li><a href="/users/recommender/userbased-5-minutes.html">A 
user-based recommender <br/>in 5 minutes</a></li>
+                 <li><a 
href="/users/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                  <li><a 
href="/users/recommender/recommender-documentation.html">Overview</a></li>
+                  <li><a 
href="/users/recommender/intro-itembased-hadoop.html">Intro to item-based 
recommendations<br/> with Hadoop</a></li>
+                  <li><a href="/users/recommender/intro-als-hadoop.html">Intro 
to ALS recommendations<br/> with Hadoop</a></li>
+               </ul>
+              </li>
+              <!--  <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                
+                </ul> -->
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="introduction">Introduction</h1>
+<p>This document provides an overview of how the Mahout Scala DSL (distributed 
algebraic operators) is implemented over the H2O backend engine. The document 
is aimed at Mahout developers, to give a high level description of the design 
so that one can explore the code inside <code>h2o/</code> with some context.</p>
+<h2 id="h2o-overview"><a href="http://h2o.ai/";>H2O</a> Overview</h2>
+<p>H2O is a distributed scalable machine learning system. Internal 
architecture of H2O has a distributed math engine (h2o-core) and a separate 
layer on top for algorithms and UI. The Mahout integration requires only the 
math engine (h2o-core).</p>
+<h2 id="h2o-data-model">H2O Data Model</h2>
+<p>The data model of the H2O math engine is a distributed columnar store (of 
primarily numbers, but also strings). A column of numbers is called a Vector, 
which is broken into Chunks (of a few thousand elements). Chunks are 
distributed across the cluster based on a deterministic hash. Therefore, any 
member of the cluster knows where a particular Chunk of a Vector is homed. Each 
Chunk is separately compressed in memory and elements are individually 
decompressed on the fly upon access with purely register operations (thereby 
achieving high memory throughput). An ordered set of similarly partitioned Vecs 
are composed into a Frame. A Frame is therefore a large two dimensional table 
of numbers. All elements of a logical row in the Frame are guaranteed to be 
homed in the same server of the cluster. Generally speaking, H2O works well on 
"tall skinny" data, i.e, lots of rows (100s of millions) and modest number of 
columns (10s of thousands).</p>
+<h2 id="mahout-drm">Mahout DRM</h2>
+<p>The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a 
large matrix of numbers in-memory in a cluster by distributing logical rows 
among servers. The DSL provides an abstract API on DRMs for backend engines to 
provide implementations of this API. Examples are the Spark and H2O backend 
engines. Each engine has it's own design of mapping the abstract API onto its 
data model and provides implementations for algebraic operators over that 
mapping.</p>
+<h2 id="h2o-dsl-engine">H2O DSL Engine</h2>
+<p>The H2O backend implements the abstract DRM as an H2O Frame. Each logical 
column in the DRM is an H2O Vector. All elements of a logical DRM row are 
guaranteed to be homed on the same server. A set of rows stored on a server are 
presented as a read-only virtual in-core Matrix (i.e BlockMatrix) for the 
closure method in the <code>mapBlock(...)</code> API.</p>
+<p>H2O provides a flexible execution framework called <code>MRTask</code>. The 
<code>MRTask</code> framework typically executes over a Frame (or even a 
Vector), supports various types of map() methods, can optionally modify the 
Frame or Vector (though this never happens in the Mahout integration), and 
optionally create a new Vector or set of Vectors (to combine them into a new 
Frame, and consequently a new DRM).</p>
+<h2 id="source-layout">Source Layout</h2>
+<p>Within mahout.git, the top level directory, <code>h2o/</code> holds all the 
source code related to the H2O backend engine. Part of the code (that 
interfaces with the rest of the Mahout componenets) is in Scala, and part of 
the code (that interfaces with h2o-core and implements algebraic operators) is 
in Java. Here is a brief overview of what functionality can be found where 
within <code>h2o/</code>.</p>
+<p>h2o/ - top level directory containing all H2O related code</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/ops/*.java - Physical 
operator code for the various DSL algebra</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/drm/*.java - DRM backing 
(onto Frame) and Broadcast implementation</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java - Read / Write 
between DRM (Frame) and files on HDFS</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java - A 
vertical block matrix of DRM presented as a virtual copy-on-write in-core 
Matrix. Used in mapBlock() API</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java - A 
collection of various functionality and helpers. For e.g, convert between 
in-core Matrix and DRM, various summary statistics on DRM/Frame.</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala - DSL 
operator graph evaluator and various abstract API implementations for a 
distributed engine</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/* - Various abstract API 
implementations ("glue work")</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>

Added: 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html 
(added)
+++ 
websites/staging/mahout/trunk/content/users/environment/spark-internals.html 
Fri Apr  3 22:41:53 2015
@@ -0,0 +1,294 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en"><head><meta 
http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data 
framework, data integration,
+        data matching, data mining, data mining algorithms, data mining 
analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data 
mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning 
methods,
+        learning techniques, lucene, machine learning, machine translation, 
mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive 
bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data 
mining">
+  <link rel="shortcut icon" type="image/x-icon" 
href="http://mahout.apache.org/images/favicon.ico";>
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        
'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
 : 
+        
'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+       
+         var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search"; method="get" 
class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org"; name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" 
alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" 
style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: 
none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" 
data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development 
Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+             <!-- <li><a href="/">Home</a></li> --> 
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a 
href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, 
Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By 
Mahout</a>
+                  <li><a 
href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference 
Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a 
href="http://www.apache.org/licenses/";>License</a></li>
+                  <li><a 
href="http://www.apache.org/security/";>Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer 
resources</a></li>
+                  <li><a href="/developers/version-control.html">Version 
control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from 
source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue 
tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/"; 
target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to 
contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How 
to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How 
to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check 
list</a></li>
+                  <li><a href="/developers/github.html">Handling Github 
PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to 
release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third 
party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Mahout Environment<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; 
Spark Bindings Overview</a></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                  <li class="nav-header">Engines</li>
+                  <li><a href="/users/sparkbindings/home.html">Spark</a></li>
+                  <li><a 
href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li class="nav-header">Tutorials</li>
+                  <li><a 
href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark 
Shell</a></li>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Algorithms<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li class="nav-header">Distributed Matrix Decomposition</li>
+                  <li><a href="/users/algorithms/d-qr.html">Cholesky 
QR</a></li>
+                  <li><a href="/users/algorithms/d-ssvd.html">SSVD</a></li>
+                  <li class="nav-header">Recommendations</li>
+                  <li><a 
href="/users/algorithms/recommender-overview.html">Recommender Overview</a></li>
+                  <li><a 
href="/users/algorithms/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+                  <li class="nav-header">Classification</li>
+                  <li><a href="/users/algorithms/spark-naive-bayes.html">Spark 
Naive Bayes</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">MapReduce Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of 
algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Overview</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a 
href="/users/basics/creating-vectors-from-text.html">Creating vectors from 
text</a>
+                  <li><a 
href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a 
href="/users/dim-reduction/dimensional-reduction.html">Singular Value 
Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic 
SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a 
href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet 
Allocation</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Mahout MapReduce<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li class="nav-header">Classification</li>
+                  <li><a href="/users/classification/bayesian.html">Naive 
Bayes</a></li>
+                  <li><a 
href="/users/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/classification/partial-implementation.html">Random Forest</a></li>
+                  <li class="nav-header">Classification Examples</li>
+                  <li><a 
href="/users/classification/breiman-example.html">Breiman example</a></li>
+                  <li><a 
href="/users/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                  <li><a 
href="/users/classification/bankmarketing-example.html">SGD classifier bank 
marketing</a></li>
+                  <li class="nav-header">Clustering</li>
+                  <li><a 
href="/users/clustering/k-means-clustering.html">k-Means</a></li>
+                  <li><a 
href="/users/clustering/canopy-clustering.html">Canopy</a></li>
+                  <li><a href="/users/clustering/fuzzy-k-means.html">Fuzzy 
k-Means</a></li>
+                  <li><a 
href="/users/clustering/streaming-k-means.html">Streaming KMeans</a></li>
+                  <li><a 
href="/users/clustering/spectral-clustering.html">Spectral Clustering</a></li>
+                  <li class="nav-header">Clustering Commandline usage</li>
+                  <li><a 
href="/users/clustering/k-means-commandline.html">Options for k-Means</a></li>
+                  <li><a 
href="/users/clustering/canopy-commandline.html">Options for Canopy</a></li>
+                  <li><a 
href="/users/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy 
k-Means</a></li>
+                  <li class="nav-header">Clustering Examples</li>
+                  <li><a 
href="/users/clustering/clustering-of-synthetic-control-data.html">Synthetic 
data</a></li>
+                  <li class="nav-header">Cluster Post processing</li>
+                  <li><a href="/users/clustering/cluster-dumper.html">Cluster 
Dumper tool</a></li>
+                  <li><a 
href="/users/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                  <li class="nav-header">Recommendations</li>
+                  <li><a 
href="/users/recommender/recommender-first-timer-faq.html">First Timer 
FAQ</a></li>
+                  <li><a href="/users/recommender/userbased-5-minutes.html">A 
user-based recommender <br/>in 5 minutes</a></li>
+                 <li><a 
href="/users/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                  <li><a 
href="/users/recommender/recommender-documentation.html">Overview</a></li>
+                  <li><a 
href="/users/recommender/intro-itembased-hadoop.html">Intro to item-based 
recommendations<br/> with Hadoop</a></li>
+                  <li><a href="/users/recommender/intro-als-hadoop.html">Intro 
to ALS recommendations<br/> with Hadoop</a></li>
+               </ul>
+              </li>
+              <!--  <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                
+                </ul> -->
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+       <ul class="sidemenu">
+               <li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout"; 
data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var 
js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+       </ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html";>How the 
ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html";>Get 
Involved</a></li>
+      <li><a href="http://www.apache.org/dev/";>Developer Resources</a></li>
+      <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Sponsorship</a></li>
+      <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/";>Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/";>Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="introduction">Introduction</h1>
+<p>This document provides an overview of how the Mahout Scala DSL (distributed 
algebraic operators) is implemented over the Spark back end engine. The 
document is aimed at Mahout developers, to give a high level description of the 
design. </p>
+<h2 id="spark-overview">Spark Overview</h2>
+<h2 id="spark-data-model">Spark Data Model</h2>
+<h2 id="mahout-drm">Mahout DRM</h2>
+<p>Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a 
large matrix of numbers in-memory in a cluster by distributing logical rows 
among servers. The DSL provides an abstract API on DRMs for backend engines to 
provide implementations of this API. Examples are Spark and H2O backend 
engines. Each engine has its own design of mapping the abstract API onto its 
data model and provide implementations for algebraic operators over that 
mapping.</p>
+<h2 id="spark-dsl-engine">Spark DSL Engine</h2>
+<h2 id="source-layout">Source Layout</h2>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0";>Apache 
License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache 
Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') 
+
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>


Reply via email to