Author: buildbot
Date: Fri Apr  3 23:14:51 2015
New Revision: 946256

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/environment/h2o-internals.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Apr  3 23:14:51 2015
@@ -1 +1 @@
-1671214
+1671216

Modified: websites/staging/mahout/trunk/content/users/environment/h2o-internals.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/h2o-internals.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/h2o-internals.html Fri Apr  3 23:14:51 2015
@@ -256,7 +256,7 @@
    <div id="main">
     <h1 id="introduction">Introduction</h1>
 <p>This document provides an overview of how the Mahout Scala DSL (distributed 
algebraic operators) is implemented over the H2O backend engine. The document 
is aimed at Mahout developers, to give a high level description of the design 
so that one can explore the code inside <code>h2o/</code> with some context.</p>
-<h2 id="h2o-overview"><a href="http://h2o.ai/">H2O</a> Overview</h2>
+<h2 id="h2o-overview">H2O Overview</h2>
 <p>H2O is a distributed, scalable machine learning system. The internal 
architecture of H2O has a distributed math engine (h2o-core) and a separate 
layer on top for algorithms and UI. The Mahout integration requires only the 
math engine (h2o-core).</p>
 <h2 id="h2o-data-model">H2O Data Model</h2>
 <p>The data model of the H2O math engine is a distributed columnar store (of 
primarily numbers, but also strings). A column of numbers is called a Vector, 
which is broken into Chunks (of a few thousand elements). Chunks are 
distributed across the cluster based on a deterministic hash. Therefore, any 
member of the cluster knows where a particular Chunk of a Vector is homed. Each 
Chunk is separately compressed in memory and elements are individually 
decompressed on the fly upon access with purely register operations (thereby 
achieving high memory throughput). An ordered set of similarly partitioned Vecs 
is composed into a Frame. A Frame is therefore a large two-dimensional table 
of numbers. All elements of a logical row in the Frame are guaranteed to be 
homed in the same server of the cluster. Generally speaking, H2O works well on 
"tall skinny" data, i.e., lots of rows (100s of millions) and a modest number of 
columns (10s of thousands).</p>
@@ -267,22 +267,14 @@
 <p>H2O provides a flexible execution framework called <code>MRTask</code>. The 
<code>MRTask</code> framework typically executes over a Frame (or even a 
Vector), supports various types of map() methods, can optionally modify the 
Frame or Vector (though this never happens in the Mahout integration), and can 
optionally create a new Vector or set of Vectors (to combine them into a new 
Frame, and consequently a new DRM).</p>
 <h2 id="source-layout">Source Layout</h2>
 <p>Within mahout.git, the top level directory, <code>h2o/</code> holds all the 
source code related to the H2O backend engine. Part of the code (that 
interfaces with the rest of the Mahout components) is in Scala, and part of 
the code (that interfaces with h2o-core and implements algebraic operators) is 
in Java. Here is a brief overview of what functionality can be found where 
within <code>h2o/</code>.</p>
-<div class="codehilite"><pre><span class="n">h2o</span><span 
class="o">/</span> <span class="o">-</span> <span class="n">top</span> <span 
class="n">level</span> <span class="n">directory</span> <span 
class="n">containing</span> <span class="n">all</span> <span 
class="n">H2O</span> <span class="n">related</span> <span class="n">code</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">java</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">ops</span><span class="o">/*</span><span class="p">.</span><span 
class="n">java</span> <span class="o">-</span> <span class="n">Physical</span> 
<span class="n">operator</span> <span class="n">code</span> <span 
class="k">for</span> <span class="n">the</span> <span class="n">various</span> 
<span class="n">DSL</span> <span class="n">algebra</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">java</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">drm</span><span class="o">/*</span><span class="p">.</span><span 
class="n">java</span> <span class="o">-</span> <span class="n">DRM</span> <span 
class="n">backing</span> <span class="p">(</span><span class="n">onto</span> 
<span class="n">Frame</span><span class="p">)</span> <span class="n">and</span> 
<span class="n">Broadcast</span> <span class="n">implementation</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">java</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">H2OHdfs</span><span class="p">.</span><span class="n">java</span> 
<span class="o">-</span> <span class="n">Read</span> <span class="o">/</span> 
<span class="n">Write</span> <span class="n">between</span> <span 
class="n">DRM</span> <span class="p">(</span><span class="n">Frame</span><span 
class="p">)</span> <span class="n">and</span> <span class="n">files</span> 
<span class="n">on</span> <span class="n">HDFS</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">java</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">H2OBlockMatrix</span><span class="p">.</span><span 
class="n">java</span> <span class="o">-</span> <span class="n">A</span> <span 
class="n">vertical</span> <span class="n">block</span> <span 
class="n">matrix</span> <span class="n">of</span> <span class="n">DRM</span> 
<span class="n">presented</span> <span class="n">as</span> <span 
class="n">a</span> <span class="n">virtual</span> <span 
class="n">copy</span><span class="o">-</span><span class="n">on</span><span 
class="o">-</span><span class="n">write</span> <span class="n">in</span><span 
class="o">-</span><span
  class="n">core</span> <span class="n">Matrix</span><span class="p">.</span> 
<span class="n">Used</span> <span class="n">in</span> <span 
class="n">mapBlock</span><span class="p">()</span> <span class="n">API</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">java</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">H2OHelper</span><span class="p">.</span><span class="n">java</span> 
<span class="o">-</span> <span class="n">A</span> <span 
class="n">collection</span> <span class="n">of</span> <span 
class="n">various</span> <span class="n">functionality</span> <span 
class="n">and</span> <span class="n">helpers</span><span class="p">.</span> 
<span class="n">For</span> <span class="n">e</span><span 
class="p">.</span><span class="n">g</span><span class="p">,</span> <span 
class="n">convert</span> <span class="n">between</span> <span 
class="n">in</span><span class="o">-</span><span class="n">core</span> <span class="n">Matrix</span> <span 
class="n">and</span> <span class="n">DRM</span><span class="p">,</span> <span 
class="n">various</span> <span class="n">summary</span> <span 
class="n">statistics</span> <span class="n">on</span> <span 
class="n">DRM</span><span class="o">/</span><span class="n">Frame</span><span 
class="p">.</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">scala</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/</span><span 
class="n">H2OEngine</span><span class="p">.</span><span class="n">scala</span> 
<span class="o">-</span> <span class="n">DSL</span> <span 
class="n">operator</span> <span class="n">graph</span> <span 
class="n">evaluator</span> <span class="n">and</span> <span 
class="n">various</span> <span class="n">abstract</span> <span 
class="n">API</span> <span class="n">implementations</span> <span 
class="k">for</span> <span class="n">a</span> <span 
class="n">distributed</span> <span class="n">engine</span>
-
-<span class="n">h2o</span><span class="o">/</span><span 
class="n">src</span><span class="o">/</span><span class="n">main</span><span 
class="o">/</span><span class="n">scala</span><span class="o">/</span><span 
class="n">org</span><span class="o">/</span><span class="n">apache</span><span 
class="o">/</span><span class="n">mahout</span><span class="o">/</span><span 
class="n">h2obindings</span><span class="o">/*</span> <span class="o">-</span> 
<span class="n">Various</span> <span class="n">abstract</span> <span 
class="n">API</span> <span class="n">implementations</span> <span 
class="p">(</span>&quot;<span class="n">glue</span> <span 
class="n">work</span>&quot;<span class="p">)</span>
-</pre></div>
+<p>h2o/ - top level directory containing all H2O related code</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/ops/*.java - Physical 
operator code for the various DSL algebra</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/drm/*.java - DRM backing 
(onto Frame) and Broadcast implementation</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java - Read / Write 
between DRM (Frame) and files on HDFS</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java - A 
vertical block matrix of DRM presented as a virtual copy-on-write in-core 
Matrix. Used in mapBlock() API</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java - A 
collection of various functionality and helpers, e.g., converting between 
in-core Matrix and DRM, and various summary statistics on DRM/Frame.</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala - DSL 
operator graph evaluator and various abstract API implementations for a 
distributed engine</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/* - Various abstract API 
implementations ("glue work")</p>
    </div>
   </div>     
 </div> 

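For readers new to the H2O data model described in the page above (Vectors split into Chunks of a few thousand elements, homed by a deterministic hash, with all elements of a logical row on the same server), here is a minimal Java sketch of why a deterministic hash lets any node locate a Chunk without a lookup table. It is hypothetical: the class and method names, the 4096-row Chunk size and the hash mix are illustrative assumptions, not H2O's classes, API or actual hashing scheme.

```java
/**
 * Hypothetical sketch, NOT H2O's real classes or API: a column ("Vec") is split
 * into fixed-size Chunks, and each Chunk's home node is a deterministic hash of
 * its Chunk index. Every node computes the same answer locally, so no directory
 * lookup is needed to find where a Chunk lives.
 */
public class ChunkHomingSketch {
    // Assumed rows per Chunk ("a few thousand elements"); illustrative only.
    static final int CHUNK_SIZE = 4096;

    /** Index of the Chunk that holds the given row of a Vec. */
    static int chunkIndex(long row) {
        return (int) (row / CHUNK_SIZE);
    }

    /** Offset of the row inside its Chunk. */
    static int offsetInChunk(long row) {
        return (int) (row % CHUNK_SIZE);
    }

    /**
     * Deterministic home node of a Chunk. This assumed hash uses only the Chunk
     * index (not the Vec's identity), so every similarly partitioned Vec in a
     * Frame homes a given row on the same node.
     */
    static int homeNode(int chunkIdx, int clusterSize) {
        int h = chunkIdx * 0x9E3779B9; // multiplicative mix; int overflow wraps
        h ^= (h >>> 16);               // fold high bits into low bits
        return Math.floorMod(h, clusterSize);
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        String[] frameColumns = {"x", "y", "label"};
        long row = 123_456_789L;
        int chunk = chunkIndex(row);
        System.out.println("row " + row + " -> chunk " + chunk
                + ", offset " + offsetInChunk(row));
        // Every column of the Frame resolves this row to the same home node.
        for (String col : frameColumns) {
            System.out.println(col + " -> node " + homeNode(chunk, clusterSize));
        }
    }
}
```

Hashing only the Chunk index is one simple way to obtain the row co-location property the page describes; a real engine must also deal with cluster membership, per-Chunk compression and variable Chunk sizes, none of which this sketch models.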
