Author: buildbot
Date: Sat Jun 13 00:38:21 2015
New Revision: 954652

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Jun 13 00:38:21 2015
@@ -1 +1 @@
-1683660
+1685198

Modified: websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html (original)
+++ websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html Sat Jun 13 00:38:21 2015
@@ -264,6 +264,7 @@
    <div id="main">
     <h1 id="playing-with-mahouts-spark-shell">Playing with Mahout's Spark Shell</h1>
 <p>This tutorial will show you how to play with Mahout's scala DSL for linear algebra and its Spark shell. <strong>Please keep in mind that this code is still in a very early experimental stage</strong>.</p>
+<p><em>(Edited for 0.10.2)</em></p>
 <h2 id="intro">Intro</h2>
 <p>We'll use an excerpt of a publicly available <a href="http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html">dataset about cereals</a>. The dataset tells the protein, fat, carbohydrate and sugars (in milligrams) contained in a set of cereals, as well as a customer rating for the cereals. Our aim for this example is to fit a linear model which infers the customer rating from the ingredients.</p>
 <table>
@@ -474,23 +475,11 @@ that, our model always crosses through t
 right angle. An easy way to add such a bias term to our model is to add a 
 column of ones to the feature matrix <code>\(\mathbf{X}\)</code>. 
 The corresponding weight in the parameter vector will then be the bias term.</p>
-<p>Mahout's DSL offers a <code>mapBlock()</code> method for custom modifications of a DRM. All the rows in a partition are merged to a block of the matrix which is given to custom code in a closure. For our example, we invoke <code>mapBlock</code> with <code>ncol = drmX.ncol + 1</code> to let the system know that we change the number of columns of the matrix. The input to our closure is a <code>block</code> of the DRM and an array of <code>keys</code> for the rows contained in the block. In order to add a column, we first create a new block with an additional column, then copy the data from the current block into the new block and finally set the last column to ones and return the new block.</p>
+<p>Here is how we add a bias column:</p>
 <div class="codehilite"><pre>
-val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) {
-  case(keys, block) =>
-    // create a new block with an additional column
-    val blockWithBiasColumn = block.like(block.nrow, block.ncol + 1)
-    // copy data from current block into the new block
-    blockWithBiasColumn(::, 0 until block.ncol) := block
-    // last column consists of ones
-    blockWithBiasColumn(::, block.ncol) := 1
-
-    keys -> blockWithBiasColumn
-}
+val drmXwithBiasColumn = drmX cbind 1
 </pre></div>
 
-<p>(This looks like a lot of work for something that would be simply <code>cbind(drmX, 1)</code> in R. Matrix-scalar 
-<code>cbind</code> combination is still a TODO in Mahout's dialect, although <code>cbind</code> exists for other operand type combinations.)</p>
 <p>Now we can give the newly created DRM <code>drmXwithBiasColumn</code> to our model fitting method <code>ols</code> and see how well the resulting model fits the training data with <code>goodnessOfFit</code>. You should see a large improvement in the result.</p>
 <div class="codehilite"><pre>
 val betaWithBiasTerm = ols(drmXwithBiasColumn, y)
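
The removed `mapBlock` closure appends a column of ones to every block of the DRM while passing the row keys through unchanged; the new `drmX cbind 1` one-liner expresses the same result. A dependency-free sketch of that per-block logic follows, assuming plain `Array[Array[Double]]` rows stand in for Mahout's in-core blocks. The `BiasColumnSketch` object, the `Block` alias and the `mapBlocks` helper are hypothetical illustrations, not Mahout API.

```scala
// Hypothetical, dependency-free sketch of the per-block logic of the removed
// mapBlock closure. Array[Array[Double]] stands in for a Mahout in-core block;
// none of these names are Mahout API.
object BiasColumnSketch {
  // One block = the rows of a single partition, each row a dense Double array.
  type Block = Array[Array[Double]]

  // Equivalent of: create a block with one extra column, copy the old data in,
  // and set the last column to ones.
  def withBiasColumn(block: Block): Block =
    block.map(row => row :+ 1.0)

  // Equivalent of `keys -> f(block)`: transform each block, keep row keys as-is.
  def mapBlocks[K](parts: Seq[(Array[K], Block)])(f: Block => Block): Seq[(Array[K], Block)] =
    parts.map { case (keys, block) => keys -> f(block) }
}
```

Because each block is transformed independently and its keys are returned untouched, the row count and keying of the matrix are preserved while the column count grows by one, which is why the original closure declared `ncol = drmX.ncol + 1`.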


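The bias-term paragraph argues that once a column of ones is appended to the feature matrix, the matching weight becomes the intercept, so the model no longer has to pass through the origin. A small in-memory sketch of that claim, assuming ordinary least squares solved through the normal equations (X^T X) beta = X^T y with Gaussian elimination. `OlsSketch` and its methods are hypothetical and make no claim about how the tutorial's own `ols` is implemented.

```scala
// Hypothetical plain-Scala sketch: ordinary least squares via the normal
// equations (X^T X) beta = X^T y. Not Mahout's `ols` or `solve`.
object OlsSketch {
  type Mat = Array[Array[Double]]

  // X^T X for an n x k matrix X (rows are observations).
  def gram(x: Mat): Mat = {
    val k = x.head.length
    Array.tabulate(k, k)((i, j) => x.map(r => r(i) * r(j)).sum)
  }

  // X^T y
  def xty(x: Mat, y: Array[Double]): Array[Double] = {
    val k = x.head.length
    Array.tabulate(k)(i => x.indices.map(r => x(r)(i) * y(r)).sum)
  }

  // Solve a * beta = rhs for a small dense square system using Gaussian
  // elimination with partial pivoting, then back substitution.
  def solve(a0: Mat, rhs0: Array[Double]): Array[Double] = {
    val n = rhs0.length
    val a = a0.map(_.clone)
    val b = rhs0.clone
    for (col <- 0 until n) {
      // Swap the row with the largest remaining pivot into place.
      val p = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmpRow = a(col); a(col) = a(p); a(p) = tmpRow
      val tmpB = b(col); b(col) = b(p); b(p) = tmpB
      // Eliminate this column from the rows below the pivot.
      for (r <- col + 1 until n) {
        val f = a(r)(col) / a(col)(col)
        for (c <- col until n) a(r)(c) -= f * a(col)(c)
        b(r) -= f * b(col)
      }
    }
    val beta = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => a(i)(j) * beta(j)).sum
      beta(i) = (b(i) - s) / a(i)(i)
    }
    beta
  }

  def ols(x: Mat, y: Array[Double]): Array[Double] = solve(gram(x), xty(x, y))
}
```

On data generated by y = 2x + 3 with x = 1, 2, 3, fitting the single feature alone forces the line through the origin and yields one slope of 46/14 (about 3.29), while fitting the same feature plus a trailing column of ones recovers weights close to (2, 3), the last of which is the bias.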