Author: buildbot
Date: Tue Apr 21 18:51:00 2015
New Revision: 948658
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Apr 21 18:51:00 2015
@@ -1 +1 @@
-1675180
+1675182
Modified:
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html Tue Apr 21 18:51:00 2015
@@ -301,8 +301,8 @@ hout context.</p>
<p>Mahout has a helper function that reads the text delimited in
SparkEngine.indexedDatasetDFSReadElements. The function reads single elements
in a distributed way to create the IndexedDataset. </p>
<p>Notice we read in all datasets before we adjust the number of rows in them
to match the total number of users in the data. This is so the math works out
even if some users took one action but not another.</p>
<div class="codehilite"><pre><span class="o">/**</span>
- <span class="o">*</span> Read files of element tuples and create IndexedDatasets one per action. These share a
- <span class="o">*</span> userID BiMap but have their own itemID BiMaps
+ <span class="o">*</span> Read files of element tuples and create IndexedDatasets one per action. These
+ <span class="o">*</span> share a userID BiMap but have their own itemID BiMaps
<span class="o">*/</span>
def readActions<span class="p">(</span>actionInput: Array<span
class="p">[(</span>String<span class="p">,</span> String<span
class="p">)])</span>: Array<span class="p">[(</span>String<span
class="p">,</span> IndexedDataset<span class="p">)]</span> <span
class="o">=</span> <span class="p">{</span>
var actions <span class="o">=</span> Array<span
class="p">[(</span>String<span class="p">,</span> IndexedDataset<span
class="p">)]()</span>
@@ -329,8 +329,7 @@ def readActions<span class="p">(</span>a
val resizedNameActionPairs <span class="o">=</span> actions.map <span
class="p">{</span> a <span class="o">=></span>
<span class="o">//</span>resize the matrix by<span class="p">,</span> in
effect by adding empty rows
- val resizedMatrix <span class="o">=</span>
- a._2.create<span class="p">(</span>a._2.matrix<span class="p">,</span> userDictionary<span class="p">,</span> a._2.columnIDs<span class="p">)</span><span class="m">.</span>newRowCardinality<span class="p">(</span>numUsers<span class="p">)</span>
+ val resizedMatrix <span class="o">=</span> a._2.create<span class="p">(</span>a._2.matrix<span class="p">,</span> userDictionary<span class="p">,</span> a._2.columnIDs<span class="p">)</span><span class="m">.</span>newRowCardinality<span class="p">(</span>numUsers<span class="p">)</span>
<span class="p">(</span>a._1<span class="p">,</span> resizedMatrix<span
class="p">)</span> <span class="o">//</span> return the Tuple of <span
class="p">(</span>name<span class="p">,</span> IndexedDataset<span
class="p">)</span>
<span class="p">}</span>
resizedNameActionPairs <span class="o">//</span> return the array of Tuples
@@ -339,7 +338,7 @@ def readActions<span class="p">(</span>a
<p>Now that we have the data read in we can perform the cooccurrence
calculation.</p>
-<div class="codehilite"><pre><span class="c1">// strip off names, which only takes and array of IndexedDatasets</span>
+<div class="codehilite"><pre><span class="c1">// strip off names, method takes an array of IndexedDatasets</span>
<span class="n">val</span> <span class="n">indicatorMatrices</span> <span
class="o">=</span> <span class="n">SimilarityAnalysis</span><span
class="p">.</span><span class="n">cooccurrencesIDSs</span><span
class="p">(</span><span class="n">actions</span><span class="p">.</span><span
class="n">map</span><span class="p">(</span><span class="n">a</span> <span
class="o">=></span> <span class="n">a</span><span class="p">.</span><span
class="n">_2</span><span class="p">))</span>
</pre></div>
@@ -354,13 +353,16 @@ def readActions<span class="p">(</span>a
<p>The <code>writeIndicators</code> method uses the default write function
<code>dfsWrite</code>.</p>
<div class="codehilite"><pre><span class="o">/**</span>
<span class="o">*</span> Write indicatorMatrices to the output dir in the
default format
+ <span class="o">*</span> for indexing by a search engine.
<span class="o">*/</span>
def writeIndicators<span class="p">(</span> indicators: Array<span
class="p">[(</span>String<span class="p">,</span> IndexedDataset<span
class="p">)])</span> <span class="o">=</span> <span class="p">{</span>
<span class="kr">for</span> <span class="p">(</span>indicator <span
class="o"><-</span> indicators <span class="p">)</span> <span
class="p">{</span>
+ <span class="o">//</span> create a name based on the type of indicator
val indicatorDir <span class="o">=</span> OutputPath <span
class="o">+</span> indicator._1
indicator._2.dfsWrite<span class="p">(</span>
- indicatorDir<span class="p">,</span> <span class="o">//</span> do we have to remove the last <span class="p">$</span> char?
- <span class="o">//</span> omit LLR strengths and format for search engine indexing
+ indicatorDir<span class="p">,</span>
+ <span class="o">//</span> Schema tells the writer to omit LLR strengths
+ <span class="o">//</span> and format for search engine indexing
IndexedDatasetWriteBooleanSchema<span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
@@ -395,7 +397,8 @@ def writeIndicators<span class="p">(</sp
<span class="n">packSettings</span>
-<span class="n">packMain</span> <span class="p">:=</span> <span class="n">Map</span><span class="p">(</span>"<span class="n">cooc</span>" <span class="o">-></span> "<span class="n">CooccurrenceDriver</span>"<span class="p">)</span>
+<span class="n">packMain</span> <span class="p">:=</span> <span class="n">Map</span><span class="p">(</span>
+ "<span class="n">cooc</span>" <span class="o">-></span> "<span class="n">CooccurrenceDriver</span>"<span class="p">)</span>
</pre></div>