Author: buildbot
Date: Sun Apr 26 15:49:28 2015
New Revision: 949237

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Apr 26 15:49:28 2015
@@ -1 +1 @@
-1675715
+1676117

Modified: 
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
 Sun Apr 26 15:49:28 2015
@@ -263,24 +263,27 @@
   <div id="content-wrap" class="clearfix">
    <div id="main">
     <h1 id="how-to-create-and-app-using-mahout">How to create an App using 
Mahout</h1>
-<p>This is an example of how to create a simple app using Mahout as a Library. 
The source is available on Github in the <a 
href="https://github.com/pferrel/3-input-cooc">3-input-cooc project</a> with 
more explanation about what it does. For this tutorial we'll concentrate on how 
to create an app.</p>
-<p>This example is for reading three interactions types and creating 
indicators for them using cooccurrence and cross-cooccurrence. The indicators 
will be written to text files in a format ready for search engine indexing in 
recommender.</p>
+<p>This is an example of how to create a simple app using Mahout as a Library. 
The source is available on GitHub in the <a 
href="https://github.com/pferrel/3-input-cooc">3-input-cooc project</a> with 
more explanation about what it does (it has to do with collaborative filtering). 
For this tutorial we'll concentrate on the app rather than the data science.</p>
+<p>The app reads in three user-item interaction types and creates indicators 
for them using cooccurrence and cross-cooccurrence. The indicators will be 
written to text files in a format ready for search engine indexing in a 
search-engine-based recommender.</p>
 <h2 id="setup">Setup</h2>
 <p>In order to build and run the CooccurrenceDriver you need to install the 
following:</p>
 <ul>
 <li>Install the Java 7 JDK from Oracle. Mac users look here: <a 
href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html">Java
 SE Development Kit 7u72</a>.</li>
-<li>Install sbt (simple build tool) 0.13.x for <a 
href="http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Mac.html">Mac</a>,<a
 
href="http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Linux.html">Linux</a>
 or <a 
href="http://www.scala-sbt.org/release/tutorial/Manual-Installation.html">manual
 instalation</a>.</li>
-<li>Install <a 
href="http://mahout.apache.org/general/downloads.html">Mahout</a>. Don't forget 
to setup MAHOUT_HOME and MAHOUT_LOCAL</li>
+<li>Install sbt (simple build tool) 0.13.x for <a 
href="http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Mac.html">Mac</a>,
 <a 
href="http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Linux.html">Linux</a>
 or <a 
href="http://www.scala-sbt.org/release/tutorial/Manual-Installation.html">manual
 installation</a>.</li>
+<li>Install <a 
href="https://spark.apache.org/docs/1.1.1/spark-standalone.html">Spark 
1.1.1</a>. Don't forget to set up SPARK_HOME.</li>
+<li>Install <a href="http://mahout.apache.org/general/downloads.html">Mahout 
0.10.0</a>. Don't forget to set up MAHOUT_HOME and MAHOUT_LOCAL.</li>
 </ul>
+<p>Why install if you are only using them as a library? Certain binaries and 
scripts are required by the libraries to get information about the environment, 
such as discovering where jars are located.</p>
+<p>Spark requires one set of jars on the classpath for the client-side part of 
an app, and another set of jars must be passed to the Spark Context for running 
distributed code. The example should discover all the necessary classes 
automatically.</p>
 <h2 id="application">Application</h2>
-<p>Using Mahout as a library in an application will require a little Scala 
code. We have an App trait in Scala so we'll create an object, which inherits 
from <code>App</code></p>
+<p>Using Mahout as a library in an application will require a little Scala 
code. Scala has an App trait, so we'll create an object that inherits from 
<code>App</code>:</p>
 <div class="codehilite"><pre><span class="n">object</span> <span 
class="n">CooccurrenceDriver</span> <span class="n">extends</span> <span 
class="n">App</span> <span class="p">{</span>
 <span class="p">}</span>
 </pre></div>
 
 
-<p>This will look a little different than Java since <code>App</code> does 
delayed initialization, which causes the main body to be executed when the App 
is launched, just as in Java you would create a CooccurrenceDriver.main.</p>
-<p>Before we can execute something on Spark we'll need to create a context. We 
could use raw Spark calls here but default values are setup for a Mahout 
context.</p>
+<p>This will look a little different than Java since <code>App</code> does 
delayed initialization, which causes the body to be executed when the App is 
launched, much as the body of a main method would be in Java.</p>
+<p>Before we can execute something on Spark we'll need to create a context. We 
could use raw Spark calls here, but the Mahout helper function sets up default 
values for a Mahout context.</p>
 <div class="codehilite"><pre><span class="n">implicit</span> <span 
class="n">val</span> <span class="n">mc</span> <span class="p">=</span> <span 
class="n">mahoutSparkContext</span><span class="p">(</span><span 
class="n">masterUrl</span> <span class="p">=</span> &quot;<span 
class="n">local</span>&quot;<span class="p">,</span> 
   <span class="n">appName</span> <span class="p">=</span> &quot;<span 
class="n">CooccurrenceDriver</span>&quot;<span class="p">)</span>
 </pre></div>
@@ -298,8 +301,8 @@
 </pre></div>
 
 
-<p>Mahout has a helper function that reads the text delimited in 
SparkEngine.indexedDatasetDFSReadElements. The function reads single elements 
in a distributed way to create the IndexedDataset. </p>
-<p>Notice we read in all datasets before we adjust the number of rows in them 
to match the total number of users in the data. This is so the math works out 
even if some users took one action but not another.</p>
+<p>Mahout has a helper function that reads text-delimited files: 
SparkEngine.indexedDatasetDFSReadElements. The function reads single element 
tuples (user-id,item-id) in a distributed way to create the IndexedDataset. 
Distributed Row Matrices (DRM) and Vectors are important data types supplied by 
Mahout. An IndexedDataset is like a very lightweight Dataframe in R; it wraps a 
DRM with HashBiMaps for row and column IDs. </p>
+<p>One important thing to note about this example is that we read in all 
datasets before we adjust the number of rows in them to match the total number 
of users in the data. This is so the math works out <a 
href="http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html">(A'A,
 A'B, A'C)</a>: even if some users took one action but not another, there must 
be the same number of rows in all matrices.</p>
 <div class="codehilite"><pre><span class="o">/**</span>
  <span class="o">*</span> Read files of element tuples and create 
IndexedDatasets one per action. These 
  <span class="o">*</span> share a userID BiMap but have their own itemID BiMaps
@@ -338,8 +341,9 @@ def readActions<span class="p">(</span>a
 
 
 <p>Now that we have the data read in we can perform the cooccurrence 
calculation.</p>
-<div class="codehilite"><pre><span class="c1">// strip off names, method takes 
an array of IndexedDatasets</span>
-<span class="n">val</span> <span class="n">indicatorMatrices</span> <span 
class="o">=</span> <span class="n">SimilarityAnalysis</span><span 
class="p">.</span><span class="n">cooccurrencesIDSs</span><span 
class="p">(</span><span class="n">actions</span><span class="p">.</span><span 
class="n">map</span><span class="p">(</span><span class="n">a</span> <span 
class="o">=&gt;</span> <span class="n">a</span><span class="p">.</span><span 
class="n">_2</span><span class="p">))</span>
+<div class="codehilite"><pre><span class="c1">// actions.map creates an array 
of just the IndexedDatasets</span>
+<span class="n">val</span> <span class="n">indicatorMatrices</span> <span 
class="o">=</span> <span class="n">SimilarityAnalysis</span><span 
class="p">.</span><span class="n">cooccurrencesIDSs</span><span 
class="p">(</span>
+  <span class="n">actions</span><span class="p">.</span><span 
class="n">map</span><span class="p">(</span><span class="n">a</span> <span 
class="o">=&gt;</span> <span class="n">a</span><span class="p">.</span><span 
class="n">_2</span><span class="p">))</span>
 </pre></div>
 
 
@@ -418,9 +422,9 @@ def writeIndicators<span class="p">(</sp
 <p>To build and run this example in a debugger like IntelliJ IDEA, install it 
from the IntelliJ site and add the Scala plugin.</p>
 <p>Open IDEA and go to the menu File-&gt;New-&gt;Project from existing 
sources-&gt;SBT-&gt;/path/to/3-input-cooc. This will create an IDEA project 
from <code>build.sbt</code> in the root directory.</p>
 <p>At this point you may create a "Debug Configuration" to run. In the menu 
choose Run-&gt;Edit Configurations. Under "Default" choose "Application". In 
the dialog hit the ellipsis button "..." to the right of "Environment Variables" 
and fill in your versions of JAVA_HOME, SPARK_HOME, and MAHOUT_HOME. In the 
configuration editor under "Use classpath from" choose the root-3-input-cooc 
module. </p>
-<p><img alt="image" src="http://mahout.apache.org/images/debug-config.png" 
title="=400x" /></p>
+<p><img alt="image" src="http://mahout.apache.org/images/debug-config.png" 
/></p>
 <p>Now choose "Application" in the left pane and hit the plus sign "+". Give 
the config a name and hit the ellipsis button to the right of the "Main class" 
field as shown.</p>
-<p><img alt="image" src="http://mahout.apache.org/images/debug-config-2.png" 
title="=600x" /></p>
+<p><img alt="image" src="http://mahout.apache.org/images/debug-config-2.png" 
/></p>
 <p>After setting breakpoints you are now ready to debug the configuration. Go 
to the Run-&gt;Debug... menu and pick your configuration. This will execute 
using a local standalone instance of Spark.</p>
 <h2 id="the-mahout-shell">The Mahout Shell</h2>
 <p>For small script-like apps you may wish to use the Mahout shell. It is a 
Scala REPL type interactive shell built on the Spark shell with Mahout-Samsara 
extensions.</p>
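
Note on the row-count remark in the tutorial text above (A'A, A'B, A'C): the
following is a minimal plain-Scala sketch, not Mahout code, of why every action
matrix needs the same number of user rows. It uses no Mahout or Spark calls.
All names and data in it (CooccurrenceSketch, purchases, views, transposeTimes)
are hypothetical, and it computes raw cooccurrence counts only, which is
simpler than what Mahout's own calculation does.

```scala
// Plain-Scala illustration only; every name and data value here is hypothetical.
object CooccurrenceSketch {
  // Made-up (user-id, item-id) tuples for two kinds of action.
  val purchases = Seq(("u1", "iphone"), ("u2", "ipad"), ("u3", "iphone"), ("u1", "ipad"))
  val views     = Seq(("u1", "nexus"), ("u3", "nexus"), ("u4", "galaxy"))

  // One user dictionary shared by ALL actions, so row i is the same user in every matrix.
  val userIndex: Map[String, Int] =
    (purchases ++ views).map(_._1).distinct.zipWithIndex.toMap

  // Each action keeps its own item dictionary.
  def itemIndex(action: Seq[(String, String)]): Map[String, Int] =
    action.map(_._2).distinct.zipWithIndex.toMap

  // Dense 0/1 interaction matrix with one row per user in the shared dictionary.
  def toMatrix(action: Seq[(String, String)], items: Map[String, Int]): Array[Array[Int]] = {
    val m = Array.fill(userIndex.size, items.size)(0)
    for ((user, item) <- action) m(userIndex(user))(items(item)) = 1
    m
  }

  // Computes X'Y. The product is only defined when X and Y have the same row count.
  def transposeTimes(x: Array[Array[Int]], y: Array[Array[Int]]): Array[Array[Int]] = {
    require(x.length == y.length, "matrices must have the same number of rows")
    Array.tabulate(x.head.length, y.head.length) { (i, j) =>
      x.indices.map(r => x(r)(i) * y(r)(j)).sum
    }
  }

  val a = toMatrix(purchases, itemIndex(purchases)) // 4 users x 2 purchased items
  val b = toMatrix(views, itemIndex(views))         // 4 users x 2 viewed items

  val ata = transposeTimes(a, a) // cooccurrence counts: purchase with purchase
  val atb = transposeTimes(a, b) // cross-cooccurrence counts: purchase with view

  def main(args: Array[String]): Unit = {
    println(ata.map(_.mkString(" ")).mkString("\n"))
    println(atb.map(_.mkString(" ")).mkString("\n"))
  }
}
```

User u4 only viewed and never purchased, yet still gets an all-zero row in the
purchase matrix. If each matrix were sized from its own action file instead,
the purchase matrix would have three rows and the view matrix four, and A'B
would not be defined.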

