Author: buildbot
Date: Sun Sep 21 15:19:28 2014
New Revision: 923072
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Sep 21 15:19:28 2014
@@ -1 +1 @@
-1622719
+1626592
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
---
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
(original)
+++
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Sun Sep 21 15:19:28 2014
@@ -348,8 +348,17 @@ to recommend. </p>
</pre></div>
-<h3 id="more-complex-input">More Complex Input</h3>
-<p>For input of the form:</p>
+<h3 id="how-to-use-multiple-user-actions">How to use Multiple User Actions</h3>
+<p>Often we record various actions the user takes for later analytics. These
can now be used to make recommendations.
+The idea of a recommender is to recommend the action you want the user to
make. For an e-commerce app this might be
+a purchase action. It is usually not a good idea to just treat other actions
the same as the action you want to recommend.
+For instance, a view of an item does not indicate the same intent as a purchase,
and if you just mixed the two together you
+might even make worse recommendations. It is tempting, though, since there are
so many more views than purchases. With <em>spark-itemsimilarity</em>
+we can now use both actions. Mahout will use cross-action cooccurrence
analysis to limit the views to ones that do predict purchases.
+We do this by treating the primary action (purchase) as data for the indicator
matrix and using the secondary action (view)
+to calculate the cross-indicator matrix.</p>
+<p><em>spark-itemsimilarity</em> can read separate actions from separate files
or from a mixed action log by filtering certain lines. For a mixed
+action log of the form:</p>
<div class="codehilite"><pre><span class="n">u1</span><span
class="p">,</span><span class="n">purchase</span><span class="p">,</span><span
class="n">iphone</span>
<span class="n">u1</span><span class="p">,</span><span
class="n">purchase</span><span class="p">,</span><span class="n">ipad</span>
<span class="n">u2</span><span class="p">,</span><span
class="n">purchase</span><span class="p">,</span><span class="n">nexus</span>
@@ -374,7 +383,7 @@ to recommend. </p>
<h3 id="command-line">Command Line</h3>
-<p>Use the following options can be used:</p>
+<p>Use the following options:</p>
<div class="codehilite"><pre><span class="n">bash</span>$ <span
class="n">mahout</span> <span class="n">spark</span><span
class="o">-</span><span class="n">itemsimilarity</span> <span class="o">\</span>
<span class="o">--</span><span class="n">input</span> <span
class="n">in</span><span class="o">-</span><span class="n">file</span> <span
class="o">\</span> # <span class="n">where</span> <span class="n">to</span>
<span class="n">look</span> <span class="k">for</span> <span
class="n">data</span>
<span class="o">--</span><span class="n">output</span> <span
class="n">out</span><span class="o">-</span><span class="n">path</span> <span
class="o">\</span> # <span class="n">root</span> <span class="n">dir</span>
<span class="k">for</span> <span class="n">output</span>
@@ -388,7 +397,8 @@ to recommend. </p>
<h3 id="output">Output</h3>
-<p>The output of the job will be the standard text version of two Mahout DRMs.
This is a case where we are calculating cross-cooccurrence so a primary
indicator matrix and cross-indicator matrix will be created</p>
+<p>The output of the job will be the standard text version of two Mahout DRMs.
This is a case where we are calculating
+cross-cooccurrence, so a primary indicator matrix and a cross-indicator matrix
will be created.</p>
<div class="codehilite"><pre><span class="n">out</span><span
class="o">-</span><span class="n">path</span>
<span class="o">|--</span> <span class="n">indicator</span><span
class="o">-</span><span class="n">matrix</span> <span class="o">-</span> <span
class="n">TDF</span> <span class="n">part</span> <span class="n">files</span>
<span class="o">\--</span> <span class="nb">cross</span><span
class="o">-</span><span class="n">indicator</span><span class="o">-</span><span
class="n">matrix</span> <span class="o">-</span> <span class="n">TDF</span>
<span class="n">part</span><span class="o">-</span><span class="n">files</span>
@@ -413,6 +423,8 @@ to recommend. </p>
</pre></div>
+<p><strong>Note:</strong> You can run this multiple times to use more than two
actions, or you can use the underlying
+SimilarityAnalysis.cooccurrence API, which will more efficiently calculate any
number of cross-indicators.</p>
<h3 id="log-file-input">Log File Input</h3>
<p>A common method of storing data is in log files. If they are written using
some delimiter they can be consumed directly by spark-itemsimilarity. For
instance input of the form:</p>
<div class="codehilite"><pre>2014<span class="o">-</span>06<span
class="o">-</span>23 14<span class="p">:</span>46<span
class="p">:</span>53<span class="p">.</span>115<span class="o">\</span><span
class="n">tu1</span><span class="o">\</span><span
class="n">tpurchase</span><span class="o">\</span><span
class="n">trandom</span> <span class="n">text</span><span
class="o">\</span><span class="n">tiphone</span>