Author: buildbot
Date: Mon Mar 9 00:19:55 2015
New Revision: 942936
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Mar 9 00:19:55 2015
@@ -1 +1 @@
-1665098
+1665101
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
---
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
(original)
+++
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Mon Mar 9 00:19:55 2015
@@ -246,14 +246,9 @@
<div id="content-wrap" class="clearfix">
<div id="main">
<h1 id="intro-to-cooccurrence-recommenders-with-spark">Intro to
Cooccurrence Recommenders with Spark</h1>
-<p>Mahout's next generation recommender is based on the proven cooccurrence
algorithm but takes it several important steps further
-by creating a multimodal recommender, which can make use of many user actions
to make recommendations. In the old days
-only page reads, or purchases could be used alone. Now search terms,
locations, all manner of clickstream data can be used to
-recommend - hence the term multimodal. It also allows the recommendations to
be tuned for the placement context by changine
-the query without recalculating the model - adding to its multimodality.</p>
<p>Mahout provides several important building blocks for creating
recommendations using Spark. <em>spark-itemsimilarity</em> can
be used to create "other people also liked these things" type recommendations
and paired with a search engine can
-personalize multimodal recommendations for individual users.
<em>spark-rowsimilarity</em> can provide non-personalized content based
+personalize recommendations for individual users. <em>spark-rowsimilarity</em>
can provide non-personalized content based
recommendations and when paired with a search engine can be used to
personalize content based recommendations.</p>
<p><img alt="image"
src="http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png" /></p>
<p>This is a simplified Lambda architecture with Mahout's
<em>spark-itemsimilarity</em> playing the batch model building role and a
search engine playing the realtime serving role.</p>
@@ -262,11 +257,10 @@ recommendations and when paired with a s
<h2 id="references">References</h2>
<ol>
<li>A free ebook, which talks about the general idea: <a
href="https://www.mapr.com/practical-machine-learning">Practical Machine
Learning</a></li>
-<li>A slide deck, which talks about mixing user actions and other indicators:
<a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Multimodal
Streaming Recommender</a></li>
+<li>A slide deck, which talks about mixing actions and other indicators: <a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Creating
a Unified Recommender</a></li>
<li>Two blog posts: <a
href="http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/">What's
New in Recommenders: part #1</a>
and <a
href="http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/">What's
New in Recommenders: part #2</a></li>
<li>A post describing the loglikelihood ratio: <a
href="http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html">Surprise
and Coincidence</a>. LLR is used to reduce noise in the data while keeping the
calculations at O(n) complexity.</li>
-<li>A demo <a href="https://guide.finderbots.com">Video Guide</a> site, which
uses many of the techniques described above.</li>
</ol>
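The log-likelihood ratio (LLR) mentioned in the references above can be sketched in a few lines. This is an illustrative Python sketch of the entropy-based formulation of the G-test for a 2x2 table of cooccurrence counts (function and variable names are ours, not part of any Mahout API):

```python
import math

def x_log_x(x):
    # x * ln(x), with the convention 0 * ln(0) = 0
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # unnormalized Shannon entropy over raw counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # 2x2 contingency table of event counts:
    # k11 = both events occurred together, k12/k21 = one event
    # without the other, k22 = neither event
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)
```

A strongly associated pair (e.g. `k11=10, k12=0, k21=0, k22=10`) scores high, while independent events score near zero, which is what makes LLR useful as a noise filter on cooccurrence counts.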
<p>Below are the command line jobs, but the drivers and associated code can
also be customized and accessed from the Scala APIs.</p>
<h2 id="1-spark-itemsimilarity">1. spark-itemsimilarity</h2>
@@ -549,10 +543,10 @@ a blog post,
the only similarity method supported, this is not the optimal way to determine
general "bag-of-words" document similarity.
LLR is used more as a quality filter than as a similarity measure. However
<em>spark-rowsimilarity</em> will produce
lists of similar docs for every doc if the input is docs with lists of terms. The
Apache <a href="http://lucene.apache.org">Lucene</a> project provides several
methods of <a
href="http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description">analyzing
and tokenizing</a> documents.</p>
-<h1 id="wzxhzdk244-creating-a-unified-recommenderwzxhzdk25"><a
name="unified-recommender">4. Creating a Unified Recommender</a></h1>
-<p>Using the output of <em>spark-itemsimilarity</em> and
<em>spark-rowsimilarity</em> you can build a unified cooccurrence and content
based
+<h1 id="wzxhzdk244-creating-a-multimodal-recommenderwzxhzdk25"><a
name="unified-recommender">4. Creating a Multimodal Recommender</a></h1>
+<p>Using the output of <em>spark-itemsimilarity</em> and
<em>spark-rowsimilarity</em> you can build a multimodal cooccurrence and
content based
recommender that can be used in both or either mode depending on indicators
available and the history available at
-runtime for a user.</p>
+runtime for a user. Some slides describing this method can be found <a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">here</a>.</p>
<h2 id="requirements">Requirements</h2>
<ol>
<li>Mahout SNAPSHOT-1.0 or later</li>
@@ -599,6 +593,7 @@ no collaborative filtering data, as happ
items with the most similar tags. Notice that other users' behavior is not
considered--only other items' tags. This defines a
content or metadata indicator. They are used when you want to find items that
are similar to other items by using their
content or metadata, not by which users interacted with them.</p>
+<p><strong>Note</strong>: It may be advisable to treat tags as
cross-cooccurrence indicators, but for the sake of an example they are treated
here as content only.</p>
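To build intuition for a content indicator, here is a minimal sketch that ranks items by tag overlap using Jaccard similarity. This is illustrative only (the item names and tags are made up, and the real <em>spark-rowsimilarity</em> job uses LLR-based downsampling rather than raw set overlap):

```python
def jaccard(a, b):
    # |intersection| / |union| of two tag sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# hypothetical itemID -> list-of-tags data
item_tags = {
    "iphone": ["phone", "mobile", "apple"],
    "galaxy": ["phone", "mobile", "samsung"],
    "surface": ["tablet", "microsoft"],
}

def most_similar(item, catalog):
    # rank the other items by tag overlap with `item`;
    # no user behavior is consulted, only item metadata
    return sorted(
        (other for other in catalog if other != item),
        key=lambda other: jaccard(catalog[item], catalog[other]),
        reverse=True,
    )
```

Note how the ranking depends only on the items' own metadata, never on which users interacted with them.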
<p>For this we need input of the form:</p>
<div class="codehilite"><pre><span class="n">itemID</span><span
class="o"><</span><span class="n">tab</span><span class="o">></span><span
class="n">list</span><span class="o">-</span><span class="n">of</span><span
class="o">-</span><span class="n">tags</span>
<span class="p">...</span>
@@ -629,9 +624,8 @@ is finished we no longer need the streng
<p>We now have three indicators: two of collaborative filtering type and one
content type.</p>
-<h2 id="unified-recommender-query">Unified Recommender Query</h2>
-<p>The actual form of the query for recommendations will vary depending on
your search engine but the intent is the same.
-For a given user, map their history of an action or content to the correct
indicator field and perform an OR'd query. </p>
+<h2 id="multimodal-recommender-query">Multimodal Recommender Query</h2>
+<p>The actual form of the query for recommendations will vary depending on
your search engine but the intent is the same. For a given user, map their
history of an action or content to the correct indicator field and perform an
OR'd query. </p>
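As a sketch of that mapping (assuming a Lucene/Solr-style <code>field:(term term)</code> syntax; the history values are illustrative), the per-user query string can be assembled like this:

```python
def build_query(user_history):
    # user_history maps an indicator field name to that user's
    # history for it; each field's terms are OR'd inside the
    # clause, and the field clauses are OR'd together
    clauses = [
        "{}:({})".format(field, " ".join(terms))
        for field, terms in sorted(user_history.items())
        if terms
    ]
    return " OR ".join(clauses)
```

Fields with no history for the user are simply dropped, so the same query builder serves users with rich or sparse histories.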
<p>We have 3 indicators; these are indexed by the search engine into 3 fields
that we'll call "purchase", "view", and "tags".
We take the user's history that corresponds to each indicator and create a
query of the form:</p>
<div class="codehilite"><pre><span class="n">Query</span><span
class="o">:</span>