Author: buildbot
Date: Mon Mar 9 00:19:55 2015
New Revision: 942936
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Mar 9 00:19:55 2015
@@ -1 +1 @@
-1665098
+1665101
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
---
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
(original)
+++
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Mon Mar 9 00:19:55 2015
@@ -246,14 +246,9 @@
<div id="content-wrap" class="clearfix">
<div id="main">
<h1 id="intro-to-cooccurrence-recommenders-with-spark">Intro to
Cooccurrence Recommenders with Spark</h1>
-<p>Mahout's next generation recommender is based on the proven cooccurrence
algorithm but takes it several important steps further
-by creating a multimodal recommender, which can make use of many user actions
to make recommendations. In the old days
-only page reads, or purchases could be used alone. Now search terms,
locations, all manner of clickstream data can be used to
-recommend - hence the term multimodal. It also allows the recommendations to
be tuned for the placement context by changine
-the query without recalculating the model - adding to its multimodality.</p>
<p>Mahout provides several important building blocks for creating
recommendations using Spark. <em>spark-itemsimilarity</em> can
be used to create "other people also liked these things" type recommendations
and paired with a search engine can
-personalize multimodal recommendations for individual users.
<em>spark-rowsimilarity</em> can provide non-personalized content based
+personalize recommendations for individual users. <em>spark-rowsimilarity</em>
can provide non-personalized content based
recommendations and when paired with a search engine can be used to
personalize content based recommendations.</p>
<p><img alt="image"
src="http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png" /></p>
<p>This is a simplified Lambda architecture with Mahout's
<em>spark-itemsimilarity</em> playing the batch model building role and a
search engine playing the realtime serving role.</p>
@@ -262,11 +257,10 @@ recommendations and when paired with a s
<h2 id="references">References</h2>
<ol>
<li>A free ebook, which talks about the general idea: <a
href="https://www.mapr.com/practical-machine-learning">Practical Machine
Learning</a></li>
-<li>A slide deck, which talks about mixing user actions and other indicators:
<a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Multimodal
Streaming Recommender</a></li>
+<li>A slide deck, which talks about mixing actions and other indicators: <a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Creating
a Unified Recommender</a></li>
<li>Two blog posts: <a
href="http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/">What's
New in Recommenders: part #1</a>
and <a
href="http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/">What's
New in Recommenders: part #2</a></li>
<li>A post describing the loglikelihood ratio: <a
href="http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html">Surprise
and Coincidence</a>. LLR is used to reduce noise in the data while keeping the
calculations at O(n) complexity.</li>
-<li>A demo <a href="https://guide.finderbots.com">Video Guide</a> site, which
uses many of the techniques described above.</li>
</ol>
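The log-likelihood ratio (LLR) mentioned in the references above can be sketched in a few lines. This is an illustrative Python sketch of the entropy-based formulation of the G-test for a 2x2 table of cooccurrence counts (function and variable names are ours, not part of any Mahout API):

```python
import math

def x_log_x(x):
    # x * ln(x), with the convention 0 * ln(0) = 0
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # unnormalized Shannon entropy over raw counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # 2x2 contingency table of event counts:
    # k11 = both events occurred together, k12/k21 = one event
    # without the other, k22 = neither event
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)
```

A strongly associated pair (e.g. `k11=10, k12=0, k21=0, k22=10`) scores high, while independent events score near zero, which is what makes LLR useful as a noise filter on cooccurrence counts.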
<p>Below are the command line jobs, but the drivers and associated code can
also be customized and accessed from the Scala APIs.</p>
<h2 id="1-spark-itemsimilarity">1. spark-itemsimilarity</h2>
@@ -549,10 +543,10 @@ a blog post,
the only similarity method supported, this is not the optimal way to determine
general "bag-of-words" document similarity.
LLR is used more as a quality filter than as a similarity measure. However
<em>spark-rowsimilarity</em> will produce
lists of similar docs for every doc if the input is docs with lists of terms. The
Apache <a href="http://lucene.apache.org">Lucene</a> project provides several
methods of <a
href="http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description">analyzing
and tokenizing</a> documents.</p>
-<h1 id="wzxhzdk244-creating-a-unified-recommenderwzxhzdk25"><a
name="unified-recommender">4. Creating a Unified Recommender</a></h1>
-<p>Using the output of <em>spark-itemsimilarity</em> and
<em>spark-rowsimilarity</em> you can build a unified cooccurrence and content
based
+<h1 id="wzxhzdk244-creating-a-multimodal-recommenderwzxhzdk25"><a
name="unified-recommender">4. Creating a Multimodal Recommender</a></h1>
+<p>Using the output of <em>spark-itemsimilarity</em> and
<em>spark-rowsimilarity</em> you can build a multimodal cooccurrence and
content based
recommender that can be used in both or either mode depending on indicators
available and the history available at
-runtime for a user.</p>
+runtime for a user. Some slides describing this method can be found <a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">here</a>.</p>
<h2 id="requirements">Requirements</h2>
<ol>
<li>Mahout SNAPSHOT-1.0 or later</li>
@@ -599,6 +593,7 @@ no collaborative filtering data, as happ
items with the most similar tags. Notice that other users' behavior is not
considered--only other items' tags. This defines a
content or metadata indicator. They are used when you want to find items that
are similar to other items by using their
content or metadata, not by which users interacted with them.</p>
+<p><strong>Note</strong>: It may be advisable to treat tags as
cross-cooccurrence indicators, but for the sake of an example they are treated
here as content only.</p>
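To build intuition for a content indicator, here is a minimal sketch that ranks items by tag overlap using Jaccard similarity. This is illustrative only (the item names and tags are made up, and the real <em>spark-rowsimilarity</em> job uses LLR-based downsampling rather than raw set overlap):

```python
def jaccard(a, b):
    # |intersection| / |union| of two tag sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# hypothetical itemID -> list-of-tags data
item_tags = {
    "iphone": ["phone", "mobile", "apple"],
    "galaxy": ["phone", "mobile", "samsung"],
    "surface": ["tablet", "microsoft"],
}

def most_similar(item, catalog):
    # rank the other items by tag overlap with `item`;
    # no user behavior is consulted, only item metadata
    return sorted(
        (other for other in catalog if other != item),
        key=lambda other: jaccard(catalog[item], catalog[other]),
        reverse=True,
    )
```

Note how the ranking depends only on the items' own metadata, never on which users interacted with them.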
<p>For this we need input of the form:</p>
<div class="codehilite"><pre><span class="n">itemID</span><span
class="o"><</span><span class="n">tab</span><span class="o">></span><span
class="n">list</span><span class="o">-</span><span class="n">of</span><span
class="o">-</span><span class="n">tags</span>
<span class="p">...</span>
@@ -629,9 +624,8 @@ is finished we no longer need the streng
<p>We now have three indicators: two of collaborative filtering type and one
content type.</p>
-<h2 id="unified-recommender-query">Unified Recommender Query</h2>
-<p>The actual form of the query for recommendations will vary depending on
your search engine but the intent is the same.
-For a given user, map their history of an action or content to the correct
indicator field and perform an OR'd query. </p>
+<h2 id="multimodal-recommender-query">Multimodal Recommender Query</h2>
+<p>The actual form of the query for recommendations will vary depending on
your search engine but the intent is the same. For a given user, map their
history of an action or content to the correct indicator field and perform an
OR'd query. </p>
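As a sketch of that mapping (assuming a Lucene/Solr-style <code>field:(term term)</code> syntax; the history values are illustrative), the per-user query string can be assembled like this:

```python
def build_query(user_history):
    # user_history maps an indicator field name to that user's
    # history for it; each field's terms are OR'd inside the
    # clause, and the field clauses are OR'd together
    clauses = [
        "{}:({})".format(field, " ".join(terms))
        for field, terms in sorted(user_history.items())
        if terms
    ]
    return " OR ".join(clauses)
```

Fields with no history for the user are simply dropped, so the same query builder serves users with rich or sparse histories.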
<p>We have 3 indicators; these are indexed by the search engine into 3 fields
that we'll call "purchase", "view", and "tags".
We take the user's history that corresponds to each indicator and create a
query of the form:</p>
<div class="codehilite"><pre><span class="n">Query</span><span
class="o">:</span>