intro-cooccurrence-spark.html

buildbot Thu, 02 Oct 2014 14:24:59 -0700

Author: buildbot
Date: Thu Oct  2 21:23:43 2014
New Revision: 924454

Log:
Staging update by buildbot for mahout


Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct  2 21:23:43 2014
@@ -1 +1 @@
-1629066
+1629072

Modified: 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
 (original)
+++ 
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
 Thu Oct  2 21:23:43 2014
@@ -552,26 +552,35 @@ runtime for a user.</p>
 <p>The query for recommendations will be a mix of values meant to match one of 
your indicators. The query can be constructed 
 from user history and values derived from context (category being viewed for 
instance) or special precalculated data 
 (popularity rank for instance). This blending of indicators allows for 
creating many flavors of recommendations to fit 
-a very wide variety of circumstances. It allows recommendations to be made for 
items with no usage data and even allows 
-for gracefully degrading recommendations based on how much user history is 
available. </p>
+a very wide variety of circumstances.</p>
 <p>With the right mix of indicators developers can construct a single query 
that works for completely new items and new users 
-while working well for items with lots of interactions and users with many 
recorded actions. In other words adding in content and intrinsic 
-indicators allows developers to create a solution for the "cold-start" problem 
that gracefully improves with more user history
+while working well for items with lots of interactions and users with many 
recorded actions. In other words by adding in content and intrinsic 
+indicators developers can create a solution for the "cold-start" problem that 
gracefully improves with more user history
 and as items have more interactions. It is also possible to create a 
completely content-based recommender that personalizes 
 recommendations.</p>
 <h2 id="example-with-3-indicators">Example with 3 Indicators</h2>
-<p>You will need to decide how you store user action data so they can be 
processed by the item and row similarity jobs and this is most easily done by 
using text files as described above. The data that is processed by these jobs 
is considered the <strong>training data</strong>. You will need some amount of 
user history in your recs query. It is typical to use the most recent user 
history but need not be exactly what is in the training set, which may include 
more historical data. Keeping the user history for query purposes could be done 
with a database by referencing some history from a users table. In the example 
above the two collaborative filtering actions are "purchase" and "view", but 
let's also add tags (taken from catalog categories or other descriptive 
metadata). </p>
-<p>We will need to create 1 indicator from the primary action (purchase) 1 
cross-indicator from the secondary action (view) and 1 content-indicator for 
(tags). We'll have to run <em>spark-itemsimilarity</em> once and 
<em>spark-rowsimilarity</em> once.</p>
-<p>We have described how to create the indicator and cross-indicator for 
purchase and view (the <a href="#multiple-actions">How to use Multiple User 
+<p>You will need to decide how you store user action data so they can be 
processed by the item and row similarity jobs and 
+this is most easily done by using text files as described above. The data that 
is processed by these jobs is considered the 
+training data. You will need some amount of user history in your recs query. 
It is typical to use the most recent user history 
+but need not be exactly what is in the training set, which may include a 
greater volume of historical data. Keeping the user 
+history for query purposes could be done with a database by storing it in a 
users table. In the example above the two 
+collaborative filtering actions are "purchase" and "view", but let's also add 
tags (taken from catalog categories or other 
+descriptive metadata). </p>
+<p>We will need to create 1 cooccurrence indicator from the primary action 
(purchase) 1 cross-action cooccurrence indicator 
+from the secondary action (view) 
+and 1 content indicator (tags). We'll have to run 
<em>spark-itemsimilarity</em> once and <em>spark-rowsimilarity</em> once.</p>
+<p>We have described how to create the collaborative filtering indicator and 
cross-indicator for purchase and view (the <a href="#multiple-actions">How to 
use Multiple User 
 Actions</a> section) but tags will be a slightly different process. We want to 
use the fact that 
 certain items have tags similar to the ones associated with a user's 
purchases. This is not a collaborative filtering indicator 
-but rather a "content" or "metadata" type indicator since you are not using 
other users' tag viewing history, only the 
+but rather a "content" or "metadata" type indicator since you are not using 
other users' history, only the 
 individual that you are making recs for. This means that this method will make 
recommendations for items that have 
 no collaborative filtering data, as happens with new items in a catalog. New 
items may have tags assigned but no one
- has purchased or viewed them yet. </p>
-<p>We could have treated viewing tags as a collaborative filtering 
cross-indicator by recording other users tag viewing history and that would 
probably give better results but here we are trying to illustrate recommending 
without CF data and using content-indicators. In the final query we will mix 
all 3 indicators.</p>
+ has purchased or viewed them yet. In the final query we will mix all 3 
indicators.</p>
 <h2 id="content-indicator">Content Indicator</h2>
-<p>To create a content-indicator we'll make use of the fact that the user has 
purchased items with certain tags. We want to find items with the most similar 
tags. Notice that other users' behavior is not considered--only other items' 
tags. This defines a content or metadata indicator. They are used when you want 
to find items that are similar to other items by using their content or 
metadata, not by which users interacted with them.</p>
+<p>To create a content-indicator we'll make use of the fact that the user has 
purchased items with certain tags. We want to find 
+items with the most similar tags. Notice that other users' behavior is not 
considered--only other items' tags. This defines a 
+content or metadata indicator. They are used when you want to find items that 
are similar to other items by using their 
+content or metadata, not by which users interacted with them.</p>
 <p>For this we need input of the form:</p>
 <div class="codehilite"><pre><span class="n">itemID</span><span 
class="o">&lt;</span><span class="n">tab</span><span class="o">&gt;</span><span 
class="n">list</span><span class="o">-</span><span class="n">of</span><span 
class="o">-</span><span class="n">tags</span>
 <span class="p">...</span>
@@ -585,7 +594,10 @@ no collaborative filtering data, as happ
 </pre></div>
 
 
-<p>We'll use <em>spark-rowsimilarity</em> because we are looking for similar 
rows, which encode items in this case. As with the indicator and 
cross-indicator we use the --omitStrength option. The strengths created are 
probabilistic log-likelihood ratios and so are used to filter unimportant 
similarities. Once the filtering or downsampling are finished we no longer need 
the strengths. We will get an indicator matrix of the form:</p>
+<p>We'll use <em>spark-rowsimilarity</em> because we are looking for similar 
rows, which encode items in this case. As with the 
+collaborative filtering indicator and cross-indicator we use the 
--omitStrength option. The strengths created are 
+probabilistic log-likelihood ratios and so are used to filter unimportant 
similarities. Once the filtering or downsampling 
+is finished we no longer need the strengths. We will get an indicator matrix 
of the form:</p>
 <div class="codehilite"><pre><span class="n">itemID</span><span 
class="o">&lt;</span><span class="n">tab</span><span class="o">&gt;</span><span 
class="n">list</span><span class="o">-</span><span class="n">of</span><span 
class="o">-</span><span class="n">item</span> <span class="n">IDs</span>
 <span class="p">...</span>
 </pre></div>
@@ -598,13 +610,12 @@ no collaborative filtering data, as happ
 </pre></div>
 
 
-<p>We now have three indicators, two collaborative filtering type and one 
content type. Notice that purchase, view, and tags can all be recorded for 
users and so can be used in a recommendations query.</p>
+<p>We now have three indicators, two collaborative filtering type and one 
content type.</p>
 <h2 id="unified-recommender-query">Unified Recommender Query</h2>
 <p>The actual form of the query for recommendations will vary depending on 
your search engine but the intent is the same. 
-For a given user, map their history of an action or content to the correct 
indicator field and perform an OR'd query. 
-This will allow matches from any indicator where AND queries require that an 
item have some similarity to all indicator 
-fields.</p>
-<p>We have 3 indicators, these are indexed by the search engine into 3 fields, 
we'll call them "purchase", "view", and "tags". We take the user's history that 
corresponds to each indicator and create a query of the form:</p>
+For a given user, map their history of an action or content to the correct 
indicator field and perform an OR'd query. </p>
+<p>We have 3 indicators, these are indexed by the search engine into 3 fields, 
we'll call them "purchase", "view", and "tags". 
+We take the user's history that corresponds to each indicator and create a 
query of the form:</p>
 <div class="codehilite"><pre><span class="n">Query</span><span 
class="o">:</span>
   <span class="n">field</span><span class="o">:</span> <span 
class="n">purchase</span><span class="o">;</span> <span class="n">q</span><span 
class="o">:</span><span class="n">user</span><span 
class="s1">&#39;s-purchase-history</span>
 <span class="s1">  field: view; q:user&#39;</span><span class="n">s</span> 
<span class="n">view</span><span class="o">-</span><span 
class="n">history</span>
@@ -612,7 +623,8 @@ fields.</p>
 </pre></div>
 
 
-<p>The query will result in an ordered list of items recommended for purchase 
but skewed towards items with similar tags to the ones the user has already 
purchased. </p>
+<p>The query will result in an ordered list of items recommended for purchase 
but skewed towards items with similar tags to 
+the ones the user has already purchased. </p>
 <p>This is only an example and not necessarily the optimal way to create recs. 
It illustrates how business decisions can be 
 translated into recommendations. This technique can be used to skew 
recommendations towards intrinsic indicators also. 
 For instance you may want to put personalized popular item recs in a special 
place in the UI. Create a popularity indicator
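
Note on the "Unified Recommender Query" text in the diff above: the step of mapping each slice of user history to its indicator field and OR-ing the results can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mahout or search-engine API code. The function name `build_unified_query`, the dictionary layout, and the sample item IDs are all hypothetical, and a real search engine will have its own query syntax.

```python
# Hypothetical sketch: one query clause per indicator field, meant to be OR'd
# together by whatever search engine indexes the indicators.

def build_unified_query(history_by_field):
    """Map each indicator field to the user's history tokens for that field.

    history_by_field: dict of field name -> list of tokens (item IDs or tags).
    Fields with no history are skipped, so a user with little history
    simply produces a query with fewer clauses.
    """
    clauses = []
    for field, tokens in history_by_field.items():
        if tokens:  # skip empty history rather than emit an empty clause
            clauses.append({"field": field, "q": " ".join(tokens)})
    return clauses


# Made-up history for one user; the field names follow the text above.
user_history = {
    "purchase": ["item-1", "item-7"],
    "view": ["item-3"],
    "tags": [],
}
query = build_unified_query(user_history)
```

Because the clauses are OR'd, an item can match on any one indicator field, which is what lets a single query serve both new users and users with long histories.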

svn commit: r924454 - in /websites/staging/mahout/trunk/content: ./ users/recommender/intro-cooccurrence-spark.html
