Author: buildbot
Date: Thu Oct 2 21:23:43 2014
New Revision: 924454
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct 2 21:23:43 2014
@@ -1 +1 @@
-1629066
+1629072
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html (original)
+++ websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html Thu Oct 2 21:23:43 2014
@@ -552,26 +552,35 @@ runtime for a user.</p>
<p>The query for recommendations will be a mix of values meant to match one of
your indicators. The query can be constructed
from user history and values derived from context (category being viewed for
instance) or special precalculated data
(popularity rank for instance). This blending of indicators allows for
creating many flavors of recommendations to fit
-a very wide variety of circumstances. It allows recommendations to be made for
items with no usage data and even allows
-for gracefully degrading recommendations based on how much user history is
available. </p>
+a very wide variety of circumstances.</p>
<p>With the right mix of indicators developers can construct a single query
that works for completely new items and new users
-while working well for items with lots of interactions and users with many
recorded actions. In other words adding in content and intrinsic
-indicators allows developers to create a solution for the "cold-start" problem
that gracefully improves with more user history
+while working well for items with lots of interactions and users with many
recorded actions. In other words by adding in content and intrinsic
+indicators developers can create a solution for the "cold-start" problem that
gracefully improves with more user history
and as items have more interactions. It is also possible to create a
completely content-based recommender that personalizes
recommendations.</p>
<h2 id="example-with-3-indicators">Example with 3 Indicators</h2>
-<p>You will need to decide how you store user action data so they can be
processed by the item and row similarity jobs and this is most easily done by
using text files as described above. The data that is processed by these jobs
is considered the <strong>training data</strong>. You will need some amount of
user history in your recs query. It is typical to use the most recent user
history but need not be exactly what is in the training set, which may include
more historical data. Keeping the user history for query purposes could be done
with a database by referencing some history from a users table. In the example
above the two collaborative filtering actions are "purchase" and "view", but
let's also add tags (taken from catalog categories or other descriptive
metadata). </p>
-<p>We will need to create 1 indicator from the primary action (purchase) 1
cross-indicator from the secondary action (view) and 1 content-indicator for
(tags). We'll have to run <em>spark-itemsimilarity</em> once and
<em>spark-rowsimilarity</em> once.</p>
-<p>We have described how to create the indicator and cross-indicator for
purchase and view (the <a href="#multiple-actions">How to use Multiple User
+<p>You will need to decide how you store user action data so they can be
processed by the item and row similarity jobs and
+this is most easily done by using text files as described above. The data that
is processed by these jobs is considered the
+training data. You will need some amount of user history in your recs query.
It is typical to use the most recent user history
+but it need not be exactly what is in the training set, which may include a
greater volume of historical data. Keeping the user
+history for query purposes could be done with a database by storing it in a
users table. In the example above the two
+collaborative filtering actions are "purchase" and "view", but let's also add
tags (taken from catalog categories or other
+descriptive metadata). </p>
+<p>We will need to create 1 cooccurrence indicator from the primary action
(purchase) 1 cross-action cooccurrence indicator
+from the secondary action (view)
+and 1 content indicator (tags). We'll have to run
<em>spark-itemsimilarity</em> once and <em>spark-rowsimilarity</em> once.</p>
+<p>We have described how to create the collaborative filtering indicator and
cross-indicator for purchase and view (the <a href="#multiple-actions">How to
use Multiple User
Actions</a> section) but tags will be a slightly different process. We want to
use the fact that
certain items have tags similar to the ones associated with a user's
purchases. This is not a collaborative filtering indicator
-but rather a "content" or "metadata" type indicator since you are not using
other users' tag viewing history, only the
+but rather a "content" or "metadata" type indicator since you are not using
other users' history, only the
individual that you are making recs for. This means that this method will make
recommendations for items that have
no collaborative filtering data, as happens with new items in a catalog. New
items may have tags assigned but no one
- has purchased or viewed them yet. </p>
-<p>We could have treated viewing tags as a collaborative filtering
cross-indicator by recording other users tag viewing history and that would
probably give better results but here we are trying to illustrate recommending
without CF data and using content-indicators. In the final query we will mix
all 3 indicators.</p>
+ has purchased or viewed them yet. In the final query we will mix all 3
indicators.</p>
<h2 id="content-indicator">Content Indicator</h2>
-<p>To create a content-indicator we'll make use of the fact that the user has
purchased items with certain tags. We want to find items with the most similar
tags. Notice that other users' behavior is not considered--only other item's
tags. This defines a content or metadata indicator. They are used when you want
to find items that are similar to other items by using their content or
metadata, not by which users interacted with them.</p>
+<p>To create a content-indicator we'll make use of the fact that the user has
purchased items with certain tags. We want to find
+items with the most similar tags. Notice that other users' behavior is not
considered--only other items' tags. This defines a
+content or metadata indicator. They are used when you want to find items that
are similar to other items by using their
+content or metadata, not by which users interacted with them.</p>
<p>For this we need input of the form:</p>
<div class="codehilite"><pre><span class="n">itemID</span><span class="o">&lt;</span><span class="n">tab</span><span class="o">&gt;</span><span class="n">list</span><span class="o">-</span><span class="n">of</span><span class="o">-</span><span class="n">tags</span>
<span class="p">...</span>
@@ -585,7 +594,10 @@ no collaborative filtering data, as happ
</pre></div>
-<p>We'll use <em>spark-rowimilairity</em> because we are looking for similar
rows, which encode items in this case. As with the indicator and
cross-indicator we use the --omitStrength option. The strengths created are
probabilistic log-likelihood ratios and so are used to filter unimportant
similarities. Once the filtering or downsampling are finished we no longer need
the strengths. We will get an indicator matrix of the form:</p>
+<p>We'll use <em>spark-rowsimilarity</em> because we are looking for similar
rows, which encode items in this case. As with the
+collaborative filtering indicator and cross-indicator we use the
--omitStrength option. The strengths created are
+probabilistic log-likelihood ratios and so are used to filter unimportant
similarities. Once the filtering or downsampling
+is finished we no longer need the strengths. We will get an indicator matrix
of the form:</p>
<div class="codehilite"><pre><span class="n">itemID</span><span class="o">&lt;</span><span class="n">tab</span><span class="o">&gt;</span><span class="n">list</span><span class="o">-</span><span class="n">of</span><span class="o">-</span><span class="n">item</span> <span class="n">IDs</span>
<span class="p">...</span>
</pre></div>
@@ -598,13 +610,12 @@ no collaborative filtering data, as happ
</pre></div>
-<p>We now have three indicators, two collaborative filtering type and one
content type. Notice that purchase, view, and tags can all be recorded for
users and so can be used in a recommendations query.</p>
+<p>We now have three indicators, two collaborative filtering type and one
content type.</p>
<h2 id="unified-recommender-query">Unified Recommender Query</h2>
<p>The actual form of the query for recommendations will vary depending on
your search engine but the intent is the same.
-For a given user, map their history of an action or content to the correct
indicator field and perform an OR'd query.
-This will allow matches from any indicator where AND queries require that an
item have some similarity to all indicator
-fields.</p>
-<p>We have 3 indicators, these are indexed by the search engine into 3 fields,
we'll call them "purchase", "view", and "tags". We take the user's history that
corresponds to each indicator and create a query of the form:</p>
+For a given user, map their history of an action or content to the correct
indicator field and perform an OR'd query. </p>
+<p>We have 3 indicators; these are indexed by the search engine into 3 fields,
we'll call them "purchase", "view", and "tags".
+We take the user's history that corresponds to each indicator and create a
query of the form:</p>
<div class="codehilite"><pre><span class="n">Query</span><span
class="o">:</span>
<span class="n">field</span><span class="o">:</span> <span
class="n">purchase</span><span class="o">;</span> <span class="n">q</span><span
class="o">:</span><span class="n">user</span><span
class="s1">'s-purchase-history</span>
<span class="s1"> field: view; q:user'</span><span class="n">s</span>
<span class="n">view</span><span class="o">-</span><span
class="n">history</span>
@@ -612,7 +623,8 @@ fields.</p>
</pre></div>
-<p>The query will result in an ordered list of items recommended for purchase
but skewed towards items with similar tags to the ones the user has already
purchased. </p>
+<p>The query will result in an ordered list of items recommended for purchase
but skewed towards items with similar tags to
+the ones the user has already purchased. </p>
<p>This is only an example and not necessarily the optimal way to create recs.
It illustrates how business decisions can be
translated into recommendations. This technique can be used to skew
recommendations towards intrinsic indicators also.
For instance you may want to put personalized popular item recs in a special
place in the UI. Create a popularity indicator