Author: pat
Date: Mon Mar  9 00:19:50 2015
New Revision: 1665101

URL: http://svn.apache.org/r1665101
Log:
fixed some wording

Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1665101&r1=1665100&r2=1665101&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Mon Mar  9 00:19:50 2015
@@ -1,14 +1,8 @@
 #Intro to Cooccurrence Recommenders with Spark
 
-Mahout's next generation recommender is based on the proven cooccurrence algorithm but takes it several important steps further
-by creating a multimodal recommender, which can make use of many user actions to make recommendations. In the old days 
-only page reads, or purchases could be used alone. Now search terms, locations, all manner of clickstream data can be used to 
-recommend - hence the term multimodal. It also allows the recommendations to be tuned for the placement context by changine 
-the query without recalculating the model - adding to its multimodality.
-
 Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can 
 be used to create "other people also liked these things" type recommendations and paired with a search engine can 
-personalize multimodal recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
+personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
 recommendations and when paired with a search engine can be used to personalize content based recommendations.
 
 ![image](http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png)
@@ -22,11 +16,10 @@ User history is used as a query on the i
 ##References
 
 1. A free ebook, which talks about the general idea: [Practical Machine Learning](https://www.mapr.com/practical-machine-learning)
-2. A slide deck, which talks about mixing user actions and other indicators: [Multimodal Streaming Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
+2. A slide deck, which talks about mixing actions or other indicators: [Creating a Unified Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
 3. Two blog posts: [What's New in Recommenders: part #1](http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/)
 and [What's New in Recommenders: part #2](http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/)
-4. A post describing the loglikelihood ratio:  [Surprise and Coinsidense](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html) LLR is used to reduce noise in the data while keeping the calculations O(n) complexity.
-5. A demo [Video Guide][1] site, which uses many of the techniques described above.
+4. A post describing the log-likelihood ratio: [Surprise and Coincidence](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html). LLR is used to reduce noise in the data while keeping the calculations O(n) complexity.
 
 Below are the command line jobs, but the drivers and associated code can also be customized and accessed from the Scala APIs.
 
@@ -320,11 +313,11 @@ the only similarity method supported thi
 LLR is used more as a quality filter than as a similarity measure. However *spark-rowsimilarity* will produce 
 lists of similar docs for every doc if input is docs with lists of terms. The Apache [Lucene](http://lucene.apache.org) project provides several methods of [analyzing and tokenizing](http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description) documents.
 
-#<a name="unified-recommender">4. Creating a Unified Recommender</a>
+#<a name="unified-recommender">4. Creating a Multimodal Recommender</a>
 
-Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a unified cooccurrence and content based
+Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a multimodal cooccurrence and content based
 recommender that can be used in both or either mode depending on indicators available and the history available at 
-runtime for a user.
+runtime for a user. Some slides describing this method can be found [here](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/).
 
 ##Requirements
 
@@ -381,6 +374,8 @@ items with the most similar tags. Notice
 content or metadata indicator. They are used when you want to find items that are similar to other items by using their 
 content or metadata, not by which users interacted with them.
 
+**Note**: It may be advisable to treat tags as cross-cooccurrence indicators, but for the sake of this example they are treated here as content only.
+
 For this we need input of the form:
 
     itemID<tab>list-of-tags
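For illustration only (the item IDs and tag values below are made up, not from the Mahout examples), this input is a plain tab-separated file with one item per line, which can be produced like so:

```python
# Build a toy tags file in the itemID<tab>list-of-tags format described above.
# Item IDs and tag values are hypothetical, for illustration only.
rows = {
    "ipad": ["electronics", "apple", "tablet"],
    "nexus": ["electronics", "google", "tablet"],
}

# One line per item: the itemID, a tab, then space-delimited tags.
lines = ["%s\t%s" % (item, " ".join(tags)) for item, tags in sorted(rows.items())]
print("\n".join(lines))
```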
@@ -408,10 +403,9 @@ This is a content indicator since it has
     
 We now have three indicators, two collaborative filtering type and one content type.
 
-##Unified Recommender Query
+##Multimodal Recommender Query
 
-The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. 
-For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
+The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
 
 We have 3 indicators, these are indexed by the search engine into 3 fields, we'll call them "purchase", "view", and "tags". 
 We take the user's history that corresponds to each indicator and create a query of the form:
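As a hedged sketch of the step just described (the Lucene/Solr-style `field:(terms)` syntax and the example item IDs are assumptions for illustration; only the field names come from the text), such an OR'd multi-field query might be assembled like this:

```python
# Assemble the OR'd indicator query described above. The fields "purchase",
# "view", and "tags" are from the text; the query syntax and item IDs are
# hypothetical, for illustration only.
user_history = {
    "purchase": ["iphone", "ipad"],
    "view": ["nexus", "galaxy"],
    "tags": ["tablet", "electronics"],
}

# One field:(id id ...) clause per indicator the user has history for.
clauses = ["%s:(%s)" % (field, " ".join(ids))
           for field, ids in user_history.items() if ids]
query = " OR ".join(clauses)
print(query)
# purchase:(iphone ipad) OR view:(nexus galaxy) OR tags:(tablet electronics)
```

A search engine boost, for example `tags:(tablet electronics)^0.1`, could then down-weight the content field relative to the collaborative filtering fields.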
@@ -443,6 +437,3 @@ This will return recommendations favorin
 2. Content can be used where there is no recorded user behavior or when items change too quickly to get much interaction history. They can be used alone or mixed with other indicators.
 3. Most search engines support "boost" factors so you can favor one or more indicators. In the example query, if you want tags to only have a small effect you could boost the CF indicators.
 4. In the examples we have used space delimited strings for lists of IDs in indicators and in queries. It may be better to use arrays of strings if your storage system and search engine support them. For instance Solr allows multi-valued fields, which correspond to arrays.
-
-
-  [1]: https://guide.finderbots.com
\ No newline at end of file