Author: pat
Date: Sun Mar 8 17:02:27 2015
New Revision: 1665051
URL: http://svn.apache.org/r1665051
Log:
CMS commit to mahout by pat
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/quickstart.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/quickstart.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/quickstart.mdtext?rev=1665051&r1=1665050&r2=1665051&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/quickstart.mdtext
(original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/quickstart.mdtext
Sun Mar 8 17:02:27 2015
@@ -1,12 +1,25 @@
Title: Recommender Quickstart
-# Recommender Quickstart
+# Recommender Overview
-It's very easy to get started with Mahout's recommenders. You don't need to
know and have Hadoop for this. Here we list resources that might be helpful for
some first steps:
+Recommenders have changed over the years. Mahout contains a long list of them,
which you can still use. But to get the best out of our more modern approach,
think of the recommender as two parts: a "model creation" component, supplied
by Mahout's new spark-itemsimilarity job, and a "serving" component, supplied
by a modern scalable search engine such as Solr.
- * Steve Cook created a [video
tutorial](https://www.youtube.com/watch?v=yD40rVKUwPI) on how to create a
simple item-based recommender from scratch using Eclipse. (Note that you can
avoid manually downloading the library jars by including mahout as [maven
dependency](/general/downloads.html) into your project).
+
- * The paper [Collaborative Filtering with Apache
Mahout](http://ssc.io/wp-content/uploads/2013/02/cf-mahout.pdf) by Sebastian
Schelter and Sean Owen gives a short overview of Mahout's non-distributed
recommenders and has pointers to research papers describing the underlying
algorithms.
+To integrate with your application, collect user interactions and store them
both in a DB and in a form usable by Mahout. The simplest way to do this is to
log interactions to csv files (user-id, item-id). The DB should be set up to
contain the last n user interactions, which will form part of the query for
recommendations.
- * For a more full featured Multimodal Recommender based on the newest Spark
version of Mahout and integration with a
-fast server using a search engine see references on the [Mahout
site](http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html).
\ No newline at end of file
+Mahout's spark-itemsimilarity will create a table of (item-id,
list-of-similar-items) in csv form. Think of this as an item collection with
one field containing the item-ids of similar items. Index this with your search
engine.
+
+When your application needs recommendations for a specific person, get the
user's latest interaction history from the DB and use it to query the indexed
item collection described above. You will get back an ordered list of
item-ids. These are your recommendations. You may wish to filter out any that
the user has already seen, but that depends on your use case.
+
+## References
+
+1. A free ebook, which talks about the general idea: [Practical Machine
Learning](https://www.mapr.com/practical-machine-learning)
+2. A slide deck, which talks about mixing actions or other indicators:
[Creating a Multimodal Recommender with Mahout and a Search
Engine](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
+3. Two blog posts: [What's New in Recommenders: part
#1](http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/)
+and [What's New in Recommenders: part
#2](http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/)
+4. A post describing the log-likelihood ratio: [Surprise and
Coincidence](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html)
LLR is used to reduce noise in the data while keeping the calculations at O(n)
complexity.
+
+## Mahout Jobs
+
+See the page describing
[*spark-itemsimilarity*](http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html)
for more details.
\ No newline at end of file
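
The application-side flow described in the new page text (log interactions as csv rows, keep the last n interactions per user, query the indexed similar-items field with that history, optionally filter out seen items) can be sketched roughly as below. This is only an illustration, not Mahout or Solr code: the field name `indicators`, the window size, the helper names, and the in-memory stand-ins for the log file and the DB are all assumptions, and the query string simply uses the standard Lucene `field:(a OR b)` form without escaping special characters in item-ids.

```python
import csv
import io
from collections import defaultdict, deque

HISTORY_SIZE = 3  # hypothetical "last n" window kept per user


def log_interaction(writer, history, user_id, item_id):
    """Write one (user-id, item-id) csv row and update the per-user history."""
    writer.writerow([user_id, item_id])
    history[user_id].append(item_id)


def build_query(history, user_id, field="indicators"):
    """Build an OR query over the user's recent item-ids.

    `field` is a hypothetical name for the indexed field that holds the
    similar-item ids produced by spark-itemsimilarity.
    """
    items = list(history.get(user_id, ()))
    if not items:
        return None
    return f"{field}:({' OR '.join(items)})"


def filter_seen(ranked_item_ids, history, user_id):
    """Optionally drop items the user has already interacted with."""
    seen = set(history.get(user_id, ()))
    return [item for item in ranked_item_ids if item not in seen]


# In-memory stand-ins: a StringIO for the csv log, a dict of deques for the DB.
log = io.StringIO()
writer = csv.writer(log)
history = defaultdict(lambda: deque(maxlen=HISTORY_SIZE))

events = [
    ("u1", "iphone"),
    ("u1", "ipad"),
    ("u2", "nexus"),
    ("u1", "galaxy"),
    ("u1", "surface"),
]
for user, item in events:
    log_interaction(writer, history, user, item)

query = build_query(history, "u1")
print(query)
```

The resulting string would be sent to the search engine by whatever client the application already uses, and the ranked item-ids that come back can be passed through `filter_seen` when already-seen items should not be recommended.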