apalumbo Thu, 19 Mar 2015 14:21:52 -0700

Added: 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/recommender-first-timer-faq.mdtext
URL: 
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/recommender-first-timer-faq.mdtext?rev=1667878&view=auto
==============================================================================
--- 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/recommender-first-timer-faq.mdtext
 (added)
+++ 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/recommender-first-timer-faq.mdtext
 Thu Mar 19 21:21:28 2015
@@ -0,0 +1,49 @@
+Title: Recommender First-Timer FAQ
+
+# Recommender First Timer Dos and Don'ts
+
+Many people with an interest in recommenders arrive at Mahout because they're
+building a first recommender system. Some starting questions have been
+asked enough times to warrant a FAQ collecting advice and rules of thumb for
+newcomers.
+
+For the interested, these topics are treated in detail in the book [Mahout in 
Action](http://manning.com/owen/).
+
+Don't start with a distributed, Hadoop-based recommender; take on that
+complexity only if necessary. Start with non-distributed recommenders. They
+are simpler, have fewer requirements, and are more flexible. 
+
+As a crude rule of thumb, a system with up to 100M user-item associations
+(ratings, preferences) should "fit" onto one modern server machine with 4GB
+of heap available and run acceptably as a real-time recommender. The system
+is invariably memory-bound since keeping data in memory is essential to
+performance.
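To make the rule of thumb concrete, the arithmetic below divides the heap by the number of associations, which gives roughly 43 bytes per association. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, and it says nothing about how Mahout actually lays out data in memory.

```java
final class HeapBudgetSketch {
    /** Bytes of heap available per association for a given heap size. */
    static double bytesPerAssociation(long heapBytes, long associations) {
        return (double) heapBytes / associations;
    }

    public static void main(String[] args) {
        long heapBytes = 4L * 1024 * 1024 * 1024; // 4GB of heap, as in the rule of thumb
        long associations = 100_000_000L;         // 100M user-item associations
        System.out.println(bytesPerAssociation(heapBytes, associations)
                + " bytes per association");
    }
}
```

A budget this tight is one reason pruning or sampling the data is usually the first thing to try before going distributed.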
+
+Beyond this point it gets expensive to deploy a machine with enough RAM,
+so designing for a distributed system makes sense when nearing this scale.
+However, most applications don't "really" have 100M associations to process.
+Data can be sampled; noisy and old data can often be aggressively pruned
+without significant impact on the result.
+
+The next question is whether or not your system has preference values, or
+ratings. Do users and items merely have an association or not, such as the
+existence or lack of a click? Or is behavior translated into some scalar
+value representing the user's degree of preference for the item?
+
+If you have ratings, then a good place to start is a
+GenericItemBasedRecommender, plus a PearsonCorrelationSimilarity similarity
+metric. If you don't have ratings, then a good place to start is
+GenericBooleanPrefItemBasedRecommender and LogLikelihoodSimilarity.
+
+If you want to do content-based item-item similarity, you need to implement
+your own ItemSimilarity.
+
+If your data can be simply exported to a CSV file, use FileDataModel and
+push new files periodically.
+If your data is in a database, use MySQLJDBCDataModel (or its "BooleanPref"
+counterpart if appropriate, or its PostgreSQL counterpart, etc.) and wrap it
+in a ReloadFromJDBCDataModel.
+
+This should give a reasonable starter system which responds fast. The
+nature of the system is that new data comes in from the file or database
+only periodically -- perhaps on the order of minutes. 
\ No newline at end of file


Added: 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/userbased-5-minutes.mdtext
URL: 
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/userbased-5-minutes.mdtext?rev=1667878&view=auto
==============================================================================
--- 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/userbased-5-minutes.mdtext
 (added)
+++ 
mahout/site/mahout_cms/trunk/content/users/mapreduce/recommender/userbased-5-minutes.mdtext
 Thu Mar 19 21:21:28 2015
@@ -0,0 +1,144 @@
+Title:
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+# Creating a User-Based Recommender in 5 minutes
+
+## Prerequisites
+
+Create a Java project in your favorite IDE and make sure Mahout is on the 
classpath. The easiest way to accomplish this is by importing it via Maven as 
described on the [Quickstart](/users/basics/quickstart.html) page.
+
+
+## Dataset
+
+Mahout's recommenders expect interactions between users and items as input. 
The easiest way to supply such data to Mahout is in the form of a text file, 
where every line has the format *userID,itemID,value*. Here *userID* and 
*itemID* refer to a particular user and a particular item, and *value* denotes 
the strength of the interaction (e.g. the rating given to a movie).
+
+In this example, we'll use some made-up data for simplicity. Create a file 
called "dataset.csv" and copy the following example interactions into the file. 
+
+<pre>
+1,10,1.0
+1,11,2.0
+1,12,5.0
+1,13,5.0
+1,14,5.0
+1,15,4.0
+1,16,5.0
+1,17,1.0
+1,18,5.0
+2,10,1.0
+2,11,2.0
+2,15,5.0
+2,16,4.5
+2,17,1.0
+2,18,5.0
+3,11,2.5
+3,12,4.5
+3,13,4.0
+3,14,3.0
+3,15,3.5
+3,16,4.5
+3,17,4.0
+3,18,5.0
+4,10,5.0
+4,11,5.0
+4,12,5.0
+4,13,0.0
+4,14,2.0
+4,15,3.0
+4,16,1.0
+4,17,4.0
+4,18,1.0
+</pre>
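To illustrate the *userID,itemID,value* layout, here is a minimal sketch that parses such lines into a nested map. It is plain Java for illustration only, not Mahout's *FileDataModel*; the class name and the choice of a map of maps are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InteractionParserSketch {
    /** Parses "userID,itemID,value" lines into userID -> (itemID -> value). */
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            long userId = Long.parseLong(parts[0]);
            long itemId = Long.parseLong(parts[1]);
            float value = Float.parseFloat(parts[2]);
            byUser.computeIfAbsent(userId, k -> new LinkedHashMap<>())
                  .put(itemId, value);
        }
        return byUser;
    }

    public static void main(String[] args) {
        // The first few rows of the example dataset.
        List<String> sample = Arrays.asList("1,10,1.0", "1,11,2.0", "2,10,1.0");
        System.out.println(parse(sample));
    }
}
```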
+
+## Creating a user-based recommender
+
+Create a class called *SampleRecommender* with a main method.
+
+The first thing we have to do is load the data from the file. Mahout's 
recommenders use an interface called *DataModel* to handle interaction data. 
You can load our made-up interactions like this:
+
+<pre>
+DataModel model = new FileDataModel(new File("/path/to/dataset.csv"));
+</pre>
+
+In this example, we want to create a user-based recommender. The idea behind 
this approach is that when we want to compute recommendations for a particular 
user, we look for other users with similar taste and pick the 
recommendations from their items. To find similar users, we have to compare 
their interactions. There are several methods for doing this. One popular 
method is to compute the [correlation 
coefficient](https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)
 between their interactions. In Mahout, you use this method as follows:
+
+<pre>
+UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
+</pre>
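For intuition about what such a similarity measures, the sketch below computes a textbook Pearson correlation over the items two users have both rated, using rows from the example dataset. It is a hand-written illustration with a hypothetical class name; Mahout's *PearsonCorrelationSimilarity* may differ in details such as weighting or handling of edge cases.

```java
final class PearsonSketch {
    /**
     * Textbook Pearson correlation of two equally long rating arrays.
     * Returns NaN when either array has zero variance.
     */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Users 1 and 2 on the items both rated: 10, 11, 15, 16, 17, 18.
        double[] user1 = {1.0, 2.0, 4.0, 5.0, 1.0, 5.0};
        double[] user2 = {1.0, 2.0, 5.0, 4.5, 1.0, 5.0};
        System.out.println("user 1 vs user 2: " + pearson(user1, user2));

        // Users 1 and 4 on items 10 through 18.
        double[] user1All = {1.0, 2.0, 5.0, 5.0, 5.0, 4.0, 5.0, 1.0, 5.0};
        double[] user4All = {5.0, 5.0, 5.0, 0.0, 2.0, 3.0, 1.0, 4.0, 1.0};
        System.out.println("user 1 vs user 4: " + pearson(user1All, user4All));
    }
}
```

On this data, users 1 and 2 come out strongly positively correlated, while users 1 and 4 come out negatively correlated, which matches what a glance at their ratings suggests.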
+
+The next thing we have to do is to define which similar users we want to 
leverage for the recommender. For the sake of simplicity, we'll use all that 
have a similarity greater than *0.1*. This is implemented via a 
*ThresholdUserNeighborhood*:
+
+<pre>
+UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, 
similarity, model);
+</pre>
+
+Now we have all the pieces to create our recommender:
+
+<pre>
+UserBasedRecommender recommender = new GenericUserBasedRecommender(model, 
neighborhood, similarity);
+</pre>
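As a rough picture of how these pieces can combine, the sketch below estimates a rating for one item as the similarity-weighted average of the ratings given by neighbors whose similarity reaches the threshold. The class name, the similarity values in `main`, and the inclusive threshold comparison are all assumptions made for illustration; this is not a copy of Mahout's internal logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UserBasedEstimateSketch {
    /**
     * Similarity-weighted average of neighbor ratings for a single item.
     * Only neighbors with similarity >= threshold who rated the item count.
     * Intended for a positive threshold; returns NaN when no neighbor qualifies.
     */
    static double estimate(Map<Long, Double> similarityByUser,
                           Map<Long, Double> ratingByUser,
                           double threshold) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<Long, Double> entry : similarityByUser.entrySet()) {
            double similarity = entry.getValue();
            Double rating = ratingByUser.get(entry.getKey());
            if (rating == null || similarity < threshold) {
                continue; // not a neighbor, or has not rated this item
            }
            weightedSum += similarity * rating;
            totalWeight += similarity;
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 1, 3 and 4 to user 2.
        Map<Long, Double> similarity = new LinkedHashMap<>();
        similarity.put(1L, 0.97);
        similarity.put(3L, 0.50);
        similarity.put(4L, -0.60);

        // Ratings for item 12 taken from the example dataset.
        Map<Long, Double> ratingsForItem12 = new LinkedHashMap<>();
        ratingsForItem12.put(1L, 5.0);
        ratingsForItem12.put(3L, 4.5);
        ratingsForItem12.put(4L, 5.0);

        System.out.println(estimate(similarity, ratingsForItem12, 0.1));
    }
}
```

User 4 falls below the threshold here, so only users 1 and 3 contribute to the estimate.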
+        
+We can easily ask the recommender for recommendations now. If we wanted to get 
three items recommended for the user with *userID* 2, we would do it like this:
+       
+
+<pre>
+List<RecommendedItem> recommendations = recommender.recommend(2, 3);
+for (RecommendedItem recommendation : recommendations) {
+  System.out.println(recommendation);
+}
+</pre>
+
+
+Congratulations, you have built your first recommender!
+
+
+## Evaluation
+
+You might ask yourself how to make sure that your recommender returns good 
results. Unfortunately, the only way to be really sure about the quality is by 
doing an A/B test with real users in a live system.
+
+We can, however, try to get a feel for the quality through statistical offline 
evaluation. Just keep in mind that this does not replace a test with real users!
+
+One way to check whether the recommender returns good results is by doing a 
**hold-out** test. We partition our dataset into two sets: a training set 
consisting of 90% of the data and a test set consisting of 10%. Then we train 
our recommender using the training set and look at how well it predicts the 
unknown interactions in the test set.
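The partitioning step of a hold-out test can be sketched in a few lines. The example below shuffles a copy of the data with a fixed seed and cuts it at the requested fraction. The class name and the global shuffle are assumptions for illustration; Mahout's evaluator performs its own split and may do so differently, for example per user.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class HoldOutSplitSketch {
    /**
     * Shuffles a copy of the data and returns two lists:
     * index 0 is the training set, index 1 is the test set.
     */
    static <T> List<List<T>> split(List<T> data, double trainingFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // seeded for repeatability
        int cut = (int) Math.round(shuffled.size() * trainingFraction);

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> interactions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            interactions.add(i); // stand-ins for 100 interactions
        }
        List<List<Integer>> parts = split(interactions, 0.9, 42L);
        System.out.println("training: " + parts.get(0).size()
                + ", test: " + parts.get(1).size());
    }
}
```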
+
+To test our recommender, we create a class called *EvaluateRecommender* with a 
main method and add an inner class called *MyRecommenderBuilder* that 
implements the *RecommenderBuilder* interface. We implement the 
*buildRecommender* method and make it set up our user-based recommender:
+
+<pre>
+UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
+UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, 
dataModel);
+return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
+</pre>
+
+Now we have to create the code for the test. We'll check how much the 
recommender misses the real interaction strength on average. We employ an 
*AverageAbsoluteDifferenceRecommenderEvaluator* for this. The following code 
shows how to put the pieces together and run a hold-out test. The argument 
*0.9* is the fraction of the data used for training, and *1.0* means that all 
of the data takes part in the evaluation: 
+
+<pre>
+DataModel model = new FileDataModel(new File("/path/to/dataset.csv"));
+RecommenderEvaluator evaluator = new 
AverageAbsoluteDifferenceRecommenderEvaluator();
+RecommenderBuilder builder = new MyRecommenderBuilder();
+double result = evaluator.evaluate(builder, null, model, 0.9, 1.0);
+System.out.println(result);
+</pre>
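The score printed above is an average absolute difference between predicted and actual values, so lower is better and 0.0 would mean perfect predictions. The sketch below shows that metric on its own with made-up numbers; the class name is hypothetical and the code is not Mahout's evaluator.

```java
final class AverageAbsoluteDifferenceSketch {
    /** Mean of |actual - predicted| over all held-out interactions. */
    static double averageAbsoluteDifference(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and equally long");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - predicted[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {5.0, 4.5, 1.0};    // made-up held-out ratings
        double[] predicted = {4.0, 4.5, 2.0}; // made-up predictions
        System.out.println(averageAbsoluteDifference(actual, predicted));
    }
}
```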
+
+Note: if you run this test multiple times, you will get different results, 
because the split into training set and test set is done randomly. 
+
+
+
+
+
+
+
+
+
+
+

Modified: mahout/site/mahout_cms/trunk/templates/standard.html
URL: 
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/templates/standard.html?rev=1667878&r1=1667877&r2=1667878&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/templates/standard.html (original)
+++ mahout/site/mahout_cms/trunk/templates/standard.html Thu Mar 19 21:21:28 
2015
@@ -165,49 +165,49 @@
                </li>
               <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Classification<b class="caret"></b></a>
                 <ul class="dropdown-menu">
-                  <li><a href="/users/classification/bayesian.html">Naive 
Bayes</a></li>
-                  <li><a 
href="/users/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
-                  <li><a 
href="/users/classification/logistic-regression.html">Logistic 
Regression</a></li>
-                  <li><a 
href="/users/classification/partial-implementation.html">Random Forest</a></li>
+                  <li><a 
href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a 
href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov 
Models</a></li>
+                  <li><a 
href="/users/mapreduce/classification/logistic-regression.html">Logistic 
Regression</a></li>
+                  <li><a 
href="/users/mapreduce/classification/partial-implementation.html">Random 
Forest</a></li>
 
                   <li class="divider"></li>
                   <li class="nav-header">Examples</li>
-                  <li><a 
href="/users/classification/breiman-example.html">Breiman example</a></li>
-                  <li><a 
href="/users/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/breiman-example.html">Breiman 
example</a></li>
+                  <li><a 
href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups 
example</a></li>
                 </ul></li>
                <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Clustering<b class="caret"></b></a>
                 <ul class="dropdown-menu">
-                <li><a 
href="/users/clustering/k-means-clustering.html">k-Means</a></li>
-                <li><a 
href="/users/clustering/canopy-clustering.html">Canopy</a></li>
-                <li><a href="/users/clustering/fuzzy-k-means.html">Fuzzy 
k-Means</a></li>
-                <li><a 
href="/users/clustering/streaming-k-means.html">Streaming KMeans</a></li>
-                <li><a 
href="/users/clustering/spectral-clustering.html">Spectral Clustering</a></li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/streaming-k-means.html">Streaming 
KMeans</a></li>
+                <li><a 
href="/users/mapreduce/clustering/spectral-clustering.html">Spectral 
Clustering</a></li>
                 <li class="divider"></li>
                 <li class="nav-header">Commandline usage</li>
-                <li><a 
href="/users/clustering/k-means-commandline.html">Options for k-Means</a></li>
-                <li><a 
href="/users/clustering/canopy-commandline.html">Options for Canopy</a></li>
-                <li><a 
href="/users/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/k-means-commandline.html">Options for 
k-Means</a></li>
+                <li><a 
href="/users/mapreduce/clustering/canopy-commandline.html">Options for 
Canopy</a></li>
+                <li><a 
href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for 
Fuzzy k-Means</a></li>
                 <li class="divider"></li>
                 <li class="nav-header">Examples</li>
-                <li><a 
href="/users/clustering/clustering-of-synthetic-control-data.html">Synthetic 
data</a></li>
+                <li><a 
href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic
 data</a></li>
                 <li class="divider"></li>
                 <li class="nav-header">Post processing</li>
-                <li><a href="/users/clustering/cluster-dumper.html">Cluster 
Dumper tool</a></li>
-                <li><a 
href="/users/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
+                <li><a 
href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper 
tool</a></li>
+                <li><a 
href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster 
visualisation</a></li>
                 </ul></li>
                 <li class="dropdown"> <a href="#" class="dropdown-toggle" 
data-toggle="dropdown">Recommendations<b class="caret"></b></a>
                 <ul class="dropdown-menu">
-                <li><a 
href="/users/recommender/quickstart.html">Quickstart</a></li>
-                <li><a 
href="/users/recommender/recommender-first-timer-faq.html">First Timer 
FAQ</a></li>
-                <li><a href="/users/recommender/userbased-5-minutes.html">A 
user-based recommender <br/>in 5 minutes</a></li>
-                               <li><a 
href="/users/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
-                <li><a 
href="/users/recommender/recommender-documentation.html">Overview</a></li>
+                <li><a 
href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First 
Timer FAQ</a></li>
+                <li><a 
href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based 
recommender <br/>in 5 minutes</a></li>
+               <li><a 
href="/users/mapreduce/recommender/matrix-factorization.html">Matrix 
factorization-based<br/> recommenders</a></li>
+                <li><a 
href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
                 <li class="divider"></li>
                 <li class="nav-header">Hadoop</li>
-                <li><a 
href="/users/recommender/intro-itembased-hadoop.html">Intro to item-based 
recommendations<br/> with Hadoop</a></li>
-                <li><a href="/users/recommender/intro-als-hadoop.html">Intro 
to ALS recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to 
item-based recommendations<br/> with Hadoop</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS 
recommendations<br/> with Hadoop</a></li>
                 <li class="nav-header">Spark</li>
-                <li><a 
href="/users/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
+                <li><a 
href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to 
cooccurrence-based<br/> recommendations with Spark</a></li>
               </ul>
             </li>
            </ul>

svn commit: r1667878 [4/4] - in /mahout/site/mahout_cms/trunk: content/users/algorithms/ content/users/environment/ content/users/mapreduce/ content/users/mapreduce/classification/ content/users/mapreduce/clustering/ content/users/mapreduce/recommender...
