Adding implicit feedback ALS to MLlib user guide

Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/93b96b44
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/93b96b44
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/93b96b44

Branch: refs/heads/master
Commit: 93b96b44d778716a4e76bdcf68d6a07694a06460
Parents: c6ceaea
Author: Nick Pentreath <nick.pentre...@gmail.com>
Authored: Fri Oct 4 14:39:44 2013 +0200
Committer: Nick Pentreath <nick.pentre...@gmail.com>
Committed: Fri Oct 4 14:39:44 2013 +0200

----------------------------------------------------------------------
 docs/mllib-guide.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/93b96b44/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index f991d86..c1ff9c4 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -144,10 +144,9 @@ Available algorithms for clustering:
 
 # Collaborative Filtering 
 
-[Collaborative
-filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
+[Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
 is commonly used for recommender systems.  These techniques aim to fill in the
-missing entries of a user-product association matrix.  MLlib currently supports
+missing entries of a user-item association matrix.  MLlib currently supports
 model-based collaborative filtering, in which users and products are described
 by a small set of latent factors that can be used to predict missing entries.
 In particular, we implement the [alternating least squares
@@ -158,7 +157,24 @@ following parameters:
 * *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
 * *rank* is the number of latent factors in our model.
 * *iterations* is the number of iterations to run.
-* *lambda* specifies the regularization parameter in ALS. 
+* *lambda* specifies the regularization parameter in ALS.
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data
+* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations
+
+## Explicit vs Implicit Feedback
+
+The standard approach to matrix factorization based collaborative filtering treats
+the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+
+It is common in many real-world use cases to only have access to *implicit feedback*
+(e.g. views, clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with
+such data is taken from 
+[Collaborative Filtering for Implicit Feedback Datasets](http://research.yahoo.com/pub/2433).
+Essentially instead of trying to model the matrix of ratings directly, this approach treats the data as
+a combination of binary preferences and *confidence values*. The ratings are then related
+to the level of confidence in observed user preferences, rather than explicit ratings given to items.
+The model then tries to find latent factors that can be used to predict the expected preference of a user
+for an item. 
 
 Available algorithms for collaborative filtering: 
 

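Editor's note: the "binary preferences and confidence values" idea in the added "Explicit vs Implicit Feedback" text can be illustrated with a small sketch. It follows the formulation in the cited paper, where an observed value r yields a preference p = 1 if r > 0 (else 0) and a confidence c = 1 + alpha * r. The function name, the sample data, and the value of alpha below are hypothetical and chosen only for illustration; this is plain Python, not MLlib code, and MLlib's internal implementation may differ in detail.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (user, item)


def to_preference_and_confidence(
    observations: Dict[Key, float], alpha: float
) -> Dict[Key, Tuple[float, float]]:
    """Map raw implicit-feedback values r to (preference, confidence) pairs.

    preference = 1.0 if r > 0 else 0.0   (binary: did the user interact?)
    confidence = 1.0 + alpha * r         (grows with the amount of interaction)
    """
    result = {}
    for key, r in observations.items():
        preference = 1.0 if r > 0 else 0.0
        confidence = 1.0 + alpha * r
        result[key] = (preference, confidence)
    return result


# Hypothetical click counts per (user, item) pair.
clicks = {("u1", "i1"): 3.0, ("u1", "i2"): 0.0, ("u2", "i1"): 1.0}

# alpha = 40.0 is an illustrative value only, not a recommended default.
weights = to_preference_and_confidence(clicks, alpha=40.0)

for key, (p, c) in sorted(weights.items()):
    print(key, "preference:", p, "confidence:", c)
```

With alpha set to 0 every observation gets the same confidence of 1, so alpha controls how strongly repeated interactions increase trust in a positive preference relative to that baseline.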