My own feeling is that we need some sort of recommender that supports side information, possibly one that can also act as a classifier.
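To make "supports side information" concrete: the idea is a score that combines the usual latent user/item factor product with a weighted side-feature term, trained by SGD on observed preferences. The sketch below is only a loose, made-up illustration of that idea (it is not any particular paper's parameterization; the data, dimensions, and learning rate are invented):

```python
import math
import random

random.seed(0)
f = 2  # latent dimension

# Made-up (user, item, side_features, like/dislike) tuples; side features
# could encode things like a category match or a user demographic bucket.
data = [
    (0, 0, [1.0, 0.0], 1),
    (0, 1, [0.0, 1.0], 0),
    (1, 0, [1.0, 0.0], 1),
    (1, 1, [0.0, 1.0], 0),
]
n_users, n_items, d_side = 2, 2, 2

U = [[random.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
w = [0.0] * d_side  # weights on the side information

def score(u, i, s):
    # Latent factor product plus a linear side-information term.
    latent = sum(U[u][k] * V[i][k] for k in range(f))
    side = sum(w[j] * s[j] for j in range(d_side))
    return latent + side

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.5
for _ in range(200):
    for u, i, s, y in data:
        err = y - sigmoid(score(u, i, s))  # gradient of the log-likelihood
        for k in range(f):
            # Simultaneous SGD update of the user and item factors.
            U[u][k], V[i][k] = (U[u][k] + lr * err * V[i][k],
                                V[i][k] + lr * err * U[u][k])
        for j in range(d_side):
            w[j] += lr * err * s[j]
```

Because the side weights are shared across all users and items, a new item with known side features gets a sensible score even before anyone has rated it, which is the main practical payoff of side information.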
As everybody knows, I have lately been quite enamored of Menon and Elkan's paper on Latent Factor Log-Linear (LFL) models. It seems to subsume most other factorization methods and supports side data very naturally. Training is reportedly very fast using SGD techniques. The paper is here: http://arxiv.org/abs/1006.2156

On Mon, Oct 4, 2010 at 7:03 AM, Sebastian Schelter <[email protected]> wrote:
> Hi,
>
> The amount of work currently being put into finishing 0.4 is amazing; I
> can hardly follow all the mails, very cool to see. I've had some time
> today to write down ideas for features for version 0.5 and want to share
> them here for feedback.
>
> First, possible new features for RecommenderJob:
>
> * add an option that makes RecommenderJob use the output of the related
>   o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob instead of
>   computing the similarities again each time; this gives users the
>   possibility to choose the interval at which to precompute the item
>   similarities
>
> * add an option to make RecommenderJob include "recommended because of"
>   items with each recommended item (analogous to what is already
>   available in GenericItemBasedRecommender.recommendedBecause(...));
>   showing this to users helps them understand why an item was
>   recommended to them
>
> Second, I'd like Mahout to have a Map/Reduce implementation of the
> algorithm described in Y. Zhou et al.: "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize" (http://bit.ly/cUPgqr).
>
> Here R is the matrix of ratings of users towards movies, and each user
> and each movie is projected onto a "feature" space (the number of
> features is fixed beforehand) so that the product of the resulting
> matrices U and M is a low-rank approximation/factorization of R.
>
> Determining U and M is mathematically modelled as an optimization
> problem, and additionally some regularization is applied to avoid
> overfitting to the known entries.
> This problem is solved with an iterative approach called alternating
> least squares (ALS).
>
> If I understand the paper correctly, this approach is easily
> parallelizable. To estimate a user's feature vector you only need
> access to all of his/her ratings and to the feature vectors of all the
> movies he/she rated. To estimate a movie's feature vector you need
> access to all of its ratings and to the feature vectors of the users
> who rated it.
>
> An unknown preference can then be predicted by computing the dot
> product of the corresponding user and movie feature vectors.
>
> It would be very nice if someone who is familiar with the paper, or has
> the time for a brief look into it, could validate that, because I don't
> fully trust my mathematical analysis.
>
> I have already created a first prototype implementation, but I
> definitely need help from someone to check it conceptually, to optimize
> the math-related parts, and to test it. Maybe that could be an
> interesting task for the upcoming Mahout hackathon in Berlin.
>
> --sebastian
>
> PS: @isabel I won't make it to the dinner today, need to rehearse my
> talk...
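Sebastian's reading looks right to me: each user update touches only that user's ratings plus the current movie factors, and symmetrically for movies, which is exactly what makes ALS map well onto Map/Reduce. A minimal single-machine numpy sketch of those alternating per-user / per-movie least-squares solves (the toy matrix, feature count, and lambda are made up; this is not the Map/Reduce version):

```python
import numpy as np

# Toy ratings matrix: rows = users, cols = movies, 0 = unknown.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

n_users, n_movies = R.shape
f = 2       # number of latent features, fixed beforehand
lam = 0.1   # regularization weight
rng = np.random.default_rng(42)
U = rng.standard_normal((n_users, f)) * 0.1
M = rng.standard_normal((n_movies, f)) * 0.1

for _ in range(20):
    # Fix M; solve for each user's vector from that user's ratings only.
    for u in range(n_users):
        idx = observed[u]
        Mi = M[idx]  # factors of the movies user u rated
        A = Mi.T @ Mi + lam * idx.sum() * np.eye(f)
        U[u] = np.linalg.solve(A, Mi.T @ R[u, idx])
    # Fix U; solve for each movie's vector from that movie's ratings only.
    for m in range(n_movies):
        idx = observed[:, m]
        Ui = U[idx]  # factors of the users who rated movie m
        A = Ui.T @ Ui + lam * idx.sum() * np.eye(f)
        M[m] = np.linalg.solve(A, Ui.T @ R[idx, m])

# An unknown preference is the dot product of the user and movie vectors.
pred = U @ M.T
rmse = np.sqrt(np.mean((pred[observed] - R[observed]) ** 2))
```

In the Map/Reduce setting, each user solve (and each movie solve) is an independent small f-by-f linear system, so a pass over all users or all movies is embarrassingly parallel; the only shared state between half-iterations is the other factor matrix.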
