[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979194#action_12979194 ]
Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------
No, unfortunately there's still a lot of open work here. ALSWRFactorizer is
just the non-distributed implementation of this algorithm.
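For reference, this is the weighted-λ-regularized objective from the paper linked in the issue description, transcribed here in that paper's notation (worth double-checking against the original). It shows where lambda enters and why each user's vector can be solved independently once the item features are fixed:

```latex
f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{\top} m_j \right)^2
        + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2 \right)

% With M fixed, setting the gradient with respect to u_i to zero gives
u_i = \left( M_{I_i} M_{I_i}^{\top} + \lambda \, n_{u_i} E \right)^{-1} M_{I_i} R^{\top}(i, I_i)
```

Here I is the set of observed (user, item) pairs, n_{u_i} and n_{m_j} are the numbers of ratings of user i and item j, I_i is the set of items user i rated, M_{I_i} holds the columns of M for those items, R(i, I_i) is the row of user i's ratings on them, and E is the n_f × n_f identity. The update for m_j is symmetric.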
In this issue I'm aiming for a distributed implementation. The actual matrix
factorization part works, but there are still some open problems:
* The factorization needs a regularization parameter called lambda, which
heavily influences the quality of the result. I don't see how to automatically
find a near-optimal lambda, which would be a key requirement for making this
algorithm easy to use. I have some code in the works that can find a
near-optimal lambda in a non-distributed way, but I'm not sure whether my
approach is mathematically correct; I will put it up for review when I'm done.
* Once we have the factorization, we can easily estimate single preferences by
computing the dot product of the corresponding user and item feature vectors.
However, if this job is supposed to produce recommendations for all users, we
cannot naively multiply the transpose of the user features matrix with the item
features matrix to estimate all possible preferences, as these are dense
matrices and their product would contain an entry for every user-item pair. We
need to find a way to isolate a few candidate items per user, maybe by
utilizing item cooccurrence. I'm not sure what the best approach is here
either, as this problem is not covered in the paper.
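The comment doesn't say how the non-distributed lambda search works. One generic option, shown here purely as an assumption and not necessarily the approach mentioned above, is a grid search: hold out some known preferences, fit once per candidate lambda, and keep the lambda with the lowest held-out RMSE. `pick_lambda`, `fit_shrunken_item_means`, and the toy data are hypothetical; the toy model (item means shrunk toward the global mean) only stands in for a real factorizer such as an ALS-WR run, so that the sketch is runnable.

```python
from collections import defaultdict


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return (sum((p - a) ** 2 for p, a in pairs) / len(pairs)) ** 0.5


def pick_lambda(candidates, train, heldout, fit):
    """Fit on `train` once per candidate lambda, score on `heldout`, return (best, scores).

    `fit(train, lam)` must return a predict(user, item) function; it is a
    placeholder for whatever factorizer is being tuned.
    """
    scores = {}
    for lam in candidates:
        predict = fit(train, lam)
        scores[lam] = rmse([(predict(u, i), r) for u, i, r in heldout])
    best = min(scores, key=scores.get)
    return best, scores


def fit_shrunken_item_means(train, lam):
    """Toy stand-in model: each item's mean rating, shrunk toward the global mean by lam."""
    global_mean = sum(r for _, _, r in train) / len(train)
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, r in train:
        sums[item] += r
        counts[item] += 1
    means = {i: (sums[i] + lam * global_mean) / (counts[i] + lam) for i in sums}
    return lambda user, item: means.get(item, global_mean)


if __name__ == "__main__":
    train = [("u1", "a", 4.0), ("u2", "a", 5.0), ("u1", "b", 1.0)]
    heldout = [("u2", "b", 3.0)]
    best, scores = pick_lambda([0.0, 1.0, 10.0, 100.0], train, heldout,
                               fit_shrunken_item_means)
    print(best)  # 10.0
```

Every candidate costs a full factorization, so on large data one would probably start from a coarse, logarithmically spaced grid and refine around the best value.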
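The second point can be sketched in memory: scoring one user-item pair is just a dot product, and for whole-catalog recommendations the hypothetical `recommend` helper below only scores unseen items that co-occur (share at least one user) with something the user already rated. This is one possible reading of the item cooccurrence idea, not a settled design; all names and data are made up, and it runs in a single process rather than as a MapReduce job.

```python
import heapq
from collections import defaultdict


def predict(user_f, item_f, user, item):
    """Estimated preference: dot product of the user and item feature vectors."""
    return sum(u * m for u, m in zip(user_f[user], item_f[item]))


def build_index(prefs):
    """Index known (user, item) preferences in both directions."""
    items_by_user, users_by_item = defaultdict(set), defaultdict(set)
    for user, item in prefs:
        items_by_user[user].add(item)
        users_by_item[item].add(user)
    return items_by_user, users_by_item


def candidates(items_by_user, users_by_item, user):
    """Unseen items rated by at least one user who shares an item with `user`."""
    seen = items_by_user.get(user, set())
    found = set()
    for item in seen:
        for other in users_by_item[item]:
            found |= items_by_user[other]
    return found - seen


def recommend(user_f, item_f, prefs, user, n):
    """Top-n candidates by estimated preference; non-co-occurring items are never scored."""
    items_by_user, users_by_item = build_index(prefs)
    scored = ((predict(user_f, item_f, user, i), i)
              for i in candidates(items_by_user, users_by_item, user))
    return [item for _, item in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    user_f = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
    item_f = {"x": [1.0, 0.0], "y": [0.5, 0.5], "z": [2.0, 1.0], "w": [10.0, 10.0]}
    prefs = [("alice", "x"), ("alice", "y"), ("bob", "y"), ("bob", "z"), ("carol", "w")]
    print(recommend(user_f, item_f, prefs, "alice", 5))  # ['z']
```

Note that "w" is never recommended to alice even though its dot product would be the highest; that is exactly the trade-off of restricting candidates.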
> MapReduce implementation of ALS-WR
> ----------------------------------
>
> Key: MAHOUT-542
> URL: https://issues.apache.org/jira/browse/MAHOUT-542
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch, MAHOUT-542-3.patch
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference matrix and gives some
> insights into how the authors distributed the computation using MATLAB.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's
> law of patches.
> Maybe someone has the time and motivation to work a little on this with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read), try to factorize some
> test data, and give feedback.
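The Map/Reduce idea in the description can be illustrated with a small in-memory sketch. This is pure Python, not Mahout or Hadoop code, and every function name is hypothetical. With the item features fixed, each user's vector depends only on that user's own ratings, so a map step can key ratings by user and a reduce step can solve one small regularized system per key, using the ALS-WR update with lambda scaled by the number of ratings; the item half-step is the same with the roles swapped. Initialization is simplified to small random values.

```python
import random
from collections import defaultdict


def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def map_phase(ratings, key_by_user):
    """Map: emit (key, (other_id, rating)), keyed by user or by item."""
    for user, item, r in ratings:
        if key_by_user:
            yield user, (item, r)
        else:
            yield item, (user, r)


def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(rated, fixed, lam, k):
    """Reduce: solve (sum f f^T + lam * n * I) x = sum r f over the n ratings of one key."""
    n = len(rated)
    a = [[lam * n if r == c else 0.0 for c in range(k)] for r in range(k)]
    b = [0.0] * k
    for other, rating in rated:
        f = fixed[other]
        for r in range(k):
            b[r] += rating * f[r]
            for c in range(k):
                a[r][c] += f[r] * f[c]
    return solve(a, b)


def als_wr(ratings, k=2, lam=0.05, iterations=30, seed=1):
    """Alternate the two half-steps; returns (user_features, item_features)."""
    rng = random.Random(seed)
    # The ratings never change, so group them once and reuse the groups.
    by_user = shuffle(map_phase(ratings, key_by_user=True))
    by_item = shuffle(map_phase(ratings, key_by_user=False))
    # Simplified init; the paper seeds the first feature with each item's average rating.
    item_f = {i: [rng.uniform(0.1, 1.0) for _ in range(k)] for i in by_item}
    user_f = {}
    for _ in range(iterations):
        user_f = {u: reduce_phase(rs, item_f, lam, k) for u, rs in by_user.items()}
        item_f = {i: reduce_phase(rs, user_f, lam, k) for i, rs in by_item.items()}
    return user_f, item_f


def rmse(ratings, user_f, item_f):
    """Training RMSE of the dot-product estimates."""
    err = [(r - sum(x * y for x, y in zip(user_f[u], item_f[i]))) ** 2 for u, i, r in ratings]
    return (sum(err) / len(err)) ** 0.5


if __name__ == "__main__":
    a, b = [1, 2, 1, 2], [1, 2, 2, 1]  # toy rank-1 data with the diagonal left out
    ratings = [(f"u{u}", f"i{i}", float(a[u] * b[i]))
               for u in range(4) for i in range(4) if u != i]
    U, M = als_wr(ratings)
    print(round(rmse(ratings, U, M), 3))
```

In a real job, the fixed feature matrix would have to be available to every reducer (for example by broadcasting it or joining it against the grouped ratings); this sketch sidesteps that by keeping everything in one process.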