[
https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006350#comment-13006350
]
Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------
Hi Ted,
I've already given that some thought, but unfortunately no easy way to do it
came to mind. I'll share my line of thought; maybe there's an error in how I
understand the algorithm:
The goal of ALS-WR is to factorize the rating matrix R into the user feature
matrix U and the item feature matrix M. It tries to minimize the regularized
squared error between R and U^T M via an iterative algorithm. We start with a
randomly initialized M and use it to compute an optimized U; we then fix U to
compute an optimized M. We alternate between fixing U and fixing M until the
error converges (or a maximum number of iterations is reached).
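To make that loop explicit, here's a minimal sketch in Java (just an in-memory
illustration; solveForU, solveForM, regularizedError and randomMatrix are
hypothetical helpers, not the distributed code from the patch):

/** Sketch of the alternating optimization in ALS-WR (hypothetical helpers). */
abstract class AlternatingOptimization {

  // hypothetical: solve the regularized least-squares problem for one
  // factor matrix while the other one is held fixed
  abstract double[][] solveForU(double[][] R, double[][] M, double lambda);
  abstract double[][] solveForM(double[][] R, double[][] U, double lambda);
  abstract double regularizedError(double[][] R, double[][] U, double[][] M, double lambda);
  abstract double[][] randomMatrix(int rows, int cols);

  double[][][] factorize(double[][] R, int numFeatures, double lambda,
                         int maxIterations, double epsilon) {
    double[][] M = randomMatrix(R[0].length, numFeatures);  // random initial item features
    double[][] U = null;
    for (int iteration = 0; iteration < maxIterations; iteration++) {
      U = solveForU(R, M, lambda);  // fix M, compute an optimized U
      M = solveForM(R, U, lambda);  // fix U, compute an optimized M
      if (regularizedError(R, U, M, lambda) < epsilon) {
        break;  // error converged
      }
    }
    return new double[][][] { U, M };
  }
}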
The question now is whether we can modify this process to use multiple lambda
values (lambda being the regularization parameter used in the internal
computations). You stated that we are "reading the same input over and over",
which I don't see, as all versions of U and M (with the exception of the first,
randomly initialized M) depend on the lambda that was used to produce them. So
as I understand the algorithm, n values of lambda would not only require n
times the memory for the computations, but n versions of each U and M would
also have to be moved around. I think I could find a way to make the code do
that (maybe by using something like a multi-version vector that carries
different results for different lambdas), but I'm not sure whether that's what
you had in mind.
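Just to make that idea concrete, a rough sketch of such a structure (purely
hypothetical, nothing like this exists in the patch):

/** Hypothetical "multi-version vector": one value per (lambda, index) pair,
 *  so a single pass over the input can update the versions for all lambdas. */
class MultiLambdaVector {
  private final double[] lambdas;   // the regularization values being tried
  private final double[][] values;  // values[l][i] = entry i of the version for lambdas[l]

  MultiLambdaVector(double[] lambdas, int size) {
    this.lambdas = lambdas.clone();
    this.values = new double[lambdas.length][size];
  }

  double lambda(int lambdaIndex) { return lambdas[lambdaIndex]; }
  double get(int lambdaIndex, int index) { return values[lambdaIndex][index]; }
  void set(int lambdaIndex, int index, double v) { values[lambdaIndex][index] = v; }
}

The drawback stays as described above though: memory and the amount of data
moved between the phases still grow linearly with the number of lambda values.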
> MapReduce implementation of ALS-WR
> ----------------------------------
>
> Key: MAHOUT-542
> URL: https://issues.apache.org/jira/browse/MAHOUT-542
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch,
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch,
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout currently lacks a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference matrix and gives some
> insight into how the authors distributed the computation using Matlab.
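> For reference, the weighted-λ-regularized objective from the paper (in LaTeX
> notation; I denotes the set of known ratings, and n_{u_i} and n_{m_j} count
> the ratings of user i and of item j):
>
>   f(U, M) = \sum_{(i,j) \in I} (r_{ij} - u_i^T m_j)^2
>             + \lambda \Big( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 \Big)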
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's
> law of patches.
> Maybe someone has the time and motivation to work on this a little with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read), try to factorize some
> test data, and then give feedback.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira