[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006350#comment-13006350 ]

Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------

Hi Ted,

I already gave that some thought; unfortunately, no easy way to do it came to 
mind. I'll share my line of thought, maybe there's an error in how I 
understand the algorithm:

The goal of ALS-WR is to factorize the rating matrix R into the user feature 
matrix U and the item feature matrix M. It tries to minimize the regularized 
squared error between R and U(T)M via an iterative algorithm. We start with a 
randomly initialized M and use it to compute an optimized U, then fix U to 
compute an optimized M. We alternate between a fixed U and a fixed M until the 
error converges (or a maximum number of iterations is reached).
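
For reference, this is the cost function from the paper as I read it (just a 
sketch, please double-check against the paper), where I is the set of observed 
ratings, n_{u_i} the number of ratings of user i and n_{m_j} the number of 
ratings of item j:

  f(U, M) = \sum_{(i,j) \in I} (r_{ij} - u_i^T m_j)^2
            + \lambda ( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 )

The weighting of lambda by n_{u_i} and n_{m_j} is what makes it 
"weighted-λ-regularization".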

The question now is whether we can modify this process to use multiple lambda 
values (the regularization parameter used in the internal computations). You 
stated that we are "reading the same input over and over", which I don't see, 
as all versions of U and M (with the exception of the first randomly 
initialized M) depend on the lambda that was used to produce them. So as I 
understand the algorithm, n values for lambda would not only require n times 
the memory for the computations, but n versions of each U and M would also 
need to be moved around. I think I could find a way to make the code do that 
(maybe by using something like a multi-version vector that carries different 
results for different lambdas, see the sketch below), but I'm not sure whether 
that's what you had in mind.
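
To make that idea a bit more concrete, here is a minimal sketch of such a 
multi-version vector. All names here are hypothetical, this is not part of 
any patch:

import java.util.HashMap;
import java.util.Map;

/** A feature vector that carries one version of its values per lambda. */
public class MultiVersionVector {

  /** one dense feature vector per regularization value */
  private final Map<Double,double[]> versions = new HashMap<Double,double[]>();

  public MultiVersionVector(double[] lambdas, int numFeatures) {
    for (double lambda : lambdas) {
      versions.put(lambda, new double[numFeatures]);
    }
  }

  /** the features that were computed with this particular lambda */
  public double[] featuresFor(double lambda) {
    return versions.get(lambda);
  }

  public void setFeaturesFor(double lambda, double[] features) {
    versions.put(lambda, features.clone());
  }
}

The solve step for each user or item would then run once per lambda and store 
each result under its lambda, which also illustrates the n-fold memory and 
shuffle cost mentioned above.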

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch, 
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch, 
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering 
> algorithm that uses matrix factorization, I spent some time reading through a 
> couple of the Netflix papers and stumbled upon the "Large-scale Parallel 
> Collaborative Filtering for the Netflix Prize" available at 
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with 
> Weighted-λ-Regularization" to factorize the preference matrix and gives some 
> insight into how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using 
> Map/Reduce, so I sat down and created a prototype version. I'm not really 
> sure I got the mathematical details correct (they need some optimization 
> anyway), but I want to put up my prototype implementation here per Yonik's 
> law of patches.
> Maybe someone has the time and motivation to work a little on this with me. 
> It would be great if someone could validate the approach taken (I'm willing 
> to help, as the code might not be intuitive to read), try to factorize some 
> test data, and give feedback.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
