[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931787#action_12931787 ]

Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------

Hi Ted,

I read through that paper a while ago when we exchanged ideas for Mahout 0.5 on
the mailing list, and to be honest I didn't really get the mathematical details.
Nevertheless, I understood that its possibilities are far superior to what we
currently have, and also to the approach described in the Netflix paper
mentioned above, mainly because of its ability to handle side information and
nominal values, as you already mentioned. As far as I can tell, the paper does
not describe a parallelization approach for the algorithm, though I'm not sure
whether one is even necessary for it.

But I had the prototype code in the attached patch ready before we had that
discussion, and I hope it can be finished with only a little input from
someone else, so I decided to put it up here. That said, I'd have no problem
dropping this once MAHOUT-525 is done, if it turns out that putting work in
there would give Mahout a much nicer recommender.

> MapReduce implementation of ALS-WR
> ----------------------------------
>
>                 Key: MAHOUT-542
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-452.patch
>
>
> As Mahout currently lacks a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference matrix and gives some
> insights into how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's
> law of patches.
> Maybe someone has the time and motivation to work on this a little with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read), try to factorize some
> test data, and give feedback.

-- 
This message is automatically generated by JIRA.