[
https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976238#action_12976238
]
Sebastian Schelter commented on MAHOUT-542:
-------------------------------------------
{quote}
Can you compute factorizations for multiple values of lambda in one go and then
evaluate all of them in one pass?
This would require that parallelALS accept a list of lambdas and produce
multiple outputs.
It would also require that evaluateALS accept multiple models as well.
{quote}
I don't think this could work with the distributed implementation: each
iteration needs to see the feature vectors that were computed in the previous
iteration with the same lambda value, so I don't see how to compute the
factorizations for several lambdas concurrently. Another bottleneck is that my
evaluation code currently depends on the feature matrices fitting into memory,
which might not always be the case.
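For illustration, a minimal sketch of that dependency (the class and helper
names here are made up, not taken from the patch): each solve step for a given
lambda consumes the feature matrix produced by the previous step for that same
lambda, so runs for different lambdas form independent sequential chains.
{code:java}
import org.apache.mahout.math.Matrix;

/** Sketch only, helpers are hypothetical: illustrates why ALS iterations
 *  for one lambda cannot be interleaved with another lambda's. */
class AlsChainSketch {

  Matrix[] factorize(Matrix ratings, double lambda, int numIterations) {
    Matrix m = initializeItemFeatures(ratings); // item features, random start
    Matrix u = null;                            // user features
    for (int iteration = 0; iteration < numIterations; iteration++) {
      u = solveForUsers(ratings, m, lambda); // needs the M of the previous step
      m = solveForItems(ratings, u, lambda); // needs the U just computed
    }
    return new Matrix[] { u, m };
  }

  // hypothetical helpers, stubbed for illustration
  Matrix initializeItemFeatures(Matrix ratings) { throw new UnsupportedOperationException(); }
  Matrix solveForUsers(Matrix ratings, Matrix m, double lambda) { throw new UnsupportedOperationException(); }
  Matrix solveForItems(Matrix ratings, Matrix u, double lambda) { throw new UnsupportedOperationException(); }
}
{code}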
In a non-distributed implementation this approach could work, however. Would it
help to pick a sample from the input data, find a near-optimal lambda on that
sample in a non-distributed way, and then use that lambda for the distributed
computation as well? I'm not sure about this.
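To make the sampling idea a bit more concrete, a rough sketch, assuming an
in-memory factorizer and an RMSE evaluation exist (all names here are
hypothetical):
{code:java}
import org.apache.mahout.math.Matrix;

/** Sketch only: grid-search lambda on a small in-memory sample,
 *  then hand the winner to the distributed job. */
class LambdaSelectionSketch {

  double pickLambda(Matrix ratings, int numIterations) {
    Matrix sample = sampleRatings(ratings, 0.01);      // e.g. 1% of the ratings
    Matrix[] trainAndTest = holdOutSplit(sample, 0.9); // 90% train, 10% held out
    double bestLambda = Double.NaN;
    double bestRmse = Double.MAX_VALUE;
    for (double lambda : new double[] { 0.01, 0.05, 0.1, 0.5 }) {
      Matrix[] uAndM = factorizeInMemory(trainAndTest[0], lambda, numIterations);
      double rmse = rmse(trainAndTest[1], uAndM[0], uAndM[1]);
      if (rmse < bestRmse) {
        bestRmse = rmse;
        bestLambda = lambda;
      }
    }
    return bestLambda; // would then be passed to the distributed computation
  }

  // hypothetical helpers, stubbed for illustration
  Matrix sampleRatings(Matrix ratings, double fraction) { throw new UnsupportedOperationException(); }
  Matrix[] holdOutSplit(Matrix ratings, double trainFraction) { throw new UnsupportedOperationException(); }
  Matrix[] factorizeInMemory(Matrix train, double lambda, int numIterations) { throw new UnsupportedOperationException(); }
  double rmse(Matrix test, Matrix u, Matrix m) { throw new UnsupportedOperationException(); }
}
{code}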
Another issue I'm currently stuck on is how to compute the recommendations
after the factorization. The rating matrix is factorized into the feature
matrices U and M for users and items, and a single rating can easily be
predicted by multiplying the feature vectors of the user and the item (see the
snippet below). But if we want to compute batch recommendations for all users,
we need to find an intelligent way to select "candidate items" to recommend for
each user, as we can't simply compute t(U) * M: those are dense matrices.
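For a single rating that prediction is just a dot product over the two feature
vectors, e.g. with Mahout's Vector API (the feature values here are made up):
{code:java}
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

// predicting one rating from the factorization: r(u, i) ~= u . m
Vector userFeatures = new DenseVector(new double[] { 0.3, 1.2, -0.5 });
Vector itemFeatures = new DenseVector(new double[] { 1.1, 0.4, 0.2 });
double predictedRating = userFeatures.dot(itemFeatures);
{code}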
Our non-distributed SVDRecommender uses simple cooccurrence to identify those
candidate items, so one way I could think of would be to use RowSimilarityJob
to find cooccurring items and, for each user, compute his ratings only for
items that cooccur with ones he has preferred. I'm not sure either whether this
is the best way to do it.
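As a rough sketch of that idea (RowSimilarityJob is real, all other names here
are hypothetical):
{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.mahout.math.Vector;

/** Sketch only: score just the items that cooccur with a user's preferred
 *  items, instead of densely computing t(U) * M. */
class CandidateScoringSketch {

  void recommend(Map<Long,Set<Long>> preferredItems,   // per user, from the ratings
                 Map<Long,Set<Long>> cooccurringItems, // per item, e.g. from RowSimilarityJob
                 Map<Long,Vector> userFeatures,
                 Map<Long,Vector> itemFeatures) {
    for (Map.Entry<Long,Set<Long>> entry : preferredItems.entrySet()) {
      long userID = entry.getKey();
      Set<Long> candidates = new HashSet<Long>();
      for (long preferredItemID : entry.getValue()) {
        candidates.addAll(cooccurringItems.get(preferredItemID));
      }
      candidates.removeAll(entry.getValue()); // skip items the user already rated
      for (long itemID : candidates) {
        double estimate = userFeatures.get(userID).dot(itemFeatures.get(itemID));
        // ... collect the top-N estimates per user here
      }
    }
  }
}
{code}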
> MapReduce implementation of ALS-WR
> ----------------------------------
>
> Key: MAHOUT-542
> URL: https://issues.apache.org/jira/browse/MAHOUT-542
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch, MAHOUT-542-3.patch
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference matrix and gives some
> insights into how the authors distributed the computation using Matlab.
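> For reference, the weighted-λ-regularized objective that ALS-WR minimizes
> in that paper is (in LaTeX notation):
> {noformat}
> f(U, M) = \sum_{(i,j) \in I} (r_{ij} - u_i^T m_j)^2
>           + \lambda ( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 )
> {noformat}
> where n_{u_i} is the number of ratings by user i and n_{m_j} the number of
> ratings for item j.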
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details right (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's
> law of patches.
> Maybe someone has the time and motivation to work a little on this with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read), try to factorize some
> test data, and then give feedback.