[https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006109#comment-13006109]
Ted Dunning commented on MAHOUT-542:
------------------------------------
Sebastian,
I don't want to derail your commit, but your question about regularization
suggested a thought to me.
One of the great advantages of the random projection methods over power
methods stems from the fact that iteration is so evil in Hadoop-based
map-reduce, especially when you are simply reading the same input over and over.
With ALS-WR, you can run the program again for each value of the regularization
parameter, but there is really nothing, except possibly memory size, preventing
you from running all of these optimizations at the same time.
How hard would that be, do you think, to interleave the computations for
multiple values of the regularization parameter into a single run of ALS-WR?
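To make the interleaving idea concrete, here is a minimal Python sketch (not Mahout code; the names `update_user_for_all_lambdas` and `cholesky_solve` are hypothetical) of one ALS-WR user-side update for several regularization values while reading a user's ratings only once. It assumes each λ keeps its own item-factor matrix, because the runs diverge after the first half-iteration; what is shared is the scan over the input, and memory grows with the number of λ values.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def cholesky_solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b for a symmetric positive-definite a, using a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    y = [0.0] * n  # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(low[i][p] * y[p] for p in range(i))) / low[i][i]
    x = [0.0] * n  # back substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[p][i] * x[p] for p in range(i + 1, n))) / low[i][i]
    return x


def update_user_for_all_lambdas(
    user_ratings: Dict[str, float],
    item_factors_by_lambda: Dict[float, Dict[str, Vector]],
) -> Dict[float, Vector]:
    """Hypothetical helper: one ALS-WR user update per lambda, one pass over ratings.

    Each lambda has its own item-factor matrix (the runs diverge after the
    first half-iteration); only the read of the user's ratings is shared.
    """
    lambdas = list(item_factors_by_lambda)
    k = len(next(iter(item_factors_by_lambda[lambdas[0]].values())))
    gram = {lam: [[0.0] * k for _ in range(k)] for lam in lambdas}
    rhs = {lam: [0.0] * k for lam in lambdas}
    for item, rating in user_ratings.items():  # single pass over this user's input
        for lam in lambdas:
            m = item_factors_by_lambda[lam][item]
            for a in range(k):
                rhs[lam][a] += rating * m[a]  # accumulates M_{I_i} r_i
                for b in range(k):
                    gram[lam][a][b] += m[a] * m[b]  # accumulates M_{I_i} M_{I_i}^T
    n_u = len(user_ratings)  # weighted-lambda: penalty scales with #ratings
    result = {}
    for lam in lambdas:
        for d in range(k):
            gram[lam][d][d] += lam * n_u
        result[lam] = cholesky_solve(gram[lam], rhs[lam])
    return result
```

For example, passing the same item factors for λ = 1.0 and λ = 10.0 (as in a first iteration started from a shared initialization) returns one user vector per λ, with the larger λ giving the smaller vector. In a map-reduce job, the inner loop over λ would presumably run inside a single mapper or reducer call, so each input record is read once per iteration rather than once per λ.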
> MapReduce implementation of ALS-WR
> ----------------------------------
>
> Key: MAHOUT-542
> URL: https://issues.apache.org/jira/browse/MAHOUT-542
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch,
> MAHOUT-542-3.patch, MAHOUT-542-4.patch, MAHOUT-542-5.patch,
> MAHOUT-542-6.patch, logs.zip
>
>
> As Mahout is currently lacking a distributed collaborative filtering
> algorithm that uses matrix factorization, I spent some time reading through a
> couple of the Netflix papers and stumbled upon the paper "Large-scale Parallel
> Collaborative Filtering for the Netflix Prize", available at
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf.
> It describes a parallel algorithm that uses "Alternating-Least-Squares with
> Weighted-λ-Regularization" to factorize the preference-matrix and gives some
> insights on how the authors distributed the computation using Matlab.
> It seemed to me that this approach could also easily be parallelized using
> Map/Reduce, so I sat down and created a prototype version. I'm not really
> sure I got the mathematical details correct (they need some optimization
> anyway), but I want to put up my prototype implementation here per Yonik's law
> of patches.
> Maybe someone has the time and motivation to work a little on this with me.
> It would be great if someone could validate the approach taken (I'm willing
> to help, as the code might not be intuitive to read) and try to
> factorize some test data and give feedback.
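For reference, the objective minimized by ALS-WR in the cited paper, and the resulting closed-form update for one user's feature vector, can be written as below. The notation follows our reading of the paper: $I$ is the set of observed ratings, $I_i$ the items rated by user $i$, $n_{u_i}$ and $n_{m_j}$ the number of ratings of user $i$ and item $j$, $M_{I_i}$ the item-feature columns for $I_i$, and $E$ the $n_f \times n_f$ identity. The item update is the same with the roles of $U$ and $M$ swapped.

```latex
f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{\top} m_j \right)^2
        + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2 \right)

u_i = \left( M_{I_i} M_{I_i}^{\top} + \lambda \, n_{u_i} E \right)^{-1} M_{I_i} R^{\top}(i, I_i)
```

Since $\lambda$ enters the update only through the diagonal term $\lambda \, n_{u_i} E$, changing the regularization value changes the small $n_f \times n_f$ system each user solves, not the rating data that has to be read to build it.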