Hi Dmitriy,

The paper states that a good lambda value is easy to find with 3-4 experiments. I still have to verify that claim on a real dataset.
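
To make "3-4 experiments" concrete, here is a minimal sketch of the kind of holdout test I have in mind. It is not the code from the patch: it is a small in-memory NumPy version of ALS-WR, and the function names (`als_wr`, `pick_lambda`), the synthetic data, the rank and the candidate lambda values are all assumptions chosen for illustration.

```python
import numpy as np

def _solve_side(R, mask, fixed, lam):
    """Solve one regularized least-squares problem per row of R, holding `fixed` constant."""
    rank = fixed.shape[1]
    out = np.zeros((R.shape[0], rank))
    for i in range(R.shape[0]):
        idx = np.flatnonzero(mask[i])
        if idx.size == 0:
            continue  # no observed ratings for this row: leave its factor at zero
        F = fixed[idx]
        # Weighted-lambda regularization: lambda is scaled by the number of ratings in the row.
        A = F.T @ F + lam * idx.size * np.eye(rank)
        out[i] = np.linalg.solve(A, F.T @ R[i, idx])
    return out

def als_wr(R, mask, rank, lam, iters=10, seed=0):
    """Factorize R ~ U @ M.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    M = rng.normal(scale=0.1, size=(R.shape[1], rank))
    for _ in range(iters):
        U = _solve_side(R, mask, M, lam)      # fix M, solve for the user factors
        M = _solve_side(R.T, mask.T, U, lam)  # fix U, solve for the item factors
    return U, M

def rmse(R, mask, U, M):
    """Root mean squared error over the entries selected by mask."""
    err = (R - U @ M.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def pick_lambda(R, train_mask, test_mask, candidates, rank=5):
    """Train once per candidate lambda and return the one with the lowest holdout RMSE."""
    scores = {lam: rmse(R, test_mask, *als_wr(R, train_mask, rank, lam))
              for lam in candidates}
    return min(scores, key=scores.get), scores

# Synthetic low-rank ratings with noise (illustrative data only).
rng = np.random.default_rng(42)
R = rng.normal(size=(30, 3)) @ rng.normal(size=(20, 3)).T + 0.1 * rng.normal(size=(30, 20))
observed = rng.random(R.shape) < 0.6
holdout = observed & (rng.random(R.shape) < 0.2)
train = observed & ~holdout

best, scores = pick_lambda(R, train, holdout, [0.01, 0.05, 0.1, 0.5])
print(best, scores)
```

Whether a handful of candidates is really enough on real data is exactly what the holdout tests still have to show.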

--sebastian

On 21.12.2010 00:57, Dmitriy Lyubimov wrote:
Hi Sebastian,

How do you come up with a good lambda to use with this weighted ALS?

On Mon, Dec 20, 2010 at 3:27 PM, Sebastian Schelter (JIRA)
<[email protected]> wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-542:
--------------------------------------

     Attachment: MAHOUT-542-2.patch

An updated version of the patch. I fixed a small bug, added more tests and
polished the code a little.

The distributed matrix factorization works fine now on a toy example. The
next steps will be to use real data and do some holdout tests.

MapReduce implementation of ALS-WR
----------------------------------

                 Key: MAHOUT-542
                 URL: https://issues.apache.org/jira/browse/MAHOUT-542
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
    Affects Versions: 0.5
            Reporter: Sebastian Schelter
         Attachments: MAHOUT-452.patch, MAHOUT-542-2.patch


As Mahout is currently lacking a distributed collaborative filtering
algorithm that uses matrix factorization, I spent some time reading through
a couple of the Netflix papers and stumbled upon the paper "Large-scale
Parallel Collaborative Filtering for the Netflix Prize", available at:
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
It describes a parallel algorithm that uses "Alternating-Least-Squares
with Weighted-λ-Regularization" to factorize the preference-matrix and gives
some insights on how the authors distributed the computation using Matlab.
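
For readers without the paper at hand, this is the objective and the per-user update as I read them from the paper, not taken from the patch. The notation is the paper's as I understand it: r_ij is a known rating, I is the set of known (user, movie) pairs, u_i and m_j are the user and movie feature vectors, n_{u_i} and n_{m_j} are the number of ratings of user i and of movie j, M_{I_i} holds the feature columns of the movies rated by user i, and E is the identity matrix.

```latex
% Objective minimized by ALS-WR
f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2
        + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2
                       + \sum_j n_{m_j} \lVert m_j \rVert^2 \right)

% Update for one user with M held fixed (the movie update is symmetric)
u_i = \left( M_{I_i} M_{I_i}^{T} + \lambda \, n_{u_i} E \right)^{-1}
      M_{I_i} \, R^{T}(i, I_i)
```

Each u_i depends only on the ratings of user i and the current movie features, so the updates within one half-step are independent of each other.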
It seemed to me that this approach could also easily be parallelized
using Map/Reduce, so I sat down and created a prototype version. I'm not
really sure I got the mathematical details correct (they need some
optimization anyway), but I want to put up my prototype implementation here
per Yonik's law of patches.
Maybe someone has the time and motivation to work a little on this with
me. It would be great if someone could validate the approach taken (I'm
willing to help as the code might not be intuitive to read) and could try to
factorize some test data and give feedback then.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


