Another question: in the paper, KRR had serious limitations on the size of the dataset it could handle. How much data can MAHOUT-702 handle, and on what PC (or cluster) configuration?
On Wed, Sep 21, 2011 at 3:44 AM, deneche abdelhakim <[email protected]> wrote:

> cool, thanks :)
>
> On Tue, Sep 20, 2011 at 11:10 PM, Hector Yee <[email protected]> wrote:
>
>> Yeah its a two line change to PassiveAggressive.java (MAHOUT-702)
>>
>> change the loss to:
>>
>> loss = hinge ( | score - actual| - epsilon ) where hinge(x) = 0 if x < 0, x otherwise
>> epsilon is a new param that controls how much error we tolerate
>> tau remains the same
>> delta = sign(actual - score) * tau * instance
>>
>> On Tue, Sep 20, 2011 at 2:21 PM, Ted Dunning <[email protected]> wrote:
>>
>> > Anything that requires the solution of large linear systems is usually
>> > susceptible to SGD approaches.
>> >
>> > On Tue, Sep 20, 2011 at 11:24 AM, deneche abdelhakim <[email protected]> wrote:
>> >
>> > > I was reading this paper:
>> > >
>> > > "Combining Predictions for Accurate Recommender Systems"
>> > > http://www.commendo.at/UserFiles/commendo/File/kdd2010-paper.pdf
>> > >
>> > > and one particular method used to blend different recommenders is KRR
>> > > (Kernel Ridge Regression). The authors had the followings conclusion
>> > > about it:
>> > >
>> > > "KRR is worse than neural networks, but the results are promising. An
>> > > increase of the training set size would lead to a more accurate model.
>> > > But the huge computational requirements of KRR limits us to about 6%
>> > > data. The train time for one KRR model on 6% subset (about 42000
>> > > samples) is 4 hours."
>> > >
>> > > I don't know why, but I really want to see the quality of the results
>> > > of this method when using larger training sets. So my question is the
>> > > following: will such method benefit from a distributed version
>> > > (mapreduce) ? is such thing already available ? is it interesting to
>> > > the Mahout project in general ? I started to document about it and it
>> > > seems to require some big linear system solving.
>>
>> --
>> Yee Yang Li Hector <https://plus.google.com/106746796711269457249>
>> Professional Profile <http://www.linkedin.com/in/yeehector>
>> http://hectorgon.blogspot.com/ (tech + travel)
>> http://hectorgon.com (book reviews)

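For reference, here is a minimal sketch of the epsilon-insensitive update Hector describes in the quoted thread, written as a standalone class rather than a patch to PassiveAggressive.java. The class name EpsilonInsensitivePA, the plain double[] vectors, and the parameter c are all hypothetical. The thread only says "tau remains the same", so the exact form of tau below (the PA-II style loss / (||x||^2 + 1/(2c))) is an assumption and may not match what MAHOUT-702 actually does.

```java
/**
 * Sketch of a passive-aggressive regressor with an epsilon-insensitive loss.
 * Not Mahout code: names and the tau formula are assumptions for illustration.
 */
final class EpsilonInsensitivePA {
    private final double[] weights;
    private final double epsilon; // how much absolute error we tolerate
    private final double c;       // assumed aggressiveness parameter (PA-II style)

    EpsilonInsensitivePA(int numFeatures, double epsilon, double c) {
        this.weights = new double[numFeatures];
        this.epsilon = epsilon;
        this.c = c;
    }

    /** hinge(x) = 0 if x < 0, x otherwise. */
    static double hinge(double x) {
        return x < 0 ? 0 : x;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    double score(double[] instance) {
        return dot(weights, instance);
    }

    /** loss = hinge(|score - actual| - epsilon). */
    double loss(double[] instance, double actual) {
        return hinge(Math.abs(score(instance) - actual) - epsilon);
    }

    /** One online update: delta = sign(actual - score) * tau * instance. */
    void train(double[] instance, double actual) {
        double score = score(instance);
        double loss = hinge(Math.abs(score - actual) - epsilon);
        if (loss == 0.0) {
            return; // prediction is within epsilon: stay passive
        }
        // Assumed step size (PA-II form); the thread does not spell tau out.
        double tau = loss / (dot(instance, instance) + 0.5 / c);
        double step = Math.signum(actual - score) * tau;
        for (int i = 0; i < weights.length; i++) {
            weights[i] += step * instance[i];
        }
    }

    double[] weights() {
        return weights.clone();
    }

    public static void main(String[] args) {
        EpsilonInsensitivePA model = new EpsilonInsensitivePA(2, 0.1, 1.0);
        double[] x = {1.0, 2.0};
        for (int i = 0; i < 20; i++) {
            model.train(x, 3.0);
        }
        System.out.println("score = " + model.score(x) + ", loss = " + model.loss(x, 3.0));
    }
}
```

Each update touches only one instance and costs time linear in the number of features, which is why this kind of online learner does not hit the memory wall that solving the full KRR system does. How far it scales in practice still depends on the actual MAHOUT-702 implementation.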