For the typical use of Mahout (processing text) it doesn't make sense to use
a Gaussian kernel, because all the vectors are extremely sparse. You would
end up keeping tons of these 'support vectors' around just to span the space.

That is, instead of having one N-dimensional weight vector, you end up
carrying a very large set of these 'support vectors'.

Empirically, for regression I've seen maybe 30% of the training data kept
around as support vectors, which can make training prohibitively expensive
for large data sets. Scoring would be correspondingly slow. You're much
better off using an additive kernel, e.g. intersection or chi-squared.

In essence, KRR decays to a smoothed k-nearest-neighbors learner.
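
To make that concrete, here is a minimal sketch of what KRR scoring looks
like with an additive kernel over sparse vectors. This uses plain Java maps
rather than Mahout's Vector API, and all the names are illustrative:

  import java.util.List;
  import java.util.Map;

  public class KrrSketch {
    // Intersection kernel: k(a, b) = sum_i min(a_i, b_i). It is additive in
    // the coordinates and only touches indices present in both vectors.
    static double intersectionKernel(Map<Integer, Double> a,
                                     Map<Integer, Double> b) {
      Map<Integer, Double> small = a.size() <= b.size() ? a : b;
      Map<Integer, Double> large = (small == a) ? b : a;
      double sum = 0.0;
      for (Map.Entry<Integer, Double> e : small.entrySet()) {
        Double v = large.get(e.getKey());
        if (v != null) {
          sum += Math.min(e.getValue(), v);
        }
      }
      return sum;
    }

    // KRR prediction: f(x) = sum_i alpha[i] * k(support[i], x). The cost is
    // one kernel evaluation per retained support vector, which is why keeping
    // ~30% of a large training set around makes scoring slow.
    static double predict(List<Map<Integer, Double>> support, double[] alpha,
                          Map<Integer, Double> x) {
      double score = 0.0;
      for (int i = 0; i < alpha.length; i++) {
        score += alpha[i] * intersectionKernel(support.get(i), x);
      }
      return score;
    }
  }

The alphas come from solving the ridge system (K + lambda*I) alpha = y over
the full training kernel matrix, which is exactly the part that blows up with
training set size.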

On Tue, Sep 20, 2011 at 8:03 PM, deneche abdelhakim <[email protected]> wrote:

> another question: in the paper, KRR had some serious limitations on the
> size of the dataset it could handle. How much data can (MAHOUT-702) handle,
> and on what PC (or cluster) configuration?
>
> On Wed, Sep 21, 2011 at 3:44 AM, deneche abdelhakim
> <[email protected]> wrote:
>
> > cool, thanks :)
> >
> >
> > On Tue, Sep 20, 2011 at 11:10 PM, Hector Yee <[email protected]> wrote:
> >
> >> Yeah, it's a two-line change to PassiveAggressive.java (MAHOUT-702).
> >>
> >> change the loss to:
> >>
> >> loss = hinge(|score - actual| - epsilon), where hinge(x) = 0 if x < 0
> >> and x otherwise.
> >> epsilon is a new param that controls how much error we tolerate.
> >> tau remains the same.
> >> delta = sign(actual - score) * tau * instance
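> >>
> >> Spelled out in Java, that change would look roughly like this (a sketch
> >> only; the variable names are illustrative and not necessarily the actual
> >> fields in PassiveAggressive.java):
> >>
> >>   double epsilon = 0.1;  // new param: how much regression error to tolerate
> >>   double loss = Math.max(0.0, Math.abs(score - actual) - epsilon);  // hinge
> >>   double tau = computeTau(loss, instance);  // hypothetical helper; unchanged
> >>   // only the direction of the update depends on the sign of the error:
> >>   // w += Math.signum(actual - score) * tau * instance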
> >>
> >>
> >> On Tue, Sep 20, 2011 at 2:21 PM, Ted Dunning <[email protected]>
> >> wrote:
> >>
> >> > Anything that requires the solution of large linear systems is usually
> >> > susceptible to SGD approaches.
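> >> >
> >> > For example, ridge regression's normal equations (X'X + lambda*I) w =
> >> > X'y never need to be formed explicitly: SGD on the regularized squared
> >> > loss touches one example at a time. A sketch with plain Java arrays,
> >> > where dot() is an assumed helper:
> >> >
> >> >   double err = dot(w, x) - y;  // residual for this example
> >> >   for (int j = 0; j < w.length; j++) {
> >> >     // gradient step on 0.5*err^2 + 0.5*lambda*|w|^2 at coordinate j
> >> >     w[j] -= eta * (err * x[j] + lambda * w[j]);
> >> >   }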
> >> >
> >> > On Tue, Sep 20, 2011 at 11:24 AM, deneche abdelhakim
> >> > <[email protected]> wrote:
> >> >
> >> > > I was reading this paper:
> >> > >
> >> > > "Combining Predictions for Accurate Recommender Systems"
> >> > > http://www.commendo.at/UserFiles/commendo/File/kdd2010-paper.pdf
> >> > >
> >> > > and one particular method used to blend different recommenders is
> >> > > KRR (Kernel Ridge Regression). The authors had the following
> >> > > conclusion about it:
> >> > >
> >> > > "KRR is worse than neural networks, but the results are promising.
> An
> >> > > increase of the training set size would lead to a more accurate
> model.
> >> > But
> >> > > the huge computational re-
> >> > > quirements of KRR limits us to about 6% data. The train time for one
> >> KRR
> >> > > model on 6% subset (about 42000 samples) is 4 hours."
> >> > >
> >> > > I don't know why, but I really want to see the quality of the
> >> > > results of this method when using larger training sets. So my
> >> > > question is the following: would such a method benefit from a
> >> > > distributed (mapreduce) version? Is such a thing already available?
> >> > > Is it interesting to the Mahout project in general? I started
> >> > > reading up on it, and it seems to require solving some big linear
> >> > > systems.
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Yee Yang Li Hector <https://plus.google.com/106746796711269457249>
> >> Professional Profile <http://www.linkedin.com/in/yeehector>
> >> http://hectorgon.blogspot.com/ (tech + travel)
> >> http://hectorgon.com (book reviews)
> >>
> >
> >
>



-- 
Yee Yang Li Hector <https://plus.google.com/106746796711269457249>
Professional Profile <http://www.linkedin.com/in/yeehector>
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
