We don't have this algorithm in Mahout yet: http://arxiv.org/abs/1006.2156
But it looks a lot like what you want.  Short of that, you can definitely do
recommendation-like things with logistic regression, and you don't have to
worry much about the non-negativity constraints (in my experience).

On Sun, Sep 5, 2010 at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Ted, thank you very much.
>
> I would like to discuss one more generalization here, if I may.
>
> Let's consider the Netflix prize problem for the moment. That is, the
> regression parameters are non-quantitative ones (essentially person and
> movie ids), and the regressand is the user's score. I guess many are
> familiar with Yehuda Koren's approach to this, where he basically used
> SGD as a non-negative factorization, and he also mentioned something
> about applying a logistic function on top of it. I.e. the regression
> looks exactly as it would for logistic regression (he also added
> biases), with the exception that it is more of a non-negative one
> (factors are not allowed to go negative).
>
> The problem I currently have on my hands is a hybrid of those. I.e.
> imagine that in addition to some non-quantitative features (person,
> movie) you know some quantitative features about the movie (say, genre
> scores that come out of some sort of encyclopedic database, i.e. a
> manually trained taxonomy). (You might also know some quantitative
> features about the person too, but let's keep it simple for the purpose
> of this discussion.)
>
> It's very easy for me to go in and create an individual regression for
> a user based on their reactions (liked / didn't like) and what I know
> of the quantitative qualities of the movies.
>
> However, at some point I start feeling that movie genre ratings are not
> enough. Some movies still have some pretty unique factors about them
> that we don't really know or haven't rated as a feature.
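As an aside, the Koren-style biased SGD factorization described above can be sketched in a few lines. This is an illustrative toy sketch in NumPy, not Mahout code; all names and hyperparameters are made up, and the non-negativity is imposed here by simply clamping the factors at zero after each update (one of several possible ways to do it):

```python
# Toy sketch (not Mahout): SGD matrix factorization with global mean,
# user/item biases, and factors clamped at zero to stay non-negative.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 4
lr, lam = 0.02, 0.05                         # learning rate, regularization

# synthetic (user, item, rating) triples standing in for real data
ratings = [(int(rng.integers(n_users)), int(rng.integers(n_items)),
            float(rng.integers(1, 6))) for _ in range(500)]

mu = np.mean([r for _, _, r in ratings])     # global mean rating
bu = np.zeros(n_users)                       # user biases
bi = np.zeros(n_items)                       # item biases
P = rng.random((n_users, k)) * 0.1           # user factors, start >= 0
Q = rng.random((n_items, k)) * 0.1           # item factors, start >= 0

for epoch in range(50):
    for u, i, r in ratings:
        pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
        e = r - pred
        bu[u] += lr * (e - lam * bu[u])      # biases may go negative
        bi[i] += lr * (e - lam * bi[i])
        pu = P[u].copy()
        P[u] = np.maximum(P[u] + lr * (e * Q[i] - lam * P[u]), 0.0)
        Q[i] = np.maximum(Q[i] + lr * (e * pu - lam * Q[i]), 0.0)

rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                        for u, i, r in ratings]))
print(round(rmse, 3))
```

A logistic function could be applied on top of the dot product in the same loop, with the error term becoming the gradient of the log-likelihood instead of the raw residual.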
> So what I really want is probably a non-negative factorization, but one
> that takes into account quantitative features coming from different
> aspects of a given instance of a (person, movie) interaction (movie
> genre, time of day, weather outside, etc., whatever we think may have a
> good chance of being a good feature, without really going through a PCA
> or feature selection process at the moment).
> So on encountering quantitative features we may search for regression
> parameters, but for non-quantitative features (person, movie) I'd still
> prefer to have the biggest non-negative factors learned from history.
>
> Is there a way to merge both those approaches into one, as they seem to
> be really similar (i.e. regression with non-negative factorization)?
>
> Intuitively I feel that those approaches are really similar (the
> difference is that in NNF we are essentially guessing the principal
> factors' input). And there must be a relatively simple way to morph it
> all into a hybrid approach where some of the betas interact with
> quantitative features x, but yet other ones interact with non-negative
> factors associated with non-quantitative input (such as a person id)
> encountered in the sample.
>
> Does it make sense? Is there a way to do this in Mahout?
>
> Thank you very much.
> -Dmitriy.
>
>
> On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning <[email protected]> wrote:
>
> > I generally add the constant term into the feature vector if I want
> > to use it. You are correct that it is usually critical to correct
> > function, but I prefer not to have a special case for it. The one
> > place where I think that is wrong is where you want special treatment
> > by the prior. It is common to have a very different prior on the
> > intercept than on the coefficients.
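(Folding the constant term into the feature vector, as described above, just means appending a fixed 1 to every input so that beta_0 is learned like any other weight. A minimal illustrative sketch, with a hypothetical helper name:)

```python
# Hypothetical helper (not a Mahout API): append a constant-1 column
# so the intercept is estimated like any other coefficient.
import numpy as np

def add_intercept(X):
    """Return X with a constant-1 column appended as the last feature."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.array([[0.2, 1.5],
              [1.0, 0.3]])
Xb = add_intercept(X)
print(Xb.shape)
```

With this encoding, a prior applied uniformly to all coefficients also touches the intercept, which is exactly the caveat raised above.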
> > My only defense there is that common priors for the coefficients,
> > like L1, allow plenty of latitude on the intercept, so that as long
> > as the data outweigh the prior, this doesn't matter. There is a
> > similar distinctive effect between interactions and main effects.
> >
> > One place it would matter a lot is in multi-level inference, where
> > you wind up with a pretty strong prior from the higher-level
> > regressions (since that is where most of the data actually is). In
> > that case, I would probably rather separate the handling. In fact, at
> > that point, I think I would probably go with a grouped prior to allow
> > handling all of these cases in a coherent setting.
> >
> > On the second question, betas can definitely go negative. That is how
> > the model expresses an effect that decreases the likelihood of
> > success.
> >
> > On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > There's something I don't understand about your derivation.
> > >
> > > I think Bishop generally suggests that in linear regression
> > > y = beta_0 + <beta, x> (so there's an intercept), and I think he
> > > uses a similar approach when fitting to the logistic function,
> > > where I think he suggests using P([mu + <beta, x>]/s), which of
> > > course can again be thought of as P(beta_0 + <beta, x>).
> > >
> > > But if there's no intercept beta_0, then y(x = (0,...,0)^T | beta)
> > > is always 0, which is not true of course in most situations. Does
> > > your method imply that a trivial input (all 0s) would produce a 0
> > > estimate?
> > >
> > > Second question: are the betas allowed to go negative?
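For what it's worth, the hybrid Dmitriy asks about (unconstrained betas on quantitative features, plus non-negative per-id latent factors, under a logistic link) can be trained jointly with SGD in one loop. This is a toy sketch under those assumptions, not anything Mahout currently ships; all names and data are invented for illustration:

```python
# Toy sketch of the hybrid model discussed in the thread (not Mahout):
#   logit(p) = w . x  +  p_u . q_i
# where w (betas on quantitative features, intercept folded in as a
# constant-1 feature) may go negative, while the per-id factors p_u, q_i
# are clamped at zero after each update to stay non-negative.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k, d = 30, 20, 3, 4     # d quantitative features per sample
lr, lam = 0.05, 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: (user id, item id, quantitative features, like/didn't like)
data = []
for _ in range(400):
    u, i = int(rng.integers(n_users)), int(rng.integers(n_items))
    x = np.append(rng.random(d - 1), 1.0)  # last slot is the constant term
    y = float(rng.random() < sigmoid(x.sum() - 2.0))
    data.append((u, i, x, y))

w = np.zeros(d)                            # unconstrained betas
P = rng.random((n_users, k)) * 0.1         # non-negative user factors
Q = rng.random((n_items, k)) * 0.1         # non-negative item factors

for epoch in range(30):
    for u, i, x, y in data:
        p = sigmoid(w @ x + P[u] @ Q[i])
        g = y - p                          # d(log-likelihood)/d(logit)
        w += lr * (g * x - lam * w)        # betas: free to go negative
        pu = P[u].copy()
        P[u] = np.maximum(P[u] + lr * (g * Q[i] - lam * P[u]), 0.0)
        Q[i] = np.maximum(Q[i] + lr * (g * pu - lam * Q[i]), 0.0)

loglik = np.mean([y * np.log(sigmoid(w @ x + P[u] @ Q[i]) + 1e-12)
                  + (1 - y) * np.log(1 - sigmoid(w @ x + P[u] @ Q[i]) + 1e-12)
                  for u, i, x, y in data])
print(round(loglik, 3))
```

The two pieces share one loss and one SGD loop, which is the sense in which the regression and factorization views merge; whether the zero-clamp is the right way to keep factors non-negative (versus, say, an exponential reparameterization) is a separate design choice.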
