We don't have this algorithm in Mahout yet: http://arxiv.org/abs/1006.2156
But it looks a lot like what you want.  Short of that, you can definitely do
recommendation-like things with logistic regression, and you don't have to
worry much about the non-negativity constraints (in my experience).

On Sun, Sep 5, 2010 at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Ted, thank you very much.
>
> I would like to discuss one more generalization here, if I may.
>
> Let's consider the Netflix prize problem for the moment. That is, the
> regression parameters are non-quantitative ones (essentially person and
> movie ids), and the regressand is the user's score. I guess many are
> familiar with Yehuda Koren's approach to this, where he basically used
> SGD as a non-negative factorization, and he also mentioned something
> about applying a logistic function on top of it. I.e. the regression
> looks exactly as it would for logistic regression (he also added
> biases), with the exception that it is more of a non-negative one
> (factors are not allowed to go negative).
>
> The problem I currently have on my hands is a hybrid of those. I.e.
> imagine that in addition to some non-quantitative features (person,
> movie) you know some quantitative features about the movie (say, genre
> scores that come out of some sort of encyclopedic database, i.e. a
> manually trained taxonomy). (You might also know some quantitative
> features about the person too, but let's keep it simple for the purpose
> of this discussion.)
>
> It's very easy for me to go in and create an individual regression for
> a user based on their reactions (liked / didn't like) and what I know
> of the quantitative qualities of the movies.
>
> However, at some point I start feeling that movie genre ratings are not
> enough. Some movies still have some pretty unique factors about them
> that we don't really know or haven't rated as a feature.
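As an aside, the Koren-style biased SGD factorization described above can be sketched in a few lines. This is an illustrative toy sketch in NumPy, not Mahout code; all names and hyperparameters are made up, and the non-negativity is imposed here by simply clamping the factors at zero after each update (one of several possible ways to do it):

```python
# Toy sketch (not Mahout): SGD matrix factorization with global mean,
# user/item biases, and factors clamped at zero to stay non-negative.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 4
lr, lam = 0.02, 0.05                         # learning rate, regularization

# synthetic (user, item, rating) triples standing in for real data
ratings = [(int(rng.integers(n_users)), int(rng.integers(n_items)),
            float(rng.integers(1, 6))) for _ in range(500)]

mu = np.mean([r for _, _, r in ratings])     # global mean rating
bu = np.zeros(n_users)                       # user biases
bi = np.zeros(n_items)                       # item biases
P = rng.random((n_users, k)) * 0.1           # user factors, start >= 0
Q = rng.random((n_items, k)) * 0.1           # item factors, start >= 0

for epoch in range(50):
    for u, i, r in ratings:
        pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
        e = r - pred
        bu[u] += lr * (e - lam * bu[u])      # biases may go negative
        bi[i] += lr * (e - lam * bi[i])
        pu = P[u].copy()
        P[u] = np.maximum(P[u] + lr * (e * Q[i] - lam * P[u]), 0.0)
        Q[i] = np.maximum(Q[i] + lr * (e * pu - lam * Q[i]), 0.0)

rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                        for u, i, r in ratings]))
print(round(rmse, 3))
```

A logistic function could be applied on top of the dot product in the same loop, with the error term becoming the gradient of the log-likelihood instead of the raw residual.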
> So what I really want is probably a non-negative factorization, but one
> that takes into account quantitative features coming from different
> aspects of a given instance of a (person, movie) interaction (movie
> genre, time of day, weather outside, etc., whatever we think may have a
> good chance of being a good feature, without really going through a PCA
> or feature selection process at the moment).
> So on encountering quantitative features we may search for regression
> parameters, but for non-quantitative features (person, movie) I'd still
> prefer to have the biggest non-negative factors learned from history.
>
> Is there a way to merge both those approaches into one, as they seem to
> be really similar (i.e. regression with non-negative factorization)?
>
> Intuitively I feel that those approaches are really similar (the
> difference is that in NNF we are essentially guessing the principal
> factors' input). And there must be a relatively simple way to morph it
> all into a hybrid approach where some of the betas interact with
> quantitative features x, but yet other ones interact with non-negative
> factors associated with non-quantitative input (such as a person id)
> encountered in the sample.
>
> Does it make sense? Is there a way to do this in Mahout?
>
> Thank you very much.
> -Dmitriy.
>
>
> On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning <[email protected]> wrote:
>
> > I generally add the constant term into the feature vector if I want
> > to use it. You are correct that it is usually critical to correct
> > function, but I prefer not to have a special case for it. The one
> > place where I think that is wrong is where you want special treatment
> > by the prior. It is common to have a very different prior on the
> > intercept than on the coefficients.
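(Folding the constant term into the feature vector, as described above, just means appending a fixed 1 to every input so that beta_0 is learned like any other weight. A minimal illustrative sketch, with a hypothetical helper name:)

```python
# Hypothetical helper (not a Mahout API): append a constant-1 column
# so the intercept is estimated like any other coefficient.
import numpy as np

def add_intercept(X):
    """Return X with a constant-1 column appended as the last feature."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.array([[0.2, 1.5],
              [1.0, 0.3]])
Xb = add_intercept(X)
print(Xb.shape)
```

With this encoding, a prior applied uniformly to all coefficients also touches the intercept, which is exactly the caveat raised above.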
> > My only defense there is that common priors for the coefficients,
> > like L1, allow plenty of latitude on the intercept, so that as long
> > as the data outweigh the prior, this doesn't matter. There is a
> > similar distinctive effect between interactions and main effects.
> >
> > One place it would matter a lot is in multi-level inference, where
> > you wind up with a pretty strong prior from the higher-level
> > regressions (since that is where most of the data actually is). In
> > that case, I would probably rather separate the handling. In fact, at
> > that point, I think I would probably go with a grouped prior to allow
> > handling all of these cases in a coherent setting.
> >
> > On the second question, betas can definitely go negative. That is how
> > the model expresses an effect that decreases the likelihood of
> > success.
> >
> > On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > There's something I don't understand about your derivation.
> > >
> > > I think Bishop generally suggests that in linear regression
> > > y = beta_0 + <beta, x> (so there's an intercept), and I think he
> > > uses a similar approach when fitting to the logistic function,
> > > where I think he suggests using P([mu + <beta, x>]/s), which of
> > > course can again be thought of as P(beta_0 + <beta, x>).
> > >
> > > But if there's no intercept beta_0, then y(x = (0,...,0)^T | beta)
> > > is always 0, which is not true of course in most situations. Does
> > > your method imply that a trivial input (all 0s) would produce a 0
> > > estimate?
> > >
> > > Second question: are the betas allowed to go negative?
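For what it's worth, the hybrid Dmitriy asks about (unconstrained betas on quantitative features, plus non-negative per-id latent factors, under a logistic link) can be trained jointly with SGD in one loop. This is a toy sketch under those assumptions, not anything Mahout currently ships; all names and data are invented for illustration:

```python
# Toy sketch of the hybrid model discussed in the thread (not Mahout):
#   logit(p) = w . x  +  p_u . q_i
# where w (betas on quantitative features, intercept folded in as a
# constant-1 feature) may go negative, while the per-id factors p_u, q_i
# are clamped at zero after each update to stay non-negative.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k, d = 30, 20, 3, 4     # d quantitative features per sample
lr, lam = 0.05, 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: (user id, item id, quantitative features, like/didn't like)
data = []
for _ in range(400):
    u, i = int(rng.integers(n_users)), int(rng.integers(n_items))
    x = np.append(rng.random(d - 1), 1.0)  # last slot is the constant term
    y = float(rng.random() < sigmoid(x.sum() - 2.0))
    data.append((u, i, x, y))

w = np.zeros(d)                            # unconstrained betas
P = rng.random((n_users, k)) * 0.1         # non-negative user factors
Q = rng.random((n_items, k)) * 0.1         # non-negative item factors

for epoch in range(30):
    for u, i, x, y in data:
        p = sigmoid(w @ x + P[u] @ Q[i])
        g = y - p                          # d(log-likelihood)/d(logit)
        w += lr * (g * x - lam * w)        # betas: free to go negative
        pu = P[u].copy()
        P[u] = np.maximum(P[u] + lr * (g * Q[i] - lam * P[u]), 0.0)
        Q[i] = np.maximum(Q[i] + lr * (g * pu - lam * Q[i]), 0.0)

loglik = np.mean([y * np.log(sigmoid(w @ x + P[u] @ Q[i]) + 1e-12)
                  + (1 - y) * np.log(1 - sigmoid(w @ x + P[u] @ Q[i]) + 1e-12)
                  for u, i, x, y in data])
print(round(loglik, 3))
```

The two pieces share one loss and one SGD loop, which is the sense in which the regression and factorization views merge; whether the zero-clamp is the right way to keep factors non-negative (versus, say, an exponential reparameterization) is a separate design choice.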
