Thanks, all. This is very valuable info for a beginner. Does Mahout require preference values to be binary or bounded in a range like -1 to +1, or can it take any range, say 0 to 10?

thanks,
Pradeep.
On Tue, Jun 23, 2009 at 2:12 PM, Ted Dunning <[email protected]> wrote:

> This is what is traditionally done, but it is distinctly sub-optimal in
> many ways. The most serious problem is that there is a heuristic
> decision that says what is important and what is not.
>
> A preferable (and as far as I know never used or implemented) approach
> would be to build a real model that includes factors that actually help
> predict the desired outcome. Methods to do this might include:
>
> a) LLR feature selection from several behavior types followed by
> IDF-weighted scoring. I have used this with additional follow-on steps
> in attrition and loss models for insurance with very good results, but
> never used it in recommendations. The basic idea in the attrition and
> loss models was to develop positive and negative indicator sets for each
> outcome and then cluster in the space of indicator scores. Finally, we
> built ANN models over the variables formed by distances to cluster
> centroids. For recommendations, this would mean building positive and
> negative feature sets for all items for each kind of behavior. I would
> expect little gain from negative scores but would still use them. With
> positive-only sets, this reduces (almost) to the sum of cooccurrence
> scores done in isolation on each kind of input.
>
> b) shared latent variable reductions across multiple behavior types. For
> SVD or similar decomposition-based techniques, this is equivalent to
> reducing column-adjoined matrices for the independent behaviors. Then,
> if you have only one kind of information, you can use the SVD to fill in
> the other, missing, information.
>
> c) probabilistic latent variable approaches. For LDA and such, you can
> put all of the behavioral information together and use the model to
> predict missing observations in the standard Bayesian kind of way. This
> is similar to (b), but much better founded.
>
> On Tue, Jun 23, 2009 at 12:23 PM, Sean Owen <[email protected]> wrote:
>
> > For example, you could write a script that combines rating,
> > purchase history, demographics, in some way that you think is useful,
> > to produce 'preference' values.
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> http://www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
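To make option (a) concrete, here is a minimal sketch of the log-likelihood
ratio (G^2) test on a 2x2 cooccurrence contingency table, written from
scratch rather than taken from Mahout itself; the class and method names are
illustrative:

    public final class Llr {

      /**
       * Log-likelihood ratio (G^2) for a 2x2 cooccurrence table.
       * k11: users who acted on both items A and B
       * k12: users who acted on A but not B
       * k21: users who acted on B but not A
       * k22: users who acted on neither
       */
      public static double logLikelihoodRatio(long k11, long k12,
                                              long k21, long k22) {
        long[][] k = {{k11, k12}, {k21, k22}};
        long[] rowSums = {k11 + k12, k21 + k22};
        long[] colSums = {k11 + k21, k12 + k22};
        double n = k11 + k12 + k21 + k22;
        double llr = 0.0;
        // G^2 = 2 * sum_ij k_ij * ln( k_ij * N / (rowSum_i * colSum_j) )
        for (int i = 0; i < 2; i++) {
          for (int j = 0; j < 2; j++) {
            if (k[i][j] > 0) {
              llr += 2.0 * k[i][j]
                  * Math.log(k[i][j] * n / ((double) rowSums[i] * colSums[j]));
            }
          }
        }
        return llr;
      }
    }

Items whose cooccurrence with a user's history clears an LLR threshold would
form the positive feature set Ted describes, and a candidate item could then
be scored by an (optionally IDF-weighted) sum over those features.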
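For option (b), a small sketch of the column-adjoined SVD, assuming Apache
Commons Math is available for the decomposition; the class name and the two
input matrices are hypothetical:

    import org.apache.commons.math3.linear.Array2DRowRealMatrix;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.SingularValueDecomposition;

    public final class AdjoinedSvd {

      /**
       * Column-adjoins two user-by-item behavior matrices (e.g. ratings
       * and purchases), takes a rank-k SVD, and returns the low-rank
       * reconstruction.
       */
      public static RealMatrix reduce(RealMatrix ratings,
                                      RealMatrix purchases, int k) {
        int users = ratings.getRowDimension();
        int colsA = ratings.getColumnDimension();
        int colsB = purchases.getColumnDimension();

        // Column-adjoin: [ ratings | purchases ]
        RealMatrix joined = new Array2DRowRealMatrix(users, colsA + colsB);
        joined.setSubMatrix(ratings.getData(), 0, 0);
        joined.setSubMatrix(purchases.getData(), 0, colsA);

        SingularValueDecomposition svd = new SingularValueDecomposition(joined);
        RealMatrix uk = svd.getU().getSubMatrix(0, users - 1, 0, k - 1);
        RealMatrix sk = svd.getS().getSubMatrix(0, k - 1, 0, k - 1);
        RealMatrix vk = svd.getV().getSubMatrix(0, colsA + colsB - 1, 0, k - 1);

        // Rank-k reconstruction U_k * S_k * V_k^T; predictions for the
        // missing behavior are read out of the corresponding column block.
        return uk.multiply(sk).multiply(vk.transpose());
      }
    }

The point of adjoining the columns before decomposing is that the rank-k
factors are shared across behavior types, so a user with only ratings still
lands somewhere in the latent space and the reconstruction fills in plausible
values for the purchase block.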
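Option (c) in code terms: once an LDA-style model has been fit over the
pooled behavior data, the posterior predictive for an unseen (user, item)
pair is just the user's topic mixture dotted with the per-topic item
distributions. The theta and phi names below are hypothetical stand-ins for
whatever a trained model actually exposes:

    public final class LdaPredict {

      /**
       * Posterior predictive under a fitted LDA-style model:
       *   p(item | user) = sum_t theta[t] * phi[t][item]
       *
       * theta: the user's inferred topic mixture (length = numTopics)
       * phi:   per-topic distributions over all items from every behavior
       *        type pooled together (numTopics x numItems)
       */
      public static double probability(double[] theta, double[][] phi,
                                       int item) {
        double p = 0.0;
        for (int t = 0; t < theta.length; t++) {
          p += theta[t] * phi[t][item];
        }
        return p;
      }
    }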
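And for Sean's suggestion of a script that combines signals into
'preference' values: Taste's FileDataModel reads plain "userID,itemID,value"
lines where the value is a float on whatever scale you choose, so a
combining script can be as simple as the sketch below. The 0.7 and 3.0
weights are purely illustrative:

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    public final class CombinePreferences {

      /**
       * Folds an explicit rating and a purchase signal into one
       * preference value. Illustrative weights only -- tune for your data.
       */
      static float combine(float rating, boolean purchased) {
        return 0.7f * rating + (purchased ? 3.0f : 0.0f);
      }

      public static void main(String[] args) throws IOException {
        // One "userID,itemID,value" line per preference, as FileDataModel expects.
        try (PrintWriter out = new PrintWriter(new FileWriter("prefs.csv"))) {
          long userId = 1L;
          long itemId = 42L;
          out.printf("%d,%d,%.2f%n", userId, itemId, combine(8.0f, true));
        }
      }
    }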
