This is what is traditionally done, but it is distinctly sub-optimal in many ways. The most serious problem is that it relies on a heuristic decision about what is important and what is not.
A preferable (and, as far as I know, never used or implemented) approach would be to build a real model that includes factors that actually help predict the desired outcome. Methods to do this might include:

a) LLR feature selection from several behavior types followed by IDF-weighted scoring. I have used this with additional follow-on steps in attrition and loss models for insurance with very good results, but never used it in recommendations. The basic idea in the attrition and loss models was to develop positive and negative indicator sets for each outcome and then cluster in the space of indicator scores. Finally, we built ANN models over the variables formed by distances to cluster centroids. For recommendations, this would mean building positive and negative feature sets for all items for each kind of behavior. I would expect little gain from negative scores but would still use them. With positive-only sets, this reduces (almost) to the sum of cooccurrence scores computed in isolation on each kind of input.

b) Shared latent variable reductions across multiple behavior types. For SVD or similar decomposition-based techniques, this is equivalent to reducing column-adjoined matrices for the independent behaviors. Then, if you have only one kind of information, you can use the SVD to fill in the other, missing, information.

c) Probabilistic latent variable approaches. For LDA and such, you can put all of the behavioral information together and use the model to predict missing observations in the standard Bayesian kind of way. This is similar to (b), but much better founded.

On Tue, Jun 23, 2009 at 12:23 PM, Sean Owen wrote:

> For example, you could write a script that combines rating,
> purchase history, demographics, in some way that you think is useful,
> to produce 'preference' values.
>

-- 
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
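For reference, a minimal sketch of the positive-only case of option (a): LLR feature selection per behavior type, followed by IDF-weighted scoring summed across behavior types. It is an illustration under stated assumptions, not the insurance models described above. The function names, the toy data, and the LLR cutoff of 3.0 are all hypothetical, and the clustering and ANN steps are omitted.

```python
import math
from collections import defaultdict


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N*log(N) - sum(k*log(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise


def positive_indicators(histories, outcome_users, threshold=3.0):
    """Return {feature: idf_weight} for features positively tied to the outcome.

    histories: {user: set of features} for one behavior type.
    outcome_users: set of users who showed the desired outcome.
    threshold: hypothetical LLR cutoff; tune on real data.
    """
    n = len(histories)
    n_pos = sum(1 for u in histories if u in outcome_users)
    n_neg = n - n_pos
    counts = defaultdict(lambda: [0, 0])  # feature -> [with outcome, without]
    for user, feats in histories.items():
        col = 0 if user in outcome_users else 1
        for f in feats:
            counts[f][col] += 1
    selected = {}
    for f, (k11, k12) in counts.items():
        k21, k22 = n_pos - k11, n_neg - k12
        positive = k11 * k22 > k12 * k21  # feature over-represented with outcome
        if positive and llr(k11, k12, k21, k22) > threshold:
            selected[f] = math.log(n / (k11 + k12))  # IDF over users
    return selected


def score(user_behaviors, indicators):
    """Sum IDF weights of matched indicators, each behavior type in isolation."""
    return sum(
        indicators[b].get(f, 0.0)
        for b, feats in user_behaviors.items() if b in indicators
        for f in feats
    )


# Toy data (made up): two behavior types and one outcome, "bought the item".
views = {
    "u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "c"}, "u4": {"a"},
    "u5": {"c"}, "u6": {"c", "d"}, "u7": {"d"}, "u8": {"c", "d"},
}
searches = {
    "u1": {"x"}, "u2": {"x"}, "u3": {"x"}, "u4": {"x"},
    "u5": {"y"}, "u6": {"y"}, "u7": {"y"}, "u8": {"y"},
}
buyers = {"u1", "u2", "u3", "u4"}

indicators = {
    "views": positive_indicators(views, buyers),
    "searches": positive_indicators(searches, buyers),
}
likely = score({"views": {"a", "b"}, "searches": {"x"}}, indicators)
unlikely = score({"views": {"c"}, "searches": {"y"}}, indicators)
```

Negative indicator sets could be built the same way by flipping the association test and subtracting their weights. The sketch leaves them out because little gain is expected from them.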
