As for your other question, my favorite large data sets are Netflix and
GroupLens. They contain movie rating data, not query logs. I don't know of
any query log data, though I vaguely recall AltaVista (?) releasing
something like this years ago.

On Jul 22, 2009 8:41 AM, "Claudia Grieco" <[email protected]> wrote:

I had a sudden idea: would it be better to use an item-based recommender
(with TanimotoCoefficientSimilarity instead of
BooleanTanimotoCoefficientSimilarity, which doesn't implement ItemSimilarity)?
That way I could overcome the scalability problem of having too many
"users" (queries).

Another question: do you know of any example query log data I could use to
test how the algorithm performs on large data sets?
Thanks again
Claudia

-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Monday, July 20, 2009 13:19
To: [email protected]
Subject: Re: Implement the related search feature with mahout

Actually, I kind of misspoke. The fastest way to do this is in fact to
use a GenericUserBasedRecommender -- not because you need recommendations,
but because it exposes a ...
DataModel model = new MySQLBooleanPrefJDBCDataModel(...); // or whatever you are using
UserSimilarity similarity = new BooleanTanimotoCoefficientSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
BooleanUserGenericUserBasedRecommender recommender =
    new BooleanUserGenericUserBasedRecommender(model, neighborhood, similarity);
Rescorer<Pair<User,User>> rescorer = new Rescorer<Pair<User,User>>() {
  // implement your rescoring logic to affect how similar 'users' are --
  // boost the returned value for popular queries
};
Object userID = ...; // current query ID
List<User> similarUsers = recommender.mostSimilarUsers(userID, 10, rescorer);
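
To make the similarity and rescoring steps concrete, here is a minimal
sketch in plain Java that does not use Mahout at all. It assumes each query
("user") is represented by the set of item IDs associated with it. The class
name TanimotoSketch, the sample IDs, and the log-based popularity boost are
illustrative assumptions, not something taken from the library or from this
thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Tanimoto coefficient of two sets: |intersection| / |union|.
    // Returns 0.0 when both sets are empty.
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set to count shared elements.
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = a.size() <= b.size() ? b : a;
        int intersection = 0;
        for (String item : smaller) {
            if (larger.contains(item)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    // Hypothetical rescoring rule: scale the similarity up for queries that
    // were seen more often. A popularity of 0 leaves the score unchanged.
    static double boostByPopularity(double similarity, int popularity) {
        return similarity * (1.0 + Math.log1p(popularity));
    }

    public static void main(String[] args) {
        // Made-up item IDs for two queries.
        Set<String> queryA = new HashSet<>(Arrays.asList("item1", "item2", "item3"));
        Set<String> queryB = new HashSet<>(Arrays.asList("item2", "item3", "item4"));

        double similarity = tanimoto(queryA, queryB);
        System.out.println("similarity = " + similarity);
        System.out.println("boosted    = " + boostByPopularity(similarity, 100));
    }
}
```

The boost function plays the role of the rescoring logic in the snippet
above: it takes the raw similarity and returns an adjusted value, so popular
queries rank higher among the most similar ones.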
