[Taste] Sanity Check and Questions

Grant Ingersoll Thu, 18 Jun 2009 08:30:26 -0700

I'm working on a demo of Mahout, and part of it is on collaborative filtering (CF). For the CF part, I'm taking the lead from an idea from Ted about a way to demonstrate how CF works conceptually. (Ted, please correct me if my understanding is incorrect.)

I took a subset of Wikipedia articles (2302, available at http://people.apache.org/~gsingers/wikipedia/chunks.tar.gz, created by the WikipediaXMLSplitter in the example directory). Next, I picked a topic of interest, in this case all docs containing the phrase "Abraham Lincoln", and I made the assumption that there are 10 users out of a total of 1000 who are "Lincolnphiles" and have thereby rated most of the articles (17 total) on the topic. The ratings range between -5 and 5 (as doubles), and for the most part the Lincolnphiles tend to like the same things, but to varying degrees. (Note, I did these ratings by hand and thus "stacked the deck".) The Lincolnphiles are really obsessed and did not rate any other documents. However, not all of them rated all 17 articles.

Next, I assumed the other 990 users are randomly rating across all the documents and in the same range. Thus, for every article in the set, I randomly grabbed X users and had them randomly assign a degree of like or dislike in the range mentioned.
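For concreteness, here is a minimal sketch of how the random half of such a data set could be generated. It is not the code used for the demo: the class name, the Rating holder, the user-id layout (Lincolnphiles as ids 0-9, random users as 10-999) and the ratersPerArticle parameter standing in for "X" are all assumptions made for illustration. The hand-assigned Lincolnphile ratings are not reproduced.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of Mahout/Taste.
class SyntheticRatings {

    /** One (user, article, rating) triple. */
    static class Rating {
        final int userId;
        final int articleId;
        final double value;

        Rating(int userId, int articleId, double value) {
            this.userId = userId;
            this.articleId = articleId;
            this.value = value;
        }
    }

    /**
     * For every article, picks ratersPerArticle distinct random users and
     * gives each a uniformly random rating in [-5, 5).
     */
    static List<Rating> randomRatings(int numArticles, int firstRandomUser,
                                      int numRandomUsers, int ratersPerArticle,
                                      long seed) {
        if (ratersPerArticle > numRandomUsers) {
            throw new IllegalArgumentException("not enough users to pick from");
        }
        Random rng = new Random(seed);
        List<Rating> ratings = new ArrayList<>();
        for (int article = 0; article < numArticles; article++) {
            // Draw until we have the requested number of distinct users.
            Set<Integer> chosen = new HashSet<>();
            while (chosen.size() < ratersPerArticle) {
                chosen.add(firstRandomUser + rng.nextInt(numRandomUsers));
            }
            for (int user : chosen) {
                double value = -5.0 + 10.0 * rng.nextDouble();
                ratings.add(new Rating(user, article, value));
            }
        }
        return ratings;
    }

    public static void main(String[] args) {
        // 2302 articles, random users 10..999, 5 raters per article (assumed X).
        List<Rating> ratings = randomRatings(2302, 10, 990, 5, 42L);
        System.out.println("Generated " + ratings.size() + " random ratings");
    }
}
```

With a fixed number of raters per article, a random user ends up with far more ratings than any Lincolnphile can have (at most 17), which matters for the neighborhood behavior described further down.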

I then implemented a basic recommender following the User-based recommenders section of the Taste docs, and passed in the user id of one of the Lincolnphiles. The results I get back are a bit surprising in that none of the recommendations are for other items rated highly by the Lincolnphiles, despite the fact that, when setting the neighborhood to be 10, all of the other Lincolnphiles are in the neighborhood plus one non-Lincolnphile. I would expect the recommendations to be for items that are not rated by my Lincolnphile, but have been rated by the other Lincolnphiles, or at least some of them, but in fact none of the recommendations are for Lincoln docs.

OK, so I then played around a bit with the neighborhood size. If I make it 9 (which is the number of other Lincolnphiles in the system) or less, I then get what I expected. So, it seems the one non-Lincolnphile rated a lot more items than all the Lincolnphiles. Is that why that user's items seem to dominate the recommendations? Looking at the non-Lincoln user, I see two items that this user and my Lincolnphile both rated: one that they both really liked and one that they disagreed on.
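One plausible way to see how a single active neighbor could crowd out the others is to assume that the estimate for an item is a similarity-weighted average over only those neighbors who rated it. The sketch below works through that assumption with made-up similarities and ratings; the class and method names are hypothetical and this is not Taste's code.

```java
// Hypothetical illustration, not part of Mahout/Taste.
class NeighborEstimate {

    /**
     * Similarity-weighted average of the ratings the neighbors gave one item.
     * A null rating means that neighbor did not rate the item.
     * Returns NaN if no neighbor rated it.
     */
    static double estimate(double[] similarities, Double[] ratings) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (ratings[i] != null) {
                weightedSum += similarities[i] * ratings[i];
                totalSimilarity += similarities[i];
            }
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        // Made-up numbers: three Lincolnphile neighbors plus one active
        // non-Lincolnphile who is only weakly similar to the target user.
        double[] similarities = {0.9, 0.8, 0.7, 0.3};

        // A Lincoln doc rated by the three Lincolnphiles only.
        Double[] lincolnDoc = {4.0, 3.0, 5.0, null};
        // An unrelated doc rated only by the active non-Lincolnphile.
        Double[] unrelatedDoc = {null, null, null, 5.0};

        System.out.println("Lincoln doc estimate:   " + estimate(similarities, lincolnDoc));
        System.out.println("Unrelated doc estimate: " + estimate(similarities, unrelatedDoc));
    }
}
```

Under this assumption the unrelated doc gets an estimate equal to the lone neighbor's own rating, because the similarity cancels out when only one neighbor rated the item. A user with many extreme ratings can therefore supply many top-ranked candidates, while the Lincoln docs are pulled toward an average of several opinions.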

I'm not exactly sure what my questions are, other than the one about an active user dominating like-minded but less active raters, and what's the appropriate thing to do there, if anything. Mostly I wanted to make sure this all makes sense.

Also, is there any notion in Taste similar to Lucene's explain method (http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query,%20int))?

After this sanity check, my next goal is to show how a new Lincolnphile coming into the system would be guided to other content on Lincoln.

[And yes, once done, this code will be publicly available, but it will be a little while]

Here's my snippet of code for recommending, pretty much verbatim from the Taste docs:

    UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
    // Optional:
    userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));

    UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(neighSize, userSimilarity, dataModel);

    Collection<User> users = neighborhood.getUserNeighborhood(userId);
    for (User neighbor : users) {
      System.out.println("Neighbor: " + neighbor);
    }

    Recommender recommender =
            new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
    Recommender cachingRecommender = new CachingRecommender(recommender);

    List<RecommendedItem> recommendations =
            cachingRecommender.recommend(userId, 10);
    System.out.println("Recommendations:");
    for (RecommendedItem item : recommendations) {
      Item theItem = item.getItem();
      String title = idsToTitle.get(theItem.getID().toString());
      System.out.println("Doc Id: " + theItem + " Title: " + title);
    }

Cheers,
Grant
