I wanted the user base to take note of the change Gokhan suggests below. I committed a variant of it just now which does indeed notably speed up most algorithms by more intelligently selecting possibilities to consider. On one test I am playing with it sped things up by 50% -- in another, more like 400%. Depending on your data this could be a big win.
Sean

On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan <[email protected]> wrote:
> Hi, Sean.
> I think we talked about the mostSimilarItems() function before, about a bug
> in ItemBasedRecommender.
> I think there is another issue, this time about performance.
>
> The mostSimilarItems function returns the list of items most similar to a
> given item.
> In computing those items, the algorithm looks at all other items in the
> data model, but if no user has rated two items together, there is no need
> to check for a similarity between the active item and that item.
>
> This is the original function that returns the most-similar-items list in
> cf.taste.impl.recommender.GenericItemBasedRecommender:
>
> private List<RecommendedItem> doMostSimilarItems(long itemID,
>                                                  int howMany,
>                                                  TopItems.Estimator<Long> estimator) throws TasteException {
>   DataModel model = getDataModel();
>   FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
>   LongPrimitiveIterator it = model.getItemIDs();
>   while (it.hasNext()) {
>     allItemIDs.add(it.nextLong());
>   }
>   allItemIDs.remove(itemID);
>   return TopItems.getTopItems(howMany, allItemIDs.iterator(), null, estimator);
> }
>
> I updated it and now use it this way:
>
> private List<RecommendedItem> doMostSimilarItems(long itemID,
>                                                  int howMany,
>                                                  TopItems.Estimator<Long> estimator) throws TasteException {
>   DataModel model = getDataModel();
>   FastIDSet set = new FastIDSet();
>   PreferenceArray arr = model.getPreferencesForItem(itemID);
>   for (int i = 0; i < arr.length(); i++) {
>     set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
>   }
>   set.remove(itemID);
>   return TopItems.getTopItems(howMany, set.iterator(), null, estimator);
> }
>
> The only difference between the two functions is:
> the original one passes all items to getTopItems;
> mine passes only the items that have at least one user who has rated both
> the active item and that item.
>
> This small change made the algorithm considerably faster.
> (For my data set it now runs 4 times faster.)
>
> I wanted to let you know, in case you want to try it and update the code.
> If the original version of the code is better for some other reason, please
> let me know.
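
For anyone who wants to see the candidate-narrowing idea in isolation, here is a minimal standalone sketch. It deliberately does not use the Taste classes: CoRatedCandidates and its methods are hypothetical names, and the two java.util maps stand in for the DataModel lookups in the quoted code. It also assumes that similarity is derived from co-ratings, so that items with no user in common with the active item can safely be skipped.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a data model; not part of Taste.
final class CoRatedCandidates {

    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    private final Map<Long, Set<Long>> usersByItem = new HashMap<>();

    // Record that a user rated an item (the rating value is irrelevant here).
    void addRating(long userID, long itemID) {
        itemsByUser.computeIfAbsent(userID, k -> new HashSet<>()).add(itemID);
        usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(userID);
    }

    // Original approach: every known item except the active one is a candidate.
    Set<Long> allOtherItems(long itemID) {
        Set<Long> all = new HashSet<>(usersByItem.keySet());
        all.remove(itemID);
        return all;
    }

    // Narrowed approach: only items rated by at least one user who also
    // rated the active item.
    Set<Long> coRatedItems(long itemID) {
        Set<Long> candidates = new HashSet<>();
        Set<Long> raters = usersByItem.getOrDefault(itemID, Collections.emptySet());
        for (long userID : raters) {
            candidates.addAll(itemsByUser.get(userID));
        }
        candidates.remove(itemID);
        return candidates;
    }

    // Tiny made-up data set: users 1 and 2 rated item 10; user 3 did not.
    static CoRatedCandidates sample() {
        CoRatedCandidates model = new CoRatedCandidates();
        model.addRating(1L, 10L);
        model.addRating(1L, 11L);
        model.addRating(2L, 10L);
        model.addRating(2L, 12L);
        model.addRating(3L, 13L);
        model.addRating(3L, 14L);
        return model;
    }

    public static void main(String[] args) {
        CoRatedCandidates model = sample();
        System.out.println("all other items: " + model.allOtherItems(10L).size());
        System.out.println("co-rated items:  " + model.coRatedItems(10L).size());
    }
}
```

In this made-up data set, items 13 and 14 share no user with item 10, so the narrowed method never hands them to the estimator. How much this saves on real data depends on how sparse the ratings are.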
