I wanted the user base to take note of the change Gokhan suggests
below. I committed a variant of it just now which does indeed notably
speed up most algorithms by more intelligently selecting possibilities
to consider. On one test I am playing with it sped things up by 50% --
in another, more like 400%. Depending on your data this could be a big
win.

Sean

On Mon, Sep 7, 2009 at 2:03 PM, Gökhan Çapan<[email protected]> wrote:
> Hi, Sean.
> I think we talked about the mostSimilarItems() function before, regarding a
> bug in ItemBasedRecommender.
> I think there is another issue, this time with performance.
>
> mostSimilarItems function gives the list of most similar items to a given
> item.
> To compute those items, the algorithm looks at every other item in the
> data model. But if no user has rated both items, there is no need to
> check for a similarity between the active item and that item.
>
>
>
> This is the original function that returns the list of most similar items, in
> cf.taste.impl.recommender.GenericItemBasedRecommender:
>
>   private List<RecommendedItem> doMostSimilarItems(long itemID,
>                                                    int howMany,
>                                                    TopItems.Estimator<Long> estimator)
>       throws TasteException {
>     DataModel model = getDataModel();
>     // Candidate set: every item in the data model except itemID itself.
>     FastIDSet allItemIDs = new FastIDSet(model.getNumItems());
>     LongPrimitiveIterator it = model.getItemIDs();
>     while (it.hasNext()) {
>       allItemIDs.add(it.nextLong());
>     }
>     allItemIDs.remove(itemID);
>     return TopItems.getTopItems(howMany, allItemIDs.iterator(), null, estimator);
>   }
>
>
>
>
> I updated it and now use it this way:
>   private List<RecommendedItem> doMostSimilarItems(long itemID,
>                                                    int howMany,
>                                                    TopItems.Estimator<Long> estimator)
>       throws TasteException {
>     DataModel model = getDataModel();
>     // Candidate set: only items rated by at least one user who also rated itemID.
>     FastIDSet set = new FastIDSet();
>     PreferenceArray arr = model.getPreferencesForItem(itemID);
>     for (int i = 0; i < arr.length(); i++) {
>       set.addAll(model.getItemIDsFromUser(arr.get(i).getUserID()));
>     }
>     set.remove(itemID);
>     return TopItems.getTopItems(howMany, set.iterator(), null, estimator);
>   }
>
>
>
> The only difference between the two functions is:
> the original one passes all items to getTopItems;
> mine passes only those items for which at least one user has rated both
> the active item and that item.
>
>
>
> This small change made the algorithm considerably faster.
> (On my data set it now runs 4 times faster.)
>
> I wanted to let you know, in case you want to try it and update the code.
> If the original version of the code is better for some other reason, please
> let me know.
>
>
>
>
>
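
For readers who want to see the candidate-selection idea from the message above in isolation, here is a minimal sketch using plain Java collections instead of the Taste classes. The class name CoRatedCandidates, the methods invert and candidates, and the map-based data layout are all assumptions made for illustration; this is not the committed code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a data model; not part of the Taste API.
class CoRatedCandidates {

  // Builds an item -> users index from a user -> items map.
  static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
    Map<Long, Set<Long>> usersByItem = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
      for (Long itemID : entry.getValue()) {
        usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(entry.getKey());
      }
    }
    return usersByItem;
  }

  // Returns every item that shares at least one rater with itemID,
  // excluding itemID itself. Items with no common rater are never visited.
  static Set<Long> candidates(long itemID,
                              Map<Long, Set<Long>> usersByItem,
                              Map<Long, Set<Long>> itemsByUser) {
    Set<Long> result = new HashSet<>();
    Set<Long> raters = usersByItem.getOrDefault(itemID, Collections.emptySet());
    for (Long userID : raters) {
      result.addAll(itemsByUser.getOrDefault(userID, Collections.emptySet()));
    }
    result.remove(itemID);
    return result;
  }

  public static void main(String[] args) {
    // Made-up sample data: user 1 rated items 10 and 11, user 2 rated 11 and 12,
    // user 3 rated only item 13.
    Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));
    itemsByUser.put(2L, new HashSet<>(Arrays.asList(11L, 12L)));
    itemsByUser.put(3L, new HashSet<>(Arrays.asList(13L)));
    Map<Long, Set<Long>> usersByItem = invert(itemsByUser);

    // Item 13 shares no rater with item 11, so it is not a candidate.
    System.out.println(candidates(11L, usersByItem, itemsByUser));
  }
}
```

The size of the candidate set depends on how many users rated the item and how many items each of them rated, so the gain will vary with the sparsity of the data.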
