Optimize getNumUsersWithPreferenceFor(long... itemIDs)
------------------------------------------------------

                 Key: MAHOUT-423
                 URL: https://issues.apache.org/jira/browse/MAHOUT-423
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.3
            Reporter: Jonathan Young


I ran a simple collaborative filtering application using a 
GenericBooleanPrefDataModel built from (a subset of) the Netflix data, Tanimoto 
similarity, and the GenericItemBasedRecommender, and then called 
recommender.mostSimilarItems() many times.

Profiling indicated that the majority of the time was spent in 
GenericBooleanPrefDataModel.getNumUsersWithPreferenceFor(long... itemIDs).  The 
version in GenericDataModel is optimized for the cases of one and two itemIDs, 
but the version in GenericBooleanPrefDataModel always computes the intersection 
set.

I can create a patch that special-cases itemIDs.length == 1 and 
itemIDs.length == 2 (similar to the version in GenericDataModel), but perhaps 
the code should be refactored if these are really the most common cases.
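
To make the proposal concrete, here is a minimal sketch of the kind of
special-casing described above. It does not use Mahout's actual classes: the
class name BooleanPrefCounts, the addPreference helper, and the
Map<Long, Set<Long>> index from item ID to user IDs are all hypothetical
stand-ins built on plain java.util collections. The idea is only that one item
needs a size lookup, two items need a count of common users without building a
new set, and three or more fall back to the general intersection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a boolean-preference data model; NOT Mahout's class.
final class BooleanPrefCounts {

    // Assumed index: item ID -> IDs of the users who expressed a preference for it.
    private final Map<Long, Set<Long>> usersByItem = new HashMap<>();

    void addPreference(long userID, long itemID) {
        usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(userID);
    }

    int getNumUsersWithPreferenceFor(long... itemIDs) {
        if (itemIDs.length == 0) {
            return 0;
        }
        Set<Long> first = usersByItem.get(itemIDs[0]);
        if (first == null) {
            return 0;
        }
        // One item: the answer is just the size of its user set.
        if (itemIDs.length == 1) {
            return first.size();
        }
        // Two items: count common users without allocating an intersection set.
        if (itemIDs.length == 2) {
            Set<Long> second = usersByItem.get(itemIDs[1]);
            return second == null ? 0 : intersectionSize(first, second);
        }
        // General case: build the intersection, stopping early once it is empty.
        Set<Long> intersection = new HashSet<>(first);
        for (int i = 1; i < itemIDs.length && !intersection.isEmpty(); i++) {
            Set<Long> users = usersByItem.get(itemIDs[i]);
            if (users == null) {
                return 0;
            }
            intersection.retainAll(users);
        }
        return intersection.size();
    }

    // Iterates over the smaller set and probes the larger one.
    private static int intersectionSize(Set<Long> a, Set<Long> b) {
        Set<Long> small = a.size() <= b.size() ? a : b;
        Set<Long> large = (small == a) ? b : a;
        int count = 0;
        for (Long id : small) {
            if (large.contains(id)) {
                count++;
            }
        }
        return count;
    }
}
```

In this sketch the two-item path, which is presumably the one a pairwise
similarity computation would exercise most, avoids copying either user set.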


