[ https://issues.apache.org/jira/browse/MAHOUT-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-423:
-----------------------------
Assignee: Sean Owen
Fix Version/s: 0.4
Priority: Minor (was: Major)
> Optimize getNumUsersWithPreferenceFor(long... itemIDs)
> ------------------------------------------------------
>
> Key: MAHOUT-423
> URL: https://issues.apache.org/jira/browse/MAHOUT-423
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.3
> Reporter: Jonathan Young
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-423.patch
>
>
> I ran a simple collaborative filtering application using a
> GenericBooleanPrefDataModel built from a subset of the Netflix data,
> Tanimoto similarity, and the GenericItemBasedRecommender, and then called
> recommender.mostSimilarItems() many times.
> Profiling indicated that the majority of the time was spent in
> GenericBooleanPrefDataModel.getNumUsersWithPreferenceFor(long... itemIDs).
> The version in GenericDataModel is optimized for the cases of one and two
> item IDs, but the version in GenericBooleanPrefDataModel always computes the
> full intersection set, regardless of how many item IDs are passed.
> I can create a patch that optimizes the cases itemIDs.length == 1 and
> itemIDs.length == 2 (as the version in GenericDataModel does), but perhaps
> the code should be refactored if these really are the most common cases.
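
For illustration only, the optimization described above might look roughly like the following. This is a minimal sketch and not the attached patch: the class BooleanPrefCountSketch, its java.util-based storage, and the tanimoto() helper are hypothetical stand-ins, assuming the model keeps a set of user IDs per item and that a Tanimoto-style similarity only needs counts for single items and item pairs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a boolean-preference data model; not Mahout's actual class. */
final class BooleanPrefCountSketch {

  /** Item ID -> IDs of the users who expressed a preference for that item. */
  private final Map<Long, Set<Long>> usersByItem = new HashMap<>();

  void addPreference(long userID, long itemID) {
    usersByItem.computeIfAbsent(itemID, k -> new HashSet<>()).add(userID);
  }

  int getNumUsersWithPreferenceFor(long... itemIDs) {
    if (itemIDs.length == 0) {
      throw new IllegalArgumentException("at least one item ID is required");
    }
    Set<Long> first = usersByItem.get(itemIDs[0]);
    if (first == null) {
      return 0;
    }
    // Fast path 1: a single item needs no intersection at all.
    if (itemIDs.length == 1) {
      return first.size();
    }
    // Fast path 2: for two items, count common users without allocating a new set.
    if (itemIDs.length == 2) {
      Set<Long> second = usersByItem.get(itemIDs[1]);
      return second == null ? 0 : intersectionSize(first, second);
    }
    // General case: materialize the intersection, stopping early once it is empty.
    Set<Long> intersection = new HashSet<>(first);
    for (int i = 1; i < itemIDs.length && !intersection.isEmpty(); i++) {
      Set<Long> users = usersByItem.get(itemIDs[i]);
      if (users == null) {
        return 0;
      }
      intersection.retainAll(users);
    }
    return intersection.size();
  }

  /** Counts common elements by iterating the smaller set and probing the larger one. */
  private static int intersectionSize(Set<Long> a, Set<Long> b) {
    Set<Long> smaller = a.size() <= b.size() ? a : b;
    Set<Long> larger = smaller == a ? b : a;
    int count = 0;
    for (Long id : smaller) {
      if (larger.contains(id)) {
        count++;
      }
    }
    return count;
  }

  /** Tanimoto coefficient |A and B| / |A or B|, built only from one- and two-item counts. */
  double tanimoto(long itemID1, long itemID2) {
    int n1 = getNumUsersWithPreferenceFor(itemID1);
    int n2 = getNumUsersWithPreferenceFor(itemID2);
    int both = getNumUsersWithPreferenceFor(itemID1, itemID2);
    int union = n1 + n2 - both;
    return union == 0 ? Double.NaN : (double) both / union;
  }
}
```

The tanimoto() helper shows why the one- and two-item calls would dominate a profile like the one described: each similarity evaluation issues two single-item counts and one pair count, so avoiding a temporary set on those paths matters far more than the general case.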
--
This message is automatically generated by JIRA.