[ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195576#comment-13195576 ]
Bryce Nyeggen commented on MAHOUT-963:
--------------------------------------
I discovered this on a dataset of several million user-item pairs, with fewer
than 20 items per user on average, but probably several hundred thousand users
associated with the most common items. On that data, the change cuts the time to
construct the data model from several hours to a couple of minutes, and nearly
all of the gain comes from faster sorts on the GenericItemPreferenceArrays.
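For a rough sense of why the item-side arrays dominate, assume (for illustration only, this figure is not from the measurements above) that the most common item has n = 300,000 users. Selection sort always performs n(n-1)/2 comparisons, while an O(n log n) sort such as merge sort needs on the order of n log2(n):

```latex
\frac{n(n-1)}{2} = \frac{(3\times10^{5})(3\times10^{5}-1)}{2} \approx 4.5\times10^{10}
\qquad\text{vs.}\qquad
n\log_2 n \approx (3\times10^{5})(18.2) \approx 5.5\times10^{6}
```

That is a factor of roughly 8,000 in comparisons for that one array. It is a comparison count, not a timing prediction, but it shows why arrays with few entries per user barely matter while the large per-item arrays account for most of the construction time.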
As a side note, in the time I spent looking at the code, I didn't see any
calling code that depends on the GenericItemPreferenceArrays being sorted by
user. Is it currently necessary to sort them at all, or is the sort there only
to allow future optimizations?
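As one hypothetical example of the kind of optimization a guaranteed sort order allows (this is not taken from Mahout), counting the users two items have in common can walk two user-ID arrays in a single linear pass when both are sorted ascending. The class name SortedIntersection and method countCommon are illustrative only, and the sketch assumes each array is sorted and free of duplicates.

```java
/**
 * Hypothetical illustration, not Mahout code: counts the IDs shared by two
 * arrays that are already sorted ascending and contain no duplicates.
 */
final class SortedIntersection {

  private SortedIntersection() {}

  /** Returns how many IDs appear in both sorted, duplicate-free arrays. */
  static int countCommon(long[] a, long[] b) {
    int i = 0;
    int j = 0;
    int common = 0;
    // Two-pointer merge: O(a.length + b.length) instead of a nested
    // O(a.length * b.length) scan, which is only valid because both are sorted.
    while (i < a.length && j < b.length) {
      if (a[i] < b[j]) {
        i++;
      } else if (a[i] > b[j]) {
        j++;
      } else {
        common++;
        i++;
        j++;
      }
    }
    return common;
  }

  public static void main(String[] args) {
    long[] usersOfItemA = {3L, 7L, 19L, 42L};
    long[] usersOfItemB = {7L, 8L, 42L, 100L};
    // Users 7 and 42 appear in both arrays.
    System.out.println(countCommon(usersOfItemA, usersOfItemB));
  }
}
```

If no caller relies on this kind of merge, the sort order is indeed only an enabler, which is the question being asked.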
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
> Key: MAHOUT-963
> URL: https://issues.apache.org/jira/browse/MAHOUT-963
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Bryce Nyeggen
> Assignee: Sean Owen
> Priority: Minor
> Attachments: MAHOUT-963.diff
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts, which take O(n^2)
> time and perform poorly on large arrays. These sorts are invoked during
> construction of GenericDataModels, causing excessive construction time.
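A minimal sketch of the general fix, assuming a preference array is backed by two parallel arrays (IDs and preference values) that must be reordered together by ID. The class ParallelArraySort and method sortByIds are hypothetical names, and this is not the code in MAHOUT-963.diff. It sorts an index permutation with java.util.Arrays.sort, which is O(n log n), and then applies that permutation to both arrays.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Mahout's API: sorts parallel ID/value arrays by ID
 * in O(n log n) instead of with an O(n^2) selection sort.
 */
final class ParallelArraySort {

  private ParallelArraySort() {}

  /** Sorts ids ascending and applies the same reordering to values. */
  static void sortByIds(long[] ids, float[] values) {
    if (ids.length != values.length) {
      throw new IllegalArgumentException("ids and values must have the same length");
    }
    int n = ids.length;
    // Index permutation: order[k] is the original position of the k-th smallest ID.
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    // Arrays.sort on an object array is a stable O(n log n) merge sort (TimSort).
    Arrays.sort(order, (a, b) -> Long.compare(ids[a], ids[b]));
    // Apply the permutation using snapshots of the original contents.
    long[] idsCopy = ids.clone();
    float[] valuesCopy = values.clone();
    for (int i = 0; i < n; i++) {
      ids[i] = idsCopy[order[i]];
      values[i] = valuesCopy[order[i]];
    }
  }

  public static void main(String[] args) {
    long[] userIds = {42L, 7L, 19L, 3L};
    float[] prefs = {4.0f, 2.5f, 5.0f, 1.0f};
    sortByIds(userIds, prefs);
    // Prints: [3, 7, 19, 42] [1.0, 2.5, 5.0, 4.0]
    System.out.println(Arrays.toString(userIds) + " " + Arrays.toString(prefs));
  }
}
```

The boxed Integer permutation keeps the sketch short but costs extra memory per element; an in-place quicksort or heapsort that swaps both arrays together would avoid that overhead while keeping the O(n log n) bound.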
--
This message is automatically generated by JIRA.