[ 
https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195576#comment-13195576
 ] 

Bryce Nyeggen commented on MAHOUT-963:
--------------------------------------

I discovered this on a dataset of several million user-item pairs, with fewer 
than 20 items per user on average, but probably several hundred thousand users 
associated with the most common items.  On that data, the change cuts data 
model construction time from several hours to a couple of minutes, with nearly 
all of the gain coming from faster sorts on the GenericItemPreferenceArrays.  
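For illustration only, here is a minimal sketch of the kind of replacement involved: sorting a preference array's IDs and applying the same permutation to its values in O(n log n) rather than with an O(n^2) selection sort. It assumes the array is stored as parallel `long[]` IDs and `float[]` values; the class `ParallelArraySort` and method `sortByIds` are hypothetical names, not Mahout's API, and this is not the contents of MAHOUT-963.diff.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper (not Mahout's API): assumes a preference array is held
// as parallel arrays of long IDs and float preference values.
final class ParallelArraySort {

  private ParallelArraySort() {}

  /**
   * Sorts ids ascending and reorders values with the same permutation.
   * Uses an index sort, O(n log n), instead of an O(n^2) selection sort.
   */
  static void sortByIds(long[] ids, float[] values) {
    if (ids.length != values.length) {
      throw new IllegalArgumentException("ids and values must have the same length");
    }
    int n = ids.length;

    // Positions 0..n-1, to be ordered by the ID stored at each position.
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }

    // Arrays.sort on an object array is a stable merge sort, O(n log n),
    // so entries with equal IDs keep their original relative order.
    Arrays.sort(order, Comparator.comparingLong((Integer i) -> ids[i]));

    // Apply the permutation to both arrays, then copy back in place.
    long[] sortedIds = new long[n];
    float[] sortedValues = new float[n];
    for (int i = 0; i < n; i++) {
      sortedIds[i] = ids[order[i]];
      sortedValues[i] = values[order[i]];
    }
    System.arraycopy(sortedIds, 0, ids, 0, n);
    System.arraycopy(sortedValues, 0, values, 0, n);
  }

  public static void main(String[] args) {
    long[] userIds = {42L, 7L, 19L, 7L};
    float[] ratings = {3.5f, 4.0f, 1.0f, 2.0f};
    sortByIds(userIds, ratings);
    System.out.println(Arrays.toString(userIds)); // [7, 7, 19, 42]
    System.out.println(Arrays.toString(ratings)); // [4.0, 2.0, 1.0, 3.5]
  }
}
```

The index sort trades some boxing overhead for simplicity; an in-place quicksort or heapsort that swaps both arrays together would avoid the extra allocations while keeping the same O(n log n) bound.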

As a side note, in the time I spent looking at the code, I didn't see any 
calling code that depends on the GenericItemPreferenceArrays being sorted by 
user - is it currently necessary for them to be sorted at all, or is it just to 
allow future optimizations?
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts, whose running time 
> is quadratic in the array length.  These sorts are invoked during construction 
> of GenericDataModels, causing excessive construction time.

