[ 
https://issues.apache.org/jira/browse/MAHOUT-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399333#comment-13399333
 ] 

Doug Mittendorf commented on MAHOUT-960:
----------------------------------------

Thanks for committing this change! Good call on computing Y'Y directly from the 
OpenIntObjectHashMap.  That pretty much eliminates the need for the #3 
optimization, since only a single copy of the matrix is now held in memory.
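
For illustration, here is a minimal sketch of computing Y'Y directly from a map of row vectors, with no DenseMatrix copy and no transpose. It is not the committed Mahout code: the class name GramSketch and the method gram are hypothetical, and a plain java.util.Map of double[] rows stands in for OpenIntObjectHashMap. It relies on Y'Y being the sum of the outer products of the rows of Y, so the iteration order of the map does not matter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout. A plain Map stands in for
// OpenIntObjectHashMap; each value is one row of Y (one feature vector).
final class GramSketch {

  // Returns Y'Y as a numFeatures x numFeatures array.
  static double[][] gram(Map<Integer, double[]> rows, int numFeatures) {
    double[][] yty = new double[numFeatures][numFeatures];

    // Y'Y is the sum over all rows y of the outer product y * y',
    // so rows can be visited in any order and no transpose is needed.
    for (double[] row : rows.values()) {
      for (int i = 0; i < numFeatures; i++) {
        double ri = row[i];
        if (ri == 0.0) {
          continue; // this entry contributes nothing to row i of the result
        }
        // Accumulate only the upper triangle; the result is symmetric.
        for (int j = i; j < numFeatures; j++) {
          yty[i][j] += ri * row[j];
        }
      }
    }

    // Mirror the upper triangle into the lower triangle.
    for (int i = 0; i < numFeatures; i++) {
      for (int j = 0; j < i; j++) {
        yty[i][j] = yty[j][i];
      }
    }
    return yty;
  }

  public static void main(String[] args) {
    Map<Integer, double[]> rows = new HashMap<>();
    rows.put(7, new double[] {1.0, 2.0});   // IDs need not be sequential
    rows.put(42, new double[] {3.0, 4.0});
    double[][] yty = gram(rows, 2);
    System.out.println(yty[0][0] + " " + yty[0][1]);
    System.out.println(yty[1][0] + " " + yty[1][1]);
  }
}
```

The only extra memory in this sketch is the small numFeatures x numFeatures result, which is independent of the number of users or items.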
                
> Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-960
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-960
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Doug Mittendorf
>            Assignee: Sebastian Schelter
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: MAHOUT-960-1.patch, MAHOUT-960.patch
>
>
> One of the main limiting factors of the implicit ALS algorithm when 
> processing large datasets is the fact that it must fit the entire U or M 
> matrix in memory.  This is further compounded by the fact that the current 
> implementation represents the matrix in memory three times:
> 1. As an OpenIntObjectHashMap read in from disk
> 2. As a sorted DenseMatrix representation of #1, built to prepare for computing Y'Y
> 3. As the transpose of #2 (another DenseMatrix)
> The #3 copy of the matrix can be eliminated by computing Y'Y directly from Y 
> without first computing the transpose of Y as an intermediate step.  This 
> should also be more efficient in terms of CPU usage.
> Note that the #1 copy of the matrix could also be eliminated if it's assumed 
> that the user and item IDs are sequentially assigned and ordered.  This would 
> allow the DenseMatrix to be populated directly from disk instead of reading 
> into an intermediate OpenIntObjectHashMap.
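
The last suggestion in the description could look roughly like the sketch below, which writes each record straight into a dense array indexed by ID. This is an assumption-laden illustration, not Mahout code: the class DenseLoadSketch, the method load, and the use of Map.Entry records in place of the on-disk format are all hypothetical, and IDs are assumed to run from 0 to numRows - 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout. Records are (id, feature vector)
// pairs; in practice they would be streamed from disk.
final class DenseLoadSketch {

  // Fills a dense row-major array directly, using the ID as the row index.
  // Assumes IDs are sequentially assigned in the range [0, numRows).
  static double[][] load(Iterable<Map.Entry<Integer, double[]>> records,
                         int numRows, int numFeatures) {
    double[][] dense = new double[numRows][];
    for (Map.Entry<Integer, double[]> record : records) {
      int id = record.getKey();
      double[] features = record.getValue();
      if (id < 0 || id >= numRows) {
        throw new IllegalArgumentException("ID out of range: " + id);
      }
      if (features.length != numFeatures) {
        throw new IllegalArgumentException("Wrong vector length for ID " + id);
      }
      dense[id] = features; // no intermediate hash map
    }
    // The sequential-ID assumption fails if any row was never seen.
    for (int id = 0; id < numRows; id++) {
      if (dense[id] == null) {
        throw new IllegalStateException("Missing row for ID " + id);
      }
    }
    return dense;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, double[]>> records = new ArrayList<>();
    records.add(new SimpleEntry<>(0, new double[] {1.0, 2.0}));
    records.add(new SimpleEntry<>(1, new double[] {3.0, 4.0}));
    double[][] dense = load(records, 2, 2);
    System.out.println(dense.length + " rows loaded");
  }
}
```

The explicit range and missing-row checks are there because the approach is only valid when the sequential-ID assumption actually holds for the input.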


        
