[
https://issues.apache.org/jira/browse/MAHOUT-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399333#comment-13399333
]
Doug Mittendorf commented on MAHOUT-960:
----------------------------------------
Thanks for committing this change! Good call on computing Y'Y directly from the
OpenIntObjectHashMap. That pretty much eliminates the need for the #3
optimization, since only a single copy of the matrix is now held in memory.
> Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver
> --------------------------------------------------------------------
>
> Key: MAHOUT-960
> URL: https://issues.apache.org/jira/browse/MAHOUT-960
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Doug Mittendorf
> Assignee: Sebastian Schelter
> Priority: Minor
> Fix For: 0.8
>
> Attachments: MAHOUT-960-1.patch, MAHOUT-960.patch
>
>
> One of the main limiting factors of the implicit ALS algorithm when
> processing large datasets is that it must fit the entire U or M matrix in
> memory. This is compounded by the current implementation, which holds the
> matrix in memory three times:
> 1. As an OpenIntObjectHashMap read in from disk
> 2. A sorted DenseMatrix representation of #1 to prepare for computing Y'Y
> 3. The transpose of #2 (another DenseMatrix)
> The #3 copy of the matrix can be eliminated by computing Y'Y directly from Y
> without first computing the transpose of Y as an intermediate step. This
> should also be more efficient in terms of CPU usage.
> Note that the #1 copy of the matrix could also be eliminated if it's assumed
> that the user and item IDs are sequentially assigned and ordered. This would
> allow the DenseMatrix to be populated directly from disk instead of reading
> into an intermediate OpenIntObjectHashMap.
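
For illustration, here is a minimal sketch of computing Y'Y without
materializing Y or its transpose. It is not the committed patch: the class
name GramianSketch and the method gramian are hypothetical, and a plain
java.util.Map<Integer, double[]> stands in for Mahout's OpenIntObjectHashMap
of feature vectors. It relies on the identity that Y'Y equals the sum of the
outer products of the rows of Y, so the rows can be visited in any order and
the only extra memory is one k x k accumulator (k = number of features).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GramianSketch {

  /**
   * Accumulates Y'Y as the sum of outer products y_i * y_i' over all rows.
   * The map (ID -> feature vector) is a stand-in assumption for the
   * in-memory hash map described in the issue.
   */
  public static double[][] gramian(Map<Integer, double[]> rows, int numFeatures) {
    double[][] yty = new double[numFeatures][numFeatures];
    for (double[] row : rows.values()) {
      for (int i = 0; i < numFeatures; i++) {
        double ri = row[i];
        if (ri == 0.0) {
          continue; // a zero entry contributes nothing to row i of the sum
        }
        // Y'Y is symmetric, so only the upper triangle is accumulated here.
        for (int j = i; j < numFeatures; j++) {
          yty[i][j] += ri * row[j];
        }
      }
    }
    // Mirror the upper triangle into the lower triangle.
    for (int i = 0; i < numFeatures; i++) {
      for (int j = 0; j < i; j++) {
        yty[i][j] = yty[j][i];
      }
    }
    return yty;
  }

  public static void main(String[] args) {
    // Non-sequential IDs on purpose: the sum does not depend on row order.
    Map<Integer, double[]> y = new HashMap<>();
    y.put(42, new double[] {1.0, 2.0});
    y.put(7, new double[] {3.0, 4.0});
    System.out.println(Arrays.deepToString(gramian(y, 2)));
    // prints [[10.0, 14.0], [14.0, 20.0]]
  }
}
```

Because the sum is independent of row order, a sketch like this needs neither
a sorted dense copy of the matrix nor sequentially assigned IDs just to form
Y'Y.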