[ https://issues.apache.org/jira/browse/MAHOUT-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-960:
-----------------------------
Fix Version/s: (was: 0.6)
Assignee: Sebastian Schelter (was: Sean Owen)
> Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver
> --------------------------------------------------------------------
>
> Key: MAHOUT-960
> URL: https://issues.apache.org/jira/browse/MAHOUT-960
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Doug Mittendorf
> Assignee: Sebastian Schelter
> Priority: Minor
> Attachments: MAHOUT-960.patch
>
>
> One of the main limiting factors of the implicit ALS algorithm on large
> datasets is that the entire U or M matrix must fit in memory. The current
> implementation compounds this by holding the matrix in memory three times:
> 1. As an OpenIntObjectHashMap read in from disk
> 2. A sorted DenseMatrix representation of #1 to prepare for computing Y'Y
> 3. The transpose of #2 (another DenseMatrix)
> Copy #3 can be eliminated by computing Y'Y directly from Y, without first
> materializing the transpose of Y as an intermediate step. This should also
> reduce CPU usage.
> Copy #1 could also be eliminated if the user and item IDs are assumed to be
> sequentially assigned and ordered. The DenseMatrix could then be populated
> directly from disk instead of being read into an intermediate
> OpenIntObjectHashMap.
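
As an illustration of the first suggestion above, here is a minimal sketch of
computing Y'Y row by row from Y, so that the transpose is never allocated. It
is not the code from MAHOUT-960.patch: the class and method names (GramMatrix,
gram) are hypothetical, and plain double[][] arrays stand in for Mahout's
DenseMatrix. It relies on Y'Y being the sum of the outer products of the rows
of Y, and on Y'Y being symmetric, so only the upper triangle is accumulated.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: computes Y'Y without building Y'.
final class GramMatrix {

  private GramMatrix() {}

  /**
   * Returns the k x k matrix Y'Y for an n x k matrix Y.
   * Extra memory is O(k^2); no n x k transpose copy is made.
   */
  static double[][] gram(double[][] y) {
    int k = y.length == 0 ? 0 : y[0].length;
    double[][] yty = new double[k][k];

    // Y'Y is the sum over rows r of the outer product r' r.
    // Accumulate only the upper triangle (j >= i).
    for (double[] row : y) {
      for (int i = 0; i < k; i++) {
        double yi = row[i];
        for (int j = i; j < k; j++) {
          yty[i][j] += yi * row[j];
        }
      }
    }

    // Mirror the upper triangle into the lower triangle.
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < i; j++) {
        yty[i][j] = yty[j][i];
      }
    }
    return yty;
  }

  public static void main(String[] args) {
    double[][] y = {{1, 2}, {3, 4}, {5, 6}};
    // prints [[35.0, 44.0], [44.0, 56.0]]
    System.out.println(Arrays.deepToString(gram(y)));
  }
}
```

Because each row of Y is read once and only the k x k result is written, peak
memory drops by one full copy of the matrix, and exploiting symmetry roughly
halves the multiply-adds compared with a general Y' times Y product.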