[ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peng Cheng updated MAHOUT-1286:
-------------------------------
Attachment: Semifinal-implementation-added.patch
Sorry about the late reply. Please note that the code can still be
optimized in many places; I'll keep maintaining it and welcome any
suggestions.
> Memory-efficient DataModel, supporting fast online updates and element-wise
> iteration
> -------------------------------------------------------------------------------------
>
> Key: MAHOUT-1286
> URL: https://issues.apache.org/jira/browse/MAHOUT-1286
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.9
> Reporter: Peng Cheng
> Labels: collaborative-filtering, datamodel, patch, recommender
> Fix For: 0.9
>
> Attachments: InMemoryDataModel.java, InMemoryDataModelTest.java,
> Semifinal-implementation-added.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Most DataModel implementations in the current CF component use hash maps to
> enable fast 2D indexing and updates. This is not memory-efficient for big
> data sets; e.g., the Netflix Prize dataset takes 11 GB of heap space as a
> FileDataModel.
> An improved DataModel implementation should use more compact data structures
> (like arrays). This trades a little time complexity in 2D indexing for a
> vast improvement in memory efficiency. In addition, any online recommender or
> online-to-batch converted recommender will not be affected by this in its
> training process.
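
To make the trade-off in the description concrete, here is a minimal sketch of
an array-backed preference store. It is not the code from the attached patch:
the class name CompactPreferenceStore, its methods, and the row-offset layout
are all assumptions chosen for illustration, and it assumes the input contains
no duplicate (user, item) pairs. Each preference costs one long and one float
(12 bytes) in flat arrays, and a 2D lookup becomes two binary searches instead
of two hash probes.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the InMemoryDataModel from the patch. */
final class CompactPreferenceStore {

    /** Callback for element-wise iteration (hypothetical helper type). */
    interface Visitor {
        void visit(long userID, long itemID, float value);
    }

    private final long[] userIDs;  // sorted, distinct user IDs
    private final int[] rowStart;  // user u owns slots rowStart[u] .. rowStart[u + 1] - 1
    private final long[] itemIDs;  // item IDs, sorted within each user's slice
    private final float[] values;  // preference values, parallel to itemIDs

    /** Builds the store from parallel (user, item, value) arrays. */
    CompactPreferenceStore(long[] users, long[] items, float[] prefs) {
        int n = users.length;
        // Sort the triple indices by (user, item) so each user's items are contiguous.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> {
            int c = Long.compare(users[a], users[b]);
            return c != 0 ? c : Long.compare(items[a], items[b]);
        });

        itemIDs = new long[n];
        values = new float[n];
        long[] tmpUsers = new long[n];
        int[] tmpStart = new int[n + 1];
        int userCount = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            // A new user begins whenever the user ID changes in sorted order.
            if (k == 0 || users[i] != tmpUsers[userCount - 1]) {
                tmpUsers[userCount] = users[i];
                tmpStart[userCount] = k;
                userCount++;
            }
            itemIDs[k] = items[i];
            values[k] = prefs[i];
        }
        tmpStart[userCount] = n;
        userIDs = Arrays.copyOf(tmpUsers, userCount);
        rowStart = Arrays.copyOf(tmpStart, userCount + 1);
    }

    /** Returns the slot of (userID, itemID), or a negative number if absent. */
    private int slot(long userID, long itemID) {
        int u = Arrays.binarySearch(userIDs, userID);
        if (u < 0) {
            return -1;
        }
        return Arrays.binarySearch(itemIDs, rowStart[u], rowStart[u + 1], itemID);
    }

    /** 2D lookup via two binary searches; returns NaN if the pair is absent. */
    float get(long userID, long itemID) {
        int k = slot(userID, itemID);
        return k < 0 ? Float.NaN : values[k];
    }

    /** Updates an existing preference in place; returns false if it is absent. */
    boolean set(long userID, long itemID, float value) {
        int k = slot(userID, itemID);
        if (k < 0) {
            return false;
        }
        values[k] = value;
        return true;
    }

    /** Element-wise iteration over every stored preference, in (user, item) order. */
    void forEach(Visitor visitor) {
        for (int u = 0; u < userIDs.length; u++) {
            for (int k = rowStart[u]; k < rowStart[u + 1]; k++) {
                visitor.visit(userIDs[u], itemIDs[k], values[k]);
            }
        }
    }
}
```

In this sketch, updating an existing preference is a single array write after
the lookup. Inserting a new (user, item) pair would require shifting or
rebuilding the arrays, which the sketch leaves out; a real implementation
would need some buffering strategy to keep online updates fast.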