CodyInnowhere created MAHOUT-1032:
-------------------------------------
Summary: AggregateAndRecommendReducer gets OOM in setup() method
Key: MAHOUT-1032
URL: https://issues.apache.org/jira/browse/MAHOUT-1032
Project: Mahout
Issue Type: Bug
Components: Collaborative Filtering
Affects Versions: 0.5, 0.6, 0.7, 0.8
Environment: hadoop cluster with -Xmx set to 2G
Reporter: CodyInnowhere
Assignee: Sean Owen
This bug is actually caused by the very first job, itemIDIndex. That job maps
each itemID to an integer index, and the later AggregateAndRecommendReducer
then tries to load every item into the OpenIntLongHashMap indexItemIDMap. For
large data sets this fails: my test data set covers more than 100 million items
(not that many for a large e-commerce website), and the tasks run out of memory
in the setup() method. I don't think the itemIDIndex job is necessary. Without
it, the final AggregateAndRecommend step would not have to read all items into
memory to do the reverse index mapping.
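
A rough back-of-the-envelope sketch of why an in-memory index map of this size
does not fit in a 2G heap. It assumes an open-addressing map that stores one
int key, one long value and one state byte per slot, and that keeps the table
at most half full. The class name, the 13-byte slot size and the 0.5 load
factor are assumptions for illustration, not figures measured from Mahout.

```java
// Hypothetical estimator; not part of Mahout.
class IndexMapMemoryEstimate {

    // Assumed per-slot cost: int key (4) + long value (8) + state byte (1).
    static final long BYTES_PER_SLOT = 4 + 8 + 1;

    // Estimated bytes for the backing arrays of a map holding `entries`
    // items when the table is filled up to `loadFactor` (assumed value).
    static long estimateBytes(long entries, double loadFactor) {
        long slots = (long) Math.ceil(entries / loadFactor);
        return slots * BYTES_PER_SLOT;
    }

    // True if the map alone would need more than the given heap size.
    static boolean exceedsHeap(long entries, double loadFactor, long heapBytes) {
        return estimateBytes(entries, loadFactor) > heapBytes;
    }

    public static void main(String[] args) {
        long items = 100_000_000L;             // item count from the report
        long heap = 2L * 1024 * 1024 * 1024;   // -Xmx set to 2G
        long needed = estimateBytes(items, 0.5);
        System.out.println("estimated map bytes: " + needed);
        System.out.println("heap bytes:          " + heap);
        System.out.println("exceeds heap:        " + exceedsHeap(items, 0.5, heap));
    }
}
```

Under these assumptions the map's arrays alone come to roughly 2.6 GB for 100
million items, which is more than the 2G heap before counting anything else
the task holds. The real footprint depends on the map's actual layout and
resize policy.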