[
https://issues.apache.org/jira/browse/MAHOUT-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034705#comment-13034705
]
Chris Newell commented on MAHOUT-667:
-------------------------------------
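I found a bug in AbstractFactorizer, which I introduced because I
misunderstood how FastByIDMap behaves: put() returns the value previously
stored for the key (null for a new key), not the value just inserted, so the
first lookup of any ID returns null.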
These two methods:
{code}
protected Integer userIndex(long userID) {
  Integer userIndex = userIDMapping.get(userID);
  if (userIndex == null) {
    userIndex = userIDMapping.put(userID, userIDMapping.size());
  }
  return userIndex;
}

protected Integer itemIndex(long itemID) {
  Integer itemIndex = itemIDMapping.get(itemID);
  if (itemIndex == null) {
    itemIndex = itemIDMapping.put(itemID, itemIDMapping.size());
  }
  return itemIndex;
}
{code}
Should be replaced by:
{code}
protected Integer getUserIndex(long userID) {
  Integer userIndex = userIDMapping.get(userID);
  if (userIndex == null) {
    userIndex = userIDMapping.size();
    userIDMapping.put(userID, userIndex);
  }
  return userIndex;
}

protected Integer getItemIndex(long itemID) {
  Integer itemIndex = itemIDMapping.get(itemID);
  if (itemIndex == null) {
    itemIndex = itemIDMapping.size();
    itemIDMapping.put(itemID, itemIndex);
  }
  return itemIndex;
}
{code}
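For anyone who wants to see the difference in isolation, here is a minimal sketch. It uses java.util.HashMap as a stand-in for FastByIDMap, on the assumption that both put() methods return the previous value for the key. The class and method names (IndexAssignmentDemo, buggyIndex, fixedIndex) are made up for illustration and are not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; HashMap stands in for FastByIDMap (assumption).
class IndexAssignmentDemo {

  // Buggy pattern: put() returns the PREVIOUS value, which is null for a new key.
  static Integer buggyIndex(Map<Long, Integer> mapping, long id) {
    Integer index = mapping.get(id);
    if (index == null) {
      index = mapping.put(id, mapping.size());
    }
    return index;
  }

  // Fixed pattern: compute the new index first, store it, then return it.
  static Integer fixedIndex(Map<Long, Integer> mapping, long id) {
    Integer index = mapping.get(id);
    if (index == null) {
      index = mapping.size();
      mapping.put(id, index);
    }
    return index;
  }

  public static void main(String[] args) {
    Map<Long, Integer> buggy = new HashMap<>();
    System.out.println(buggyIndex(buggy, 42L)); // null on first lookup
    System.out.println(buggyIndex(buggy, 42L)); // 0 on second lookup

    Map<Long, Integer> fixed = new HashMap<>();
    System.out.println(fixedIndex(fixed, 42L)); // 0
    System.out.println(fixedIndex(fixed, 7L));  // 1
  }
}
```

With the buggy version the index is stored correctly, so the problem only shows up on the first call for each ID, where a caller that unboxes the result to int would hit a NullPointerException.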
> Persistent storage of factorizations in SVDRecommender
> ------------------------------------------------------
>
> Key: MAHOUT-667
> URL: https://issues.apache.org/jira/browse/MAHOUT-667
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Chris Newell
> Assignee: Sebastian Schelter
> Priority: Minor
> Fix For: 0.5
>
> Attachments: persistent_svd.patch, persistent_svd_v2.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> As discussed previously (https://issues.apache.org/jira/browse/MAHOUT-640) it
> would be beneficial to provide a persistent storage mechanism for
> factorizations created by SVDRecommender (in package
> org.apache.mahout.cf.taste.impl.recommender.svd) as these can be time
> consuming to produce. It would also allow factorizations to be computed on
> one machine then distributed to other machines providing predictions,
> improving efficiency and scalability.
> Having a "persistence strategy" interface has been suggested that could be
> implemented as required. I'll try to post an outline proposal for discussion
> purposes in the next few days but any comments or suggestions would be very
> welcome.