[ https://issues.apache.org/jira/browse/MAHOUT-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977769#action_12977769 ]
Renaud Bruyeron commented on MAHOUT-576:
----------------------------------------
I am looking for a way to fix this, and it seems that something is missing in
the DiffStorage API:
{code}
/**
 * <p>
 * Updates internal data structures to reflect an update in a preference value for an item.
 * </p>
 *
 * @param itemID
 *          item to update preference value for
 * @param prefDelta
 *          amount by which preference value changed (or its old value, if being removed)
 * @param remove
 *          if <code>true</code>, operation reflects a removal rather than change of preference
 */
void updateItemPref(long itemID, float prefDelta, boolean remove) throws TasteException;
{code}
This works when we have a true update (i.e. neither a removal nor a *new*
preference).
However, in the case of a removal or a new preference, this method is not
enough: the implementations actually need the delta with the peer preferences,
not just the value being removed.
For example, if user X removes preference Pa on item A, we need the preference
Pb of every item B that is impacted by this removal (because ultimately the
calculation needs Pb-Pa, not just Pa).
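To make the arithmetic concrete, here is a small worked sketch. It assumes the
stored value for a pair (A, B) is the plain average of Pb-Pa over the users who
rated both items; the names DiffRemovalExample and removeDatum and the sample
ratings are made up for illustration and are not part of Mahout.

```java
// Hypothetical worked example, not Mahout code.
final class DiffRemovalExample {

  /** Removes one sample from an average taken over count samples (count > 1). */
  static double removeDatum(double average, int count, double datum) {
    return (average * count - datum) / (count - 1);
  }

  public static void main(String[] args) {
    // Three users rated both A and B. Per-user diffs Pb - Pa:
    //   user X: 5 - 2 = 3, user Y: 4 - 3 = 1, user Z: 3 - 1 = 2
    double average = (3.0 + 1.0 + 2.0) / 3;
    int count = 3;

    // User X removes the preference on A (Pa = 2, peer preference Pb = 5).
    double pa = 2.0;
    double pb = 5.0;

    // Removing X's delta leaves the diffs of Y and Z: (1 + 2) / 2.
    double correct = removeDatum(average, count, pb - pa);
    // Treating the old preference as if it were the delta gives another value.
    double wrong = removeDatum(average, count, pa);

    System.out.println("correct = " + correct + ", wrong = " + wrong);
  }
}
```

With these numbers the correct average after the removal is 1.5, while using
Pa in place of Pb-Pa gives 2.0, and the error compounds with every removal.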
I suspect an API change is needed here: split the method in two, like this:
{code}
// this must be used only for a true update; will throw TasteException if used
// on a removal or insertion
void updateItemPref(long itemID, float prefDelta) throws TasteException;
// this must be used for removal or insertion
void removeItemPref(long itemID, long userID, float pref, boolean removal) throws TasteException;
{code}
userID is needed to efficiently get at the peer preferences and compute deltas,
I reckon. What do you think?
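As a rough illustration of why userID helps, here is a toy in-memory sketch of
the removal/insertion half of the proposal. Everything except the
removeItemPref signature is an assumption: ToyDiffStorage, averageDiff and the
map layout are hypothetical, it throws IllegalArgumentException where a real
implementation would presumably throw TasteException, and it is not how the
existing JDBC or in-memory implementations are written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy storage, only meant to show the role of userID.
final class ToyDiffStorage {
  // userID -> (itemID -> preference)
  private final Map<Long, Map<Long, Float>> prefs = new HashMap<>();
  // "low:high" item pair -> { sum of P(high) - P(low), number of users }
  private final Map<String, double[]> diffs = new HashMap<>();

  /** Handles a removal (removal == true) or an insertion (removal == false). */
  void removeItemPref(long itemID, long userID, float pref, boolean removal) {
    Map<Long, Float> userPrefs = prefs.computeIfAbsent(userID, k -> new HashMap<>());
    if (removal) {
      if (userPrefs.remove(itemID) == null) {
        throw new IllegalArgumentException("no preference to remove");
      }
    } else if (userPrefs.containsKey(itemID)) {
      throw new IllegalArgumentException("preference exists; this is a true update");
    }
    // userID gives direct access to the peer preferences of this user.
    for (Map.Entry<Long, Float> peer : userPrefs.entrySet()) {
      long peerID = peer.getKey();
      // Per-user delta for the pair, always P(high item) - P(low item).
      double delta = itemID < peerID ? peer.getValue() - pref : pref - peer.getValue();
      String key = Math.min(itemID, peerID) + ":" + Math.max(itemID, peerID);
      double[] stat = diffs.computeIfAbsent(key, k -> new double[2]);
      stat[0] += removal ? -delta : delta;
      stat[1] += removal ? -1 : 1;
      if (stat[1] == 0) {
        diffs.remove(key); // no user rates both items any more
      }
    }
    if (!removal) {
      userPrefs.put(itemID, pref);
    }
  }

  /** Average of P(itemB) - P(itemA), or NaN if no user rated both items. */
  double averageDiff(long itemA, long itemB) {
    String key = Math.min(itemA, itemB) + ":" + Math.max(itemA, itemB);
    double[] stat = diffs.get(key);
    if (stat == null) {
      return Double.NaN;
    }
    double average = stat[0] / stat[1];
    return itemA < itemB ? average : -average;
  }
}
```

The point of the sketch is only that, with userID in hand, the per-pair delta
can be recomputed from the user's other preferences at removal or insertion
time, so the caller does not have to pass deltas at all.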
> AbstractJDBCDiffStorage.updateItemPref is updating the AVG incorrectly in most cases
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-576
> URL: https://issues.apache.org/jira/browse/MAHOUT-576
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Renaud Bruyeron
> Fix For: 0.5
>
>
> The JDBC version of the DiffStorage is not using a RunningAverage in the
> removePreference case, and ends up making incorrect calculations.
> In a scenario where users are setting and removing a lot of preferences, the
> AVG stored in the diff table quickly diverges from the correct value because
> of this.
> Right now, the input to updateItemPref comes from SlopeOneRecommender, and in
> the case of removePreference, it is *the old preference* value, not a delta.
> However, the code uses it as if it were a delta. Thus the calculation is off
> by PEER(removedpreference,userid)/count every time a user removes a preference.
> At first glance, the code should compute the old delta instead of the old
> preference, and use this in updateItemPref.