[ https://issues.apache.org/jira/browse/MAHOUT-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584970#comment-14584970 ]
Sebastian Schelter commented on MAHOUT-1739:
--------------------------------------------
Actually, this is exactly what we want. All the similarity measures used in
Mahout are symmetric, so the upper triangular part of the similarity matrix
already contains all information.
I think I also know where this "bug" comes from. It's actually not a bug, but
the parameter maxSimilarItemsPerItem is not named very well.
Let's say maxSimilarItemsPerItem is 10. For an item A, we compute its 10 most
similar items. There might be an item B that has A among its 10 most similar
items, while B is not among the 10 most similar items of A. To guarantee that
B still gets its 10 most similar items, we unfortunately have to output 11
similar items for A.
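To make the effect concrete, here is a minimal sketch in plain Java. It is not the Mahout code: the class TopKSymmetryDemo, its methods, and the 3-item similarity matrix are all made up for illustration. It keeps the top k items per item, stores each pair once with the smaller ID first (the upper triangular part), and then reads the result back symmetrically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class; none of these names exist in Mahout.
public class TopKSymmetryDemo {

    // Returns the IDs of the k items most similar to `item` (ties broken by lower ID).
    static List<Integer> topK(double[][] sim, int item, int k) {
        List<Integer> others = new ArrayList<>();
        for (int other = 0; other < sim.length; other++) {
            if (other != item) {
                others.add(other);
            }
        }
        // Sort by descending similarity to `item`.
        others.sort(Comparator.comparingDouble((Integer o) -> -sim[item][o]));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    // Emits every top-k pair once, always as (smallerID, largerID).
    static Set<List<Integer>> emitUpperTriangular(double[][] sim, int k) {
        Set<List<Integer>> pairs = new LinkedHashSet<>();
        for (int item = 0; item < sim.length; item++) {
            for (int other : topK(sim, item, k)) {
                pairs.add(Arrays.asList(Math.min(item, other), Math.max(item, other)));
            }
        }
        return pairs;
    }

    // Reads the upper triangular pairs back as a symmetric neighbour list.
    static Map<Integer, Set<Integer>> expandSymmetric(Set<List<Integer>> pairs) {
        Map<Integer, Set<Integer>> neighbours = new TreeMap<>();
        for (List<Integer> pair : pairs) {
            neighbours.computeIfAbsent(pair.get(0), id -> new TreeSet<>()).add(pair.get(1));
            neighbours.computeIfAbsent(pair.get(1), id -> new TreeSet<>()).add(pair.get(0));
        }
        return neighbours;
    }

    public static void main(String[] args) {
        // Items: 0 = A, 1 = B, 2 = C. Symmetric similarities, invented for this example.
        double[][] sim = {
            {1.0, 0.5, 0.9},
            {0.5, 1.0, 0.1},
            {0.9, 0.1, 1.0},
        };
        // With k = 1: A picks C, B picks A, C picks A.
        System.out.println(expandSymmetric(emitUpperTriangular(sim, 1)));
    }
}
```

With k = 1, item 0 (A) ends up with two neighbours: C, which is A's own top choice, and B, which is only there because A is B's top choice. That is the same situation as the 11th item above.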
Does that make sense?
> maxSimilarItemsPerItem param of ItemSimilarityJob doesn't behave correct
> ------------------------------------------------------------------------
>
> Key: MAHOUT-1739
> URL: https://issues.apache.org/jira/browse/MAHOUT-1739
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.10.0
> Reporter: lariven
> Labels: easyfix, patch
> Attachments: fix_maxSimilarItemsPerItem_incorrect_behave.patch
>
>
> The number of similar items that ItemSimilarityJob outputs for each target
> item may exceed the value set in the maxSimilarItemsPerItem parameter. The
> following code in ItemSimilarityJob.java, around line 200, may be the cause:
> if (itemID < otherItemID) {
>   ctx.write(new EntityEntityWritable(itemID, otherItemID),
>       new DoubleWritable(similarItem.getSimilarity()));
> } else {
>   ctx.write(new EntityEntityWritable(otherItemID, itemID),
>       new DoubleWritable(similarItem.getSimilarity()));
> }
> I don't know why itemID needs to be swapped with otherItemID; I think a
> single line is enough:
> ctx.write(new EntityEntityWritable(itemID, otherItemID),
>     new DoubleWritable(similarItem.getSimilarity()));