[ https://issues.apache.org/jira/browse/MAHOUT-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584970#comment-14584970 ]

Sebastian Schelter commented on MAHOUT-1739:
--------------------------------------------

Actually, this is exactly what we want. All the similarity measures used in
Mahout are symmetric, so the upper triangular part of the similarity matrix
already contains all the information.
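A minimal sketch of why the upper triangle is enough, assuming a symmetric measure. This is plain Java, and the class and its methods are hypothetical, not Mahout's API. Every pair is stored once under a key with the smaller id first, so a lookup in either order finds the same value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: keeps only one entry per unordered
// item pair, i.e. the upper triangle (smaller id first) of a symmetric matrix.
public class UpperTriangleSimilarities {

  // Maps the canonical key "smallerId:largerId" to the similarity value.
  private final Map<String, Double> entries = new HashMap<>();

  // Orders the two ids so that (a, b) and (b, a) map to the same key.
  private static String key(long a, long b) {
    return a < b ? a + ":" + b : b + ":" + a;
  }

  // Stores sim(a, b); for a symmetric measure this also defines sim(b, a).
  public void put(long a, long b, double similarity) {
    entries.put(key(a, b), similarity);
  }

  // Returns sim(a, b) in either argument order, or NaN if the pair is unknown.
  public double get(long a, long b) {
    Double value = entries.get(key(a, b));
    return value == null ? Double.NaN : value;
  }

  public int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    UpperTriangleSimilarities sims = new UpperTriangleSimilarities();
    sims.put(7L, 3L, 0.8);                // stored once, under "3:7"
    System.out.println(sims.get(3L, 7L)); // 0.8
    System.out.println(sims.get(7L, 3L)); // 0.8
    System.out.println(sims.size());      // 1
  }
}
```

Storing the lower triangle as well would only duplicate values that the symmetric measure already guarantees to be equal.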

I think I also know where this "bug" comes from. It's actually not a bug; the
parameter maxSimilarItemsPerItem is just not named very well.

Let's say maxSimilarItemsPerItem is 10. For an item A, we compute the 10 most
similar items. There might be an item B that has A among its 10 most similar
items, while B is not among the 10 most similar items of A. To guarantee that
B gets its 10 most similar items, we also have to output the pair (A, B), so
unfortunately 11 similar items are output for A.
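The effect can be reproduced with a small in-memory model. This is a sketch, not Mahout code: the class, its methods and the similarity values are made up, and k = 1 stands in for maxSimilarItemsPerItem. Item 0 plays A and item 2 plays B.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified in-memory model (not Mahout code): take each item's top-k
// neighbours, then write every pair with the smaller id first.
public class TopKCanonicalPairs {

  // Ids of the k items most similar to `item`, read from a full symmetric matrix.
  public static List<Integer> topK(double[][] sim, int item, int k) {
    List<Integer> others = new ArrayList<>();
    for (int j = 0; j < sim.length; j++) {
      if (j != item) {
        others.add(j);
      }
    }
    others.sort(Comparator.comparingDouble((Integer j) -> sim[item][j]).reversed());
    return others.subList(0, Math.min(k, others.size()));
  }

  // Union of all top-k pairs, each written as "smallerId:largerId".
  public static TreeSet<String> canonicalPairs(double[][] sim, int k) {
    TreeSet<String> pairs = new TreeSet<>();
    for (int i = 0; i < sim.length; i++) {
      for (int j : topK(sim, i, k)) {
        pairs.add(Math.min(i, j) + ":" + Math.max(i, j));
      }
    }
    return pairs;
  }

  // Number of output pairs each item takes part in.
  public static Map<Integer, Integer> pairsPerItem(TreeSet<String> pairs) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (String pair : pairs) {
      for (String id : pair.split(":")) {
        counts.merge(Integer.parseInt(id), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Item 2 has item 0 as its top-1 neighbour, but item 0's top-1 is item 1.
    double[][] sim = {
        {0.0, 0.9, 0.5},
        {0.9, 0.0, 0.1},
        {0.5, 0.1, 0.0},
    };
    TreeSet<String> pairs = canonicalPairs(sim, 1);
    System.out.println(pairs);                      // [0:1, 0:2]
    System.out.println(pairsPerItem(pairs).get(0)); // 2, although k = 1
  }
}
```

Item 2 keeps its single most similar item only because the pair 0:2 is written, and that same pair is the extra (k + 1)-th entry for item 0.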

Does that make sense?


> maxSimilarItemsPerItem param of ItemSimilarityJob doesn't behave correct
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1739
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1739
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.10.0
>            Reporter: lariven
>              Labels: easyfix, patch
>         Attachments: fix_maxSimilarItemsPerItem_incorrect_behave.patch
>
>
> The similar items that ItemSimilarityJob outputs for each target item may
> exceed the number set by the maxSimilarItemsPerItem parameter. The following
> code in ItemSimilarityJob.java, around line 200, may be responsible:
>         if (itemID < otherItemID) {
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));
>         } else {
>           ctx.write(new EntityEntityWritable(otherItemID, itemID), new DoubleWritable(similarItem.getSimilarity()));
>         }
> I don't know why itemID needs to be swapped with otherItemID, but I think a
> single line is enough:
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));
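To compare the two variants in the quoted code, here is a simplified model. It is not the actual Hadoop reducer: the records are made up, keys are plain strings instead of EntityEntityWritable, and a set is used so only distinct keys are counted. With the single-line version each item keeps exactly its own top-k rows, but a pair found from both sides is written in both orientations and the output is no longer limited to the upper triangle.

```java
import java.util.TreeSet;

// Simplified model (not the real Hadoop reducer) of the two ways the quoted
// code could key its output. The records below are made up for illustration.
public class EmitOrderComparison {

  // (itemID, otherItemID) records taken from each item's top-1 list.
  public static final long[][] TOP_K_RECORDS = {
      {0, 1}, // item 0's most similar item is 1
      {1, 0}, // item 1's most similar item is 0
      {2, 0}, // item 2's most similar item is 0
  };

  // Current code: always write the smaller id first (upper triangle only).
  public static TreeSet<String> canonicalKeys(long[][] records) {
    TreeSet<String> keys = new TreeSet<>();
    for (long[] r : records) {
      long itemID = r[0];
      long otherItemID = r[1];
      keys.add(itemID < otherItemID ? itemID + ":" + otherItemID : otherItemID + ":" + itemID);
    }
    return keys;
  }

  // Proposed single line: write itemID first, unchanged.
  public static TreeSet<String> rawKeys(long[][] records) {
    TreeSet<String> keys = new TreeSet<>();
    for (long[] r : records) {
      keys.add(r[0] + ":" + r[1]);
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println(canonicalKeys(TOP_K_RECORDS)); // [0:1, 0:2]
    System.out.println(rawKeys(TOP_K_RECORDS));       // [0:1, 1:0, 2:0]
  }
}
```

In the raw output, 0:1 and 1:0 carry the same similarity value twice, which a symmetric measure makes redundant.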



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)