[ https://issues.apache.org/jira/browse/MAHOUT-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584967#comment-14584967 ]

lariven commented on MAHOUT-1739:
---------------------------------

Hi Sebastian,
 I found that it does not pass its original test case. If you print the output
file, it is not what we expected:
   *    When we set maxSimilaritiesPerItem to 1 the following pairs should be found:
   *
   *    i1 --> i2
   *    i2 --> i1
   *    i3 --> i1

But the output is:
1   2   0.5
1   3   0.4

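It seems to output a triangular matrix (every pair is written with the smaller
item ID first), which is not what we want: item 1 now has two similar items,
and items 2 and 3 have none of their own.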
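To make the collapse concrete, here is a minimal, Hadoop-free Java sketch (Java 16+ for the record). It is not Mahout code: the SimilarItem record and canonicalPairs method are hypothetical helpers. It assumes each item's top-1 list matches the expected pairs in the test comment and that similarity is symmetric, so i2 --> i1 carries the same 0.5 as i1 --> i2. It also assumes duplicate pairs end up as a single line, which is consistent with the output printed above. Under these assumptions, writing every pair with the smaller ID first reproduces exactly those two lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TriangularOutputDemo {

  // Hypothetical helper type (not a Mahout class): one entry of an item's top-k list.
  record SimilarItem(long itemID, long otherItemID, double similarity) {}

  // Writes each pair with the smaller item ID first, like the quoted if/else,
  // then keeps a single value per pair (assumed, consistent with the printed output).
  static List<String> canonicalPairs(List<SimilarItem> topItems) {
    Map<List<Long>, Double> written = new LinkedHashMap<>();
    for (SimilarItem s : topItems) {
      long first = Math.min(s.itemID(), s.otherItemID());
      long second = Math.max(s.itemID(), s.otherItemID());
      // i1 --> i2 and i2 --> i1 both map to the key (1, 2), so one of them is lost.
      written.putIfAbsent(List.of(first, second), s.similarity());
    }
    List<String> lines = new ArrayList<>();
    written.forEach((pair, sim) -> lines.add(pair.get(0) + "\t" + pair.get(1) + "\t" + sim));
    return lines;
  }

  public static void main(String[] args) {
    // Assumed top-1 lists (maxSimilaritiesPerItem = 1), taken from the test comment.
    List<SimilarItem> topItems = List.of(
        new SimilarItem(1, 2, 0.5),
        new SimilarItem(2, 1, 0.5),
        new SimilarItem(3, 1, 0.4));
    canonicalPairs(topItems).forEach(System.out::println);
  }
}
```

Running main prints "1 2 0.5" and "1 3 0.4" (tab-separated). This is the same triangular result as the job output.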


> maxSimilarItemsPerItem param of ItemSimilarityJob doesn't behave correct
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1739
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1739
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.10.0
>            Reporter: lariven
>              Labels: easyfix, patch
>         Attachments: fix_maxSimilarItemsPerItem_incorrect_behave.patch
>
>
> For each target item, ItemSimilarityJob may output more similar items than
> the number set by the maxSimilarItemsPerItem parameter. The following code in
> ItemSimilarityJob.java, around line 200, may be the cause:
>         if (itemID < otherItemID) {
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));
>         } else {
>           ctx.write(new EntityEntityWritable(otherItemID, itemID), new DoubleWritable(similarItem.getSimilarity()));
>         }
> I don't know why itemID needs to be swapped with otherItemID; I think a
> single line is enough:
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));

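The proposed single-line fix can be sketched the same way. This is again a hypothetical, Hadoop-free simulation, not Mahout code: the SimilarItem record and the directionalPairs and countPerItem helpers are made up for illustration, and it uses the same assumed top-1 lists. Without the swap, the first column is always the item whose top-k list produced the entry. Each item therefore appears at most maxSimilaritiesPerItem times, and the three pairs expected by the test survive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DirectionalOutputDemo {

  // Hypothetical helper type (not a Mahout class): one entry of an item's top-k list.
  record SimilarItem(long itemID, long otherItemID, double similarity) {}

  // Proposed behaviour: always write (itemID, otherItemID), never swap.
  static List<String> directionalPairs(List<SimilarItem> topItems) {
    List<String> lines = new ArrayList<>();
    for (SimilarItem s : topItems) {
      lines.add(s.itemID() + "\t" + s.otherItemID() + "\t" + s.similarity());
    }
    return lines;
  }

  // Counts output lines per source item (first column) to check the per-item limit.
  static Map<Long, Integer> countPerItem(List<String> lines) {
    Map<Long, Integer> counts = new LinkedHashMap<>();
    for (String line : lines) {
      long item = Long.parseLong(line.split("\t")[0]);
      counts.merge(item, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Assumed top-1 lists (maxSimilaritiesPerItem = 1), taken from the test comment.
    List<SimilarItem> topItems = List.of(
        new SimilarItem(1, 2, 0.5),
        new SimilarItem(2, 1, 0.5),
        new SimilarItem(3, 1, 0.4));
    List<String> lines = directionalPairs(topItems);
    lines.forEach(System.out::println);
    System.out.println(countPerItem(lines));
  }
}
```

There is one trade-off to weigh. A mutual pair such as i1/i2 is now written twice, once per direction, so the output is no longer a deduplicated set of unordered pairs.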


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
