[ 
https://issues.apache.org/jira/browse/MAHOUT-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584976#comment-14584976
 ] 

lariven edited comment on MAHOUT-1739 at 6/14/15 8:03 AM:
----------------------------------------------------------

1. From a usage point of view, it makes sense to input an item and get its 10
most similar items as output. But the triangular matrix can't serve "Who Bought
X Also Bought Y" recommendations, because it does not contain every item as a
key. So what is this job for?
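
A minimal Java sketch of the key-coverage problem in point 1. It assumes pairs
are written smaller-ID-first, as in the ItemSimilarityJob code quoted in the
issue, and that a consumer looks items up by the first ID of each pair. The
names TriangularKeys, Pair, groupByFirst and symmetric are hypothetical, not
Mahout APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Mahout code: similarity pairs stored as
// (smaller ID, larger ID), the order the quoted ItemSimilarityJob code writes.
final class TriangularKeys {

  record Pair(long a, long b, double sim) {}

  // Group pairs by their first ID, as a consumer keyed on that ID would see them.
  static Map<Long, List<Long>> groupByFirst(List<Pair> pairs) {
    Map<Long, List<Long>> out = new TreeMap<>();
    for (Pair p : pairs) {
      out.computeIfAbsent(p.a(), k -> new ArrayList<>()).add(p.b());
    }
    return out;
  }

  // Emit every pair in both directions so that every item becomes a key.
  static Map<Long, List<Long>> symmetric(List<Pair> pairs) {
    List<Pair> both = new ArrayList<>();
    for (Pair p : pairs) {
      both.add(p);
      both.add(new Pair(p.b(), p.a(), p.sim()));
    }
    return groupByFirst(both);
  }

  public static void main(String[] args) {
    // Items 1, 2 and 3 are all similar to one another.
    List<Pair> triangular = List.of(
        new Pair(1, 2, 0.9), new Pair(1, 3, 0.8), new Pair(2, 3, 0.7));
    System.out.println(groupByFirst(triangular)); // {1=[2, 3], 2=[3]}
    System.out.println(symmetric(triangular));    // {1=[2, 3], 2=[1, 3], 3=[1, 2]}
  }
}
```

In the triangular form item 3 never appears as a key, so its neighbours can
only be found by scanning the second column. Writing both directions doubles
the output but gives each item its own neighbour list.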

2. This job takes its input from RowSimilarityJob, whose maxSimilaritiesPerRow
parameter makes sense and really does what we want. It's strange that these two
parameters take the same value but behave differently.
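
For comparison, a hedged Java sketch of what a per-row cap such as
maxSimilaritiesPerRow is expected to guarantee: each row keeps at most k
neighbours, highest similarity first. PerRowCap, Neighbour and topK are
hypothetical names, and the plain full sort is an assumption, not how
RowSimilarityJob is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a per-row cap: keep only the k most similar
// neighbours of each row. This is not Mahout's RowSimilarityJob code.
final class PerRowCap {

  record Neighbour(long id, double sim) {}

  static Map<Long, List<Long>> topK(Map<Long, List<Neighbour>> rows, int k) {
    Map<Long, List<Long>> out = new TreeMap<>();
    for (Map.Entry<Long, List<Neighbour>> row : rows.entrySet()) {
      List<Neighbour> sorted = new ArrayList<>(row.getValue());
      // Highest similarity first.
      sorted.sort((x, y) -> Double.compare(y.sim(), x.sim()));
      List<Long> kept = new ArrayList<>();
      for (Neighbour n : sorted.subList(0, Math.min(k, sorted.size()))) {
        kept.add(n.id());
      }
      out.put(row.getKey(), kept);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<Long, List<Neighbour>> rows = new TreeMap<>();
    rows.put(1L, List.of(new Neighbour(2, 0.9), new Neighbour(3, 0.8)));
    rows.put(2L, List.of(new Neighbour(1, 0.9), new Neighbour(3, 0.7)));
    rows.put(3L, List.of(new Neighbour(1, 0.8), new Neighbour(2, 0.7)));
    // With k = 1 every row keeps exactly one neighbour.
    System.out.println(topK(rows, 1)); // {1=[2], 2=[1], 3=[1]}
  }
}
```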



was (Author: lariven):
1. From a usage point of view, it makes sense to input an item and get its 10
most similar items as output. But the triangular matrix can't serve "Who Bought
X Also Bought Y" recommendations, because it does not contain every item as a
key. So what is this job for?

2. This job takes its input from RowSimilarityJob, whose maxObservationsPerRow
parameter makes sense and really does what we want. It's strange that these two
parameters take the same value but behave differently.


> maxSimilarItemsPerItem param of ItemSimilarityJob doesn't behave correct
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1739
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1739
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.10.0
>            Reporter: lariven
>              Labels: easyfix, patch
>         Attachments: fix_maxSimilarItemsPerItem_incorrect_behave.patch
>
>
> The similar items that ItemSimilarityJob outputs for each target item may
> exceed the number set in the maxSimilarItemsPerItem parameter. The following
> code in ItemSimilarityJob.java, around line 200, may be the cause:
>         // Write each pair with the smaller item ID first.
>         if (itemID < otherItemID) {
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));
>         } else {
>           ctx.write(new EntityEntityWritable(otherItemID, itemID), new DoubleWritable(similarItem.getSimilarity()));
>         }
> I don't know why itemID and otherItemID need to be swapped, but I think a
> single line is enough:
>           ctx.write(new EntityEntityWritable(itemID, otherItemID), new DoubleWritable(similarItem.getSimilarity()));
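
The effect of the quoted if/else on per-item counts can be sketched in Java
under two assumptions: each item already carries its capped list of k
neighbours, and duplicate pairs collapse into one record downstream. SwapEffect
and write are hypothetical names; the swap flag toggles between the original
if/else and the proposed single ctx.write line.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Mahout code. Input: each item's capped neighbour
// list. Output: neighbours grouped by the first ID of every written pair.
// Using a Set models the assumption that duplicate pairs are collapsed.
final class SwapEffect {

  static Map<Long, Set<Long>> write(Map<Long, long[]> capped, boolean swap) {
    Map<Long, Set<Long>> out = new TreeMap<>();
    for (Map.Entry<Long, long[]> row : capped.entrySet()) {
      long itemID = row.getKey();
      for (long otherItemID : row.getValue()) {
        // swap == true mirrors the quoted if/else (smaller ID first);
        // swap == false mirrors the proposed single write.
        boolean flip = swap && itemID >= otherItemID;
        long first = flip ? otherItemID : itemID;
        long second = flip ? itemID : otherItemID;
        out.computeIfAbsent(first, k -> new TreeSet<>()).add(second);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Each item's single best neighbour, i.e. a cap of k = 1.
    Map<Long, long[]> capped = new TreeMap<>();
    capped.put(1L, new long[] {2});
    capped.put(2L, new long[] {1});
    capped.put(3L, new long[] {1});
    System.out.println(write(capped, true));  // {1=[2, 3]}
    System.out.println(write(capped, false)); // {1=[2], 2=[1], 3=[1]}
  }
}
```

With swapping, item 1 ends up with two neighbours although k = 1, and items 2
and 3 disappear as keys; without it every item keeps exactly its own capped
list. A plausible, unconfirmed motivation for the swap is that (1, 2) and
(2, 1) then collapse into a single record.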



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
