[
https://issues.apache.org/jira/browse/MAHOUT-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584976#comment-14584976
]
lariven edited comment on MAHOUT-1739 at 6/14/15 8:03 AM:
----------------------------------------------------------
1. From a usage point of view, it makes sense to input an item and get its 10
most similar items as output. But the triangular matrix can't satisfy a "Who Buys
X Also Buys Y" recommendation because it does not contain all items as keys. So
what is the use of this job?
2. This job takes its input from RowSimilarityJob, whose maxSimilaritiesPerRow
param makes sense and really does what we want. It's strange that these two
params take the same value but behave differently.
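To illustrate point 1, here is a minimal sketch of what a consumer of the triangular output has to do to get a per-item lookup. It is not Mahout code: the class TriangularLookup, its methods, and the toy pairs are hypothetical, and each pair is assumed to be stored as {lowerID, higherID} as in the snippet quoted from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of Mahout.
class TriangularLookup {

    // Index pairs by their first ID only, as a naive reader of the output would.
    static Map<Long, List<Long>> byFirstId(long[][] pairs) {
        Map<Long, List<Long>> index = new TreeMap<>();
        for (long[] p : pairs) {
            index.computeIfAbsent(p[0], id -> new ArrayList<>()).add(p[1]);
        }
        return index;
    }

    // Index every pair under both IDs so that each item becomes a key.
    static Map<Long, List<Long>> symmetric(long[][] pairs) {
        Map<Long, List<Long>> index = new TreeMap<>();
        for (long[] p : pairs) {
            index.computeIfAbsent(p[0], id -> new ArrayList<>()).add(p[1]);
            index.computeIfAbsent(p[1], id -> new ArrayList<>()).add(p[0]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Toy triangular output: the lower ID always comes first.
        long[][] pairs = {{1, 2}, {1, 3}, {2, 3}};
        System.out.println(byFirstId(pairs)); // {1=[2, 3], 2=[3]}
        System.out.println(symmetric(pairs)); // {1=[2, 3], 2=[1, 3], 3=[1, 2]}
    }
}
```

Item 3 never shows up as a key in the first index, which is the problem for "X also Y" lookups. Indexing both sides makes every item a key, but it does not by itself cap the list length per item.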
was (Author: lariven):
1. From a usage point of view, it makes sense to input an item and get its 10
most similar items as output. But the triangular matrix can't satisfy a "Who Buys
X Also Buys Y" recommendation because it does not contain all items as keys. So
what is the use of this job?
2. This job takes its input from RowSimilarityJob, whose maxObservationsPerRow
param makes sense and really does what we want. It's strange that these two
params take the same value but behave differently.
> maxSimilarItemsPerItem param of ItemSimilarityJob doesn't behave correct
> ------------------------------------------------------------------------
>
> Key: MAHOUT-1739
> URL: https://issues.apache.org/jira/browse/MAHOUT-1739
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.10.0
> Reporter: lariven
> Labels: easyfix, patch
> Attachments: fix_maxSimilarItemsPerItem_incorrect_behave.patch
>
>
> The number of similar items that ItemSimilarityJob outputs for each target item
> may exceed the value set for the maxSimilarItemsPerItem parameter. The
> following code, around line 200 of ItemSimilarityJob.java, may be the cause:
>   if (itemID < otherItemID) {
>     ctx.write(new EntityEntityWritable(itemID, otherItemID),
>         new DoubleWritable(similarItem.getSimilarity()));
>   } else {
>     ctx.write(new EntityEntityWritable(otherItemID, itemID),
>         new DoubleWritable(similarItem.getSimilarity()));
>   }
> I don't know why itemID needs to be swapped with otherItemID, but I think a
> single line is enough:
>   ctx.write(new EntityEntityWritable(itemID, otherItemID),
>       new DoubleWritable(similarItem.getSimilarity()));
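The effect described in the report can be reproduced with a toy model. This is a sketch under stated assumptions, not the Mahout implementation: the class TopKEmitDemo, the sim function and its values are invented, each item is assumed to keep its own top-k neighbours before emitting, and identical pairs are assumed to be collapsed into one record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration, not part of Mahout.
class TopKEmitDemo {

    // Toy symmetric similarity; the values are invented for illustration.
    static double sim(long a, long b) {
        long lo = Math.min(a, b);
        long hi = Math.max(a, b);
        if (lo == 1) {
            return hi == 2 ? 0.9 : hi == 3 ? 0.8 : 0.7;
        }
        return 0.1;
    }

    // For each item, keep its k most similar items, emit the pairs, and
    // count how many distinct pairs end up under each first ID.
    static Map<Long, Integer> countPerKey(long[] items, int k, boolean orderIds) {
        Set<List<Long>> emitted = new LinkedHashSet<>(); // assumed de-duplication
        for (long item : items) {
            List<Long> others = new ArrayList<>();
            for (long other : items) {
                if (other != item) {
                    others.add(other);
                }
            }
            others.sort(Comparator.comparingDouble((Long o) -> sim(item, o)).reversed());
            for (long other : others.subList(0, Math.min(k, others.size()))) {
                if (orderIds && item > other) {
                    emitted.add(List.of(other, item)); // swap so the lower ID comes first
                } else {
                    emitted.add(List.of(item, other)); // keep the target item as the key
                }
            }
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (List<Long> pair : emitted) {
            counts.merge(pair.get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] items = {1, 2, 3, 4};
        System.out.println(countPerKey(items, 1, true));  // {1=3}
        System.out.println(countPerKey(items, 1, false)); // {1=1, 2=1, 3=1, 4=1}
    }
}
```

With k = 1 and the IDs swapped, item 1 collects three similar items because items 2, 3 and 4 each pick it as their top neighbour, and no other item appears as a key. Writing the pair as (itemID, otherItemID) keeps every item at no more than k entries in this model.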