[
https://issues.apache.org/jira/browse/MAHOUT-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064482#comment-13064482
]
Han Hui Wen edited comment on MAHOUT-759 at 7/14/11 2:27 PM:
--------------------------------------------------------------
In
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsMapper.java?view=markup
    long itemID = indexItemIDMap.get(itemIDIndex);
    for (SimilarItem similarItem : topKMostSimilarItems.retrieve()) {
      long otherItemID = similarItem.getItemID();
      if (itemID < otherItemID) {
        ctx.write(new EntityEntityWritable(itemID, otherItemID),
            new DoubleWritable(similarItem.getSimilarity()));
      } else {
        ctx.write(new EntityEntityWritable(otherItemID, itemID),
            new DoubleWritable(similarItem.getSimilarity()));
      }
    }
Here each pair is written only once, with the smaller item ID first, so for
a given item the output lists only the similar items whose IDs are greater
than the item's own ID.
So if we need all of an item's similar items (both those with IDs greater
than the item's and those with IDs less than the item's), we have to hold
them in memory. If the data set is huge, that needs a lot of memory.
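To illustrate the point above, here is a minimal sketch in plain Java (not
Mahout or Hadoop code). The class SimilarItemGrouping, the Pair holder and
both method names are made up for this example. canonical() mirrors the
if/else in the mapper, and groupBothDirections() shows what a consumer of
that output would have to do to see every similar item of an item: register
each pair under both IDs and keep the whole map in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SimilarItemGrouping {

  /** Hypothetical holder for one emitted record; the smaller ID is stored first. */
  public static final class Pair {
    final long itemA;
    final long itemB;
    final double similarity;

    Pair(long itemA, long itemB, double similarity) {
      this.itemA = itemA;
      this.itemB = itemB;
      this.similarity = similarity;
    }
  }

  /** Mirrors the mapper's if/else: always put the smaller item ID first. */
  static Pair canonical(long itemID, long otherItemID, double similarity) {
    return itemID < otherItemID
        ? new Pair(itemID, otherItemID, similarity)
        : new Pair(otherItemID, itemID, similarity);
  }

  /**
   * Collects all neighbours of every item by adding each pair in both
   * directions. The complete map is held in memory, which is the cost
   * described in the comment above.
   */
  static Map<Long, List<String>> groupBothDirections(List<Pair> pairs) {
    Map<Long, List<String>> byItem = new TreeMap<>();
    for (Pair p : pairs) {
      byItem.computeIfAbsent(p.itemA, k -> new ArrayList<>())
          .add(p.itemB + ":" + p.similarity);
      byItem.computeIfAbsent(p.itemB, k -> new ArrayList<>())
          .add(p.itemA + ":" + p.similarity);
    }
    return byItem;
  }

  public static void main(String[] args) {
    List<Pair> pairs = new ArrayList<>();
    pairs.add(canonical(3L, 1L, 0.5));
    pairs.add(canonical(1L, 2L, 0.25));
    System.out.println(groupBothDirections(pairs));
  }
}
```

In a MapReduce job the same effect could presumably be had without the
in-memory map by emitting each pair under both item IDs and letting the
shuffle do the grouping, but that is a design option, not something the
current code does.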
was (Author: huiwenhan):
In
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsMapper.java?view=markup
    long itemID = indexItemIDMap.get(itemIDIndex);
    for (SimilarItem similarItem : topKMostSimilarItems.retrieve()) {
      long otherItemID = similarItem.getItemID();
      if (itemID < otherItemID) {
        ctx.write(new EntityEntityWritable(itemID, otherItemID),
            new DoubleWritable(similarItem.getSimilarity()));
      } else {
        ctx.write(new EntityEntityWritable(otherItemID, itemID),
            new DoubleWritable(similarItem.getSimilarity()));
      }
    }
Here each pair is written only once, with the smaller item ID first, so for
a given item the output lists only the similar items whose IDs are greater
than the item's own ID.
So if the data set is huge, it needs a lot of memory.
> improve the output for ItemSimilarityJob
> ----------------------------------------
>
> Key: MAHOUT-759
> URL: https://issues.apache.org/jira/browse/MAHOUT-759
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Han Hui Wen
> Assignee: Sean Owen
> Priority: Minor
> Labels: ItemSimilarityJob, Mahout
> Fix For: 0.6
>
>
> Currently the output of ItemSimilarityJob looks like the following:
> -7757148334301255842 8179634876330318523 0.003430531732418525
> -7748456450926673883 -4835531939219667484 0.2
> -7748456450926673883 -4314955996498817413 0.5
> -7748456450926673883 2808714190706572296 0.16666666666666666
> -7748456450926673883 6553837338030757853 0.14285714285714285
> -7748456450926673883 8751415108300656176 0.25
> -7747582778903926086 -7015341798833970389 0.05
> -7745456649800833279 -4355275072474512298 4.2444821731748726E-4
> -7743453627722079138 -3667977661496669483 0.0625
> -7743453627722079138 5506208171850960507 0.0625
> -7743453627722079138 7221367701058721462 0.0625
> -7721326863046534787 4345458182369739840 0.1111111111111111
> It's hard to store and view the similar items for one item. Can we output
> them in the same way as RecommenderJob, like the following:
> -9220680374247203656 [1352180348488328600:2.5,-7757148334301255842:2.5,-7490490145790861630:2.5,-2522983126042570313:2.5,-6799281597153282746:2.5,2068144185705723774:2.5,-6007350693723349387:2.5,-6926986971196173463:2.5,5406899818760113425:2.5,-1490410533166829581:2.5,-27094582027403342:2.5,5665136340246000627:2.5]
> -9218599019595753787 [7535853797920985421:2.5,6375444791143058470:2.5,-6278686364859964742:2.5,4842183991621375854:2.5,-5371123101058190798:2.5,8606934083257321678:2.5,8043580185091202137:2.5,5264973095582397115:2.5,1990532764981555035:2.5,5406899818760113425:2.5,-5208048021997301514:2.5,-5565838412826072017:2.5]
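As a rough sketch of the requested format, the plain-Java snippet below
builds one output line per item in the "itemID [otherID:value,...]" shape
shown above. The class and method names are hypothetical, and the tab
between the item ID and the bracketed list is an assumption; this is not
how RecommenderJob or ItemSimilarityJob is actually implemented.

```java
import java.util.StringJoiner;

public class SimilarItemsLineFormatter {

  /**
   * Formats one item and its similar items as
   * itemID<TAB>[otherID:similarity,otherID:similarity,...].
   * The two arrays are parallel: similarities[i] belongs to otherItemIDs[i].
   */
  static String formatLine(long itemID, long[] otherItemIDs, double[] similarities) {
    if (otherItemIDs.length != similarities.length) {
      throw new IllegalArgumentException("arrays must have the same length");
    }
    StringJoiner joiner = new StringJoiner(",", "[", "]");
    for (int i = 0; i < otherItemIDs.length; i++) {
      joiner.add(otherItemIDs[i] + ":" + similarities[i]);
    }
    return itemID + "\t" + joiner;
  }

  public static void main(String[] args) {
    // Two of the pairs listed for item -7748456450926673883 in the current output.
    long[] others = {-4835531939219667484L, -4314955996498817413L};
    double[] sims = {0.2, 0.5};
    System.out.println(formatLine(-7748456450926673883L, others, sims));
  }
}
```

With this layout all similar items of one item sit on a single line, so they
can be stored or looked up by item ID without scanning several pair records.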