[jira] [Issue Comment Edited] (MAHOUT-759) improve the output for ItemSimilarityJob

Han Hui Wen (JIRA) Thu, 14 Jul 2011 07:29:28 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064482#comment-13064482 ]


Han Hui Wen  edited comment on MAHOUT-759 at 7/14/11 2:27 PM:
--------------------------------------------------------------

In 
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsMapper.java?view=markup

    long itemID = indexItemIDMap.get(itemIDIndex);
    for (SimilarItem similarItem : topKMostSimilarItems.retrieve()) {
      long otherItemID = similarItem.getItemID();
      if (itemID < otherItemID) {
        ctx.write(new EntityEntityWritable(itemID, otherItemID),
                  new DoubleWritable(similarItem.getSimilarity()));
      } else {
        ctx.write(new EntityEntityWritable(otherItemID, itemID),
                  new DoubleWritable(similarItem.getSimilarity()));
      }
    }


Because the smaller ID is always written first, reading the output sequentially 
for an item only yields the similar items whose IDs are greater than that 
item's itemID.

So if we need all of an item's similar items (both those whose IDs are greater 
than the item's and those whose IDs are less than it), we have to hold them in 
memory, and with huge data that needs a lot of memory.
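To make the memory concern concrete, here is a minimal sketch in plain Java (no Hadoop) of what a consumer of the current output would have to do to see every similar item for each item. The class SimilarItemGrouping, the method groupBothDirections and the small item IDs are hypothetical and exist only for this illustration; they are not part of Mahout.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Mahout. */
final class SimilarItemGrouping {

  private SimilarItemGrouping() {}

  /**
   * Takes pairs as written by the job (smaller ID first) and records each
   * pair under BOTH items, so every item ends up with all of its similar
   * items. The whole result lives in memory, which is the cost described
   * in the comment above.
   */
  static Map<Long, Map<Long, Double>> groupBothDirections(long[][] pairs, double[] similarities) {
    Map<Long, Map<Long, Double>> byItem = new TreeMap<>();
    for (int i = 0; i < pairs.length; i++) {
      long a = pairs[i][0];
      long b = pairs[i][1];
      double s = similarities[i];
      // record the pair under the smaller ID ...
      byItem.computeIfAbsent(a, k -> new TreeMap<>()).put(b, s);
      // ... and again under the larger ID
      byItem.computeIfAbsent(b, k -> new TreeMap<>()).put(a, s);
    }
    return byItem;
  }

  public static void main(String[] args) {
    // made-up sample data: (1,2), (1,3), (2,3)
    long[][] pairs = {{1L, 2L}, {1L, 3L}, {2L, 3L}};
    double[] similarities = {0.5, 0.25, 0.2};
    groupBothDirections(pairs, similarities)
        .forEach((item, neighbours) -> System.out.println(item + "\t" + neighbours));
  }
}
```

In this sketch item 3 never appears as the first ID of a pair, yet it ends up with two similar items, which is why the whole map has to be kept until the last pair has been read.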

      was (Author: huiwenhan):
    In 
http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsMapper.java?view=markup

    long itemID = indexItemIDMap.get(itemIDIndex);
    for (SimilarItem similarItem : topKMostSimilarItems.retrieve()) {
      long otherItemID = similarItem.getItemID();
      if (itemID < otherItemID) {
        ctx.write(new EntityEntityWritable(itemID, otherItemID),
                  new DoubleWritable(similarItem.getSimilarity()));
      } else {
        ctx.write(new EntityEntityWritable(otherItemID, itemID),
                  new DoubleWritable(similarItem.getSimilarity()));
      }
    }


Because the smaller ID is always written first, reading the output sequentially 
for an item only yields the similar items whose IDs are greater than that 
item's itemID.

So if the data is huge, it needs a lot of memory.
  
> improve the output for ItemSimilarityJob
> ----------------------------------------
>
>                 Key: MAHOUT-759
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-759
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: ItemSimilarityJob,, Mahout
>             Fix For: 0.6
>
>
> Currently the output of ItemSimilarityJob looks like the following:
> -7757148334301255842  8179634876330318523     0.003430531732418525
> -7748456450926673883  -4835531939219667484    0.2
> -7748456450926673883  -4314955996498817413    0.5
> -7748456450926673883  2808714190706572296     0.16666666666666666
> -7748456450926673883  6553837338030757853     0.14285714285714285
> -7748456450926673883  8751415108300656176     0.25
> -7747582778903926086  -7015341798833970389    0.05
> -7745456649800833279  -4355275072474512298    4.2444821731748726E-4
> -7743453627722079138  -3667977661496669483    0.0625
> -7743453627722079138  5506208171850960507     0.0625
> -7743453627722079138  7221367701058721462     0.0625
> -7721326863046534787  4345458182369739840     0.1111111111111111
> It's hard to store and view the similar items for one item. Can we output 
>   them the same way RecommenderJob does, like the following:
> -9220680374247203656  
> [1352180348488328600:2.5,-7757148334301255842:2.5,-7490490145790861630:2.5,-2522983126042570313:2.5,-6799281597153282746:2.5,2068144185705723774:2.5,-6007350693723349387:2.5,-6926986971196173463:2.5,5406899818760113425:2.5,-1490410533166829581:2.5,-27094582027403342:2.5,5665136340246000627:2.5]
> -9218599019595753787  
> [7535853797920985421:2.5,6375444791143058470:2.5,-6278686364859964742:2.5,4842183991621375854:2.5,-5371123101058190798:2.5,8606934083257321678:2.5,8043580185091202137:2.5,5264973095582397115:2.5,1990532764981555035:2.5,5406899818760113425:2.5,-5208048021997301514:2.5,-5565838412826072017:2.5]
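As a rough illustration of the requested layout, the sketch below formats one item and its similar items as a single row, itemID followed by a bracketed list of id:value entries. The class SimilarItemsRowFormatter and the method formatRow are hypothetical names, not Mahout APIs, and the tab separator between the item ID and the list is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical formatter, not part of Mahout. */
final class SimilarItemsRowFormatter {

  private SimilarItemsRowFormatter() {}

  /** Builds "itemID<TAB>[id:value,id:value,...]" in the map's iteration order. */
  static String formatRow(long itemID, Map<Long, Double> similarItems) {
    StringJoiner joiner = new StringJoiner(",", "[", "]");
    for (Map.Entry<Long, Double> entry : similarItems.entrySet()) {
      joiner.add(entry.getKey() + ":" + entry.getValue());
    }
    return itemID + "\t" + joiner;
  }

  public static void main(String[] args) {
    // made-up sample data; LinkedHashMap keeps insertion order
    Map<Long, Double> similarItems = new LinkedHashMap<>();
    similarItems.put(2L, 0.5);
    similarItems.put(3L, 0.25);
    System.out.println(formatRow(1L, similarItems));
  }
}
```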

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
