Hello Taste-Community,

for the past few weeks I have been testing mahout-taste (Apache Mahout release 0.1), and I like it :-)!

I have built a working item-based recommender and now have some questions about speed and memory.
Maybe you can give me a hint about what I should improve.

  1. ItemCorrelation: I precompute all correlations for approximately
     100000 items and save them in a MySQL database if the correlation
     is higher than, e.g., 0.95. I then load the correlations in the
     recommender like this:

         // use the precomputed ItemItemCorrelation

         String[] splittArray = null;
         String strLine = null;

         ItemItemSimilarity aItemItemCorrelation = null;
         Collection<GenericItemSimilarity.ItemItemSimilarity> correlationMatrix =
             new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();

         // open File:

         BufferedReader inStream = new BufferedReader(new FileReader(filePath));

         while ((strLine = inStream.readLine()) != null)
         {
             splittArray = strLine.split(",");

             Item aItem1 = new GenericItem<String>(splittArray[0]);
             Item aItem2 = new GenericItem<String>(splittArray[1]);

             aItemItemCorrelation = new GenericItemSimilarity.ItemItemSimilarity(
                 aItem1, aItem2, Double.parseDouble(splittArray[2]));
             correlationMatrix.add(aItemItemCorrelation);
         }
         …
         // set the ItemSimilarity:

         this.itemSimilarity = new GenericItemSimilarity(correlationMatrix);
         …
         // set Recommender:

         recommender = new GenericItemBasedRecommender(super.getModel(), itemSimilarity);
         …
         // set CachingRecommender:

         this.cachingRecommender = new CachingRecommender(recommender);

     Question 1:
     The similarity matrix takes about 400 MB in the MySQL database, but
     loading it as a GenericItemSimilarity (the "set ItemCorrelation"
     step) uses 8 GB of RAM. Is it plausible that this matrix needs
     20 times more memory in RAM than in the database, or have I done
     something wrong?

     Question 2:
     How can I reduce the memory consumption of the
     GenericItemSimilarity? The constructor
     GenericItemSimilarity(Iterable<GenericItemSimilarity.ItemItemSimilarity> similarities, int maxToKeep)
     <http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/GenericItemSimilarity.html#GenericItemSimilarity%28java.lang.Iterable,%20int%29>
     doesn't work for me, because if maxToKeep is too small, the
     recommendations will be bad ...
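
To put a number on the memory question, the heap growth around the load step could be measured roughly as follows. This is only a minimal sketch using the standard java.lang.Runtime API: the class name HeapDelta and the helper usedHeapBytes are hypothetical, and the list of double[] is a stand-in for the real GenericItemSimilarity construction, which is not reproduced here. Because gc() is only a hint to the JVM, the result is a rough estimate.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapDelta {

    /** Approximate number of heap bytes currently in use (hypothetical helper). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Stand-in for the real load step (building the similarity matrix).
        List<double[]> loaded = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            loaded.add(new double[] {i, i + 1, 0.95});
        }

        long after = usedHeapBytes();
        long deltaMb = (after - before) / (1024 * 1024);
        System.out.println("Approx. heap used by load step: " + deltaMb
                + " MB for " + loaded.size() + " entries");
    }
}
```

Dividing the measured delta by the number of loaded pairs gives an approximate per-pair cost in RAM that can be compared with the per-row size in the database.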


  2. Speed of Recommendation: I use a MySQLJDBCDataModel with MyISAM tables.
     The primary key and indexes are set:
     PRIMARY KEY (user_id, item_id), INDEX (user_id), INDEX (item_id).
     A recommendation for a user takes between 0.5 and 80 seconds; I
     would like it to take only 300 ms.

   By the way, I use a quad-core 3.2 GHz machine with 32 GB RAM to compute
   the recommendations, so maybe the DB is the bottleneck. However, with a
   FileDataModel it is faster, but not by much.

   Here's a log for a user with 2000 associated items:

   INFO  CollaborativeModel - Seconds to set ItemCorrelation: 76.187506 s
   INFO  CollaborativeModel - Seconds to set Recommender: 0.025945000000000003 s
   INFO  CollaborativeModel - Seconds to set CachingRecommender: 0.06511 s
   INFO  CollaborativeController - SECONDS TO REFRESH THE SYSTEM: 6.450000000000001E-4 s
   INFO  root - SECONDS TO GET A RECOMMENDATION FOR USER: 50.888347 s
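
The timings above are wall-clock measurements taken around each step. A minimal sketch of such a measurement with System.nanoTime() is shown here; the class name StepTimer and the helper secondsToRun are hypothetical, and the sleeping Runnable only stands in for the real step (for example, the call that fetches recommendations for one user).

```java
public class StepTimer {

    /** Runs the given step once and returns the elapsed wall-clock time in seconds. */
    static double secondsToRun(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Stand-in for the real step being timed.
        double seconds = secondsToRun(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Seconds to run step: " + seconds + " s");
    }
}
```

Timing the data-model access and the similarity lookups separately in this way would show whether the database or the in-memory computation dominates the 50 seconds.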

   Question:
   Is there a way to increase the speed of a recommendation? (Use
   InnoDB? Compute fewer items? ... some other way ;-)?)

So if you have any ideas on how I could reduce the memory consumption and increase the recommendation speed, I would be very thankful.

best regards
Thomas

--
___________________________________________________________
Thomas Rewig
___________________________________________________________
