Re: Memory and Speed Questions for Item-Based-Recommender

Thomas Rewig Fri, 10 Jul 2009 05:19:28 -0700

Thank you for your fast reply!

Sean Owen:

On Fri, Jul 10, 2009 at 10:03 AM, Thomas Rewig <[email protected]> wrote:

    Question 1:
    The similarity matrix uses 400 MB of storage in the MySQL DB, but
    when I set the ItemCorrelation, 8 GB of RAM is used to load the
    similarity matrix as a GenericItemSimilarity. Is it
    possible/plausible that this matrix uses more than 20 times as much
    memory in RAM as in the database, or have I done something wrong?


I could believe this. 100,000 items means about 5,000,000,000
item-item pairs are possible. Many are not kept, but seeing as each
one requires 30 or so bytes of memory, I am not surprised that it
could take 8GB.
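For a sense of scale, Sean's arithmetic can be reproduced in a few lines of Java. The 30 bytes per pair is his rough figure rather than a measurement, and the class and method names are only illustrative.

```java
class SimilarityMemoryEstimate {

    /** Number of unordered item-item pairs: n * (n - 1) / 2. */
    static long pairCount(long items) {
        return items * (items - 1) / 2;
    }

    /** Rough memory in bytes if every stored pair costs bytesPerPair. */
    static long estimatedBytes(long storedPairs, long bytesPerPair) {
        return storedPairs * bytesPerPair;
    }

    public static void main(String[] args) {
        long items = 100_000L;
        long bytesPerPair = 30L; // Sean's "30 or so bytes", an approximation

        long allPairs = pairCount(items);
        System.out.println("possible pairs: " + allPairs);
        // Memory if every single pair were kept
        System.out.printf("all pairs kept: %.1f GB%n",
                estimatedBytes(allPairs, bytesPerPair) / 1e9);
        // How many pairs fit into 8 GB (taken as 8 * 1024^3 bytes)
        long eightGb = 8L * 1024 * 1024 * 1024;
        System.out.println("pairs fitting in 8 GB: " + eightGb / bytesPerPair);
    }
}
```

With these numbers, keeping all 4,999,950,000 pairs would need roughly 150 GB, while 8 GB holds about 286 million pairs at 30 bytes each, i.e. under 6% of all possible pairs.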

That's really a lot to keep in memory. I might suggest, instead, that
you not pre-compute the similarities, but compute them as
needed and cache them (use CachingItemSimilarity). That way you are not
spending so much memory on pairs that may never get used, but you still
get much of the speed improvement.
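The compute-on-demand-and-cache idea amounts to memoization. Below is a minimal sketch of it in plain Java, assuming a symmetric similarity; `PairSimilarity` and `CachedSimilarity` are hypothetical names, and this is not the code of Taste's CachingItemSimilarity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an expensive item-item similarity computation. */
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

/**
 * Computes each similarity on first request and remembers it.
 * Assumes the similarity is symmetric; the cache is unbounded, so a real
 * implementation would also cap its size or evict old entries.
 */
class CachedSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    CachedSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    public double similarity(long itemA, long itemB) {
        // Order the pair so (a, b) and (b, a) share one cache entry
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        return cache.computeIfAbsent(lo + ":" + hi, key -> delegate.similarity(lo, hi));
    }

    /** Number of pairs computed so far. */
    int size() {
        return cache.size();
    }
}
```

Only pairs that are actually requested end up in memory, which is the trade-off Sean describes: some computation at request time in exchange for not storing pairs that are never used.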

At the moment, to get the similarity matrix, I do the following:

   * create a DataModel (MySQL DB) in this way: aItem,
     aItemCharacteristic, aItemValue (each aItem has 40
     aItemCharacteristics; later there will be more)
   * set a UserSimilarity: Pearson or Euclidean
   * get all similarities in a multithreaded way: aCorrelation =
     aUserSimilarity.userSimilarity(user1, user2); this is stressful
     for the CPU, but it is done in 4 hours, not bad for n!/(n-2)!
     combinations ;-)
   * save the pairs whose correlation is greater than 0.95
   * load them into a GenericItemSimilarity to use it in an
     ItemBasedRecommender
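For reference, the precomputation described in the list above might look roughly like this in plain Java, as a stand-in for the Taste UserSimilarity call. It assumes each item's characteristics fit in memory as equal-length double arrays and uses a hand-written Pearson correlation; the class, record and method names are hypothetical. Because Pearson correlation is symmetric, it only visits pairs i < j, i.e. n(n-1)/2 of the n!/(n-2)! = n(n-1) ordered pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class SimilarityPrecompute {

    /** A kept item pair and its correlation. */
    record ItemPair(int itemA, int itemB, double correlation) {}

    /** Pearson correlation of two equal-length vectors; NaN if either is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** All pairs i < j whose correlation exceeds the threshold, computed in parallel. */
    static List<ItemPair> similarPairs(double[][] features, double threshold) {
        List<ItemPair> kept = Collections.synchronizedList(new ArrayList<>());
        int n = features.length;
        // Each outer index i becomes a task on the common fork-join pool
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = i + 1; j < n; j++) {
                double r = pearson(features[i], features[j]);
                // NaN > threshold is false, so constant vectors are skipped
                if (r > threshold) {
                    kept.add(new ItemPair(i, j, r));
                }
            }
        });
        return kept;
    }
}
```

Calling `similarPairs(features, 0.95)` mirrors the "save only pairs above 0.95" step; the order of the returned pairs is not deterministic because of the parallel loop.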

OK, I will test with the Caching(?)Similarity. If I understand you right, this means I would:


   * create a DataModel_1 (MySQL DB) in this way: aItem,
     aItemCharacteristic, aItemValue (each aItem has 40
     aItemCharacteristics)
   * create a UserSimilarity so that I get the similarity of the
     aItems (if I used an ItemSimilarity I would get the similarity of
     the aItemCharacteristics ... right?)
   * create a CachingUserSimilarity and put DataModel_1 and the
     UserSimilarity in it
   * create a DataModel_2 (MySQL DB) in this way:
     aUser, aItem, aItemPreference
   * create the Neighborhood
   * create a UserBasedRecommender and put the Neighborhood, the
     DataModel_2 and the CachingUserSimilarity in it
   * create a CachingRecommender
   * et voilà :-) I have a working, memory-sparing recommender

But I can't do that with an item-based recommender, because I have no ItemCorrelation (the similarity of the aItemCharacteristics doesn't matter), is that right? So the sentence in the docs, "So, item-based recommenders can use pre-computed similarity values in the computations, which make them much faster. For large data sets, item-based recommenders are more appropriate", doesn't work for me. Or am I wrong?
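The idea in the quoted documentation can be sketched as follows: once item-item similarities are stored, estimating a user's preference for an item needs only lookups and a weighted average, with no similarity computation at request time. This uses the common weighted-average formula as an assumption, not necessarily Taste's exact computation; the class name and the nested-map layout are hypothetical.

```java
import java.util.Map;

class ItemBasedEstimate {

    /**
     * Estimates a user's preference for targetItem as the similarity-weighted
     * average of the user's known preferences, using only stored similarities.
     * Returns NaN when no stored similarity links targetItem to a rated item.
     */
    static double estimate(long targetItem,
                           Map<Long, Double> userPrefs,                 // itemId -> preference
                           Map<Long, Map<Long, Double>> similarities) { // itemId -> (itemId -> similarity)
        Map<Long, Double> neighbors = similarities.getOrDefault(targetItem, Map.of());
        double weightedSum = 0, weightTotal = 0;
        for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
            Double sim = neighbors.get(pref.getKey());
            if (sim == null) {
                continue; // pair was not stored, e.g. below the 0.95 cutoff
            }
            weightedSum += sim * pref.getValue();
            weightTotal += Math.abs(sim);
        }
        return weightTotal == 0 ? Double.NaN : weightedSum / weightTotal;
    }
}
```

For example, with similarities 1.0 and 0.5 to two items the user rated 4.0 and 1.0, the estimate is (4.0 + 0.5) / 1.5 = 3.0.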

At the moment I have a test set of 500,000 users and 100,000 items. The item similarity is computed with Taste, but from external data.


Sean Owen:


  Question:
  Is there a way to increase the speed of a recommendation? (Use
  InnoDB? Compute fewer items? ... some other way? ;-))


Your indexes are right. Are you using a connection pool? That is
really important.

Yes, I do use a connection pool:

       // pooling wrapper around the underlying MySQL DataSource
       this.cPoolDS = new ConnectionPoolDataSource(dataSource);
       this.aConnection = cPoolDS.getConnection();

Sean Owen:

How many users do you have? If you have relatively few users, you
might use a user-based recommender instead. Or consider a slope-one
recommender.
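Since slope-one came up, here is a minimal sketch of the weighted slope-one scheme (Lemire and Maclachlan) in plain Java, to show what it computes: the average rating difference between each pair of items, weighted by how many users rated both. It is not Taste's SlopeOneRecommender; the class name and the in-memory maps are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class WeightedSlopeOne {
    // diffSum[j][i] = sum over co-raters of (r_j - r_i); count[j][i] = number of co-raters
    private final Map<Long, Map<Long, Double>> diffSum = new HashMap<>();
    private final Map<Long, Map<Long, Integer>> count = new HashMap<>();

    /** Accumulates item-item rating differences from one user's ratings (itemId -> rating). */
    void addUser(Map<Long, Double> ratings) {
        for (Map.Entry<Long, Double> a : ratings.entrySet()) {
            for (Map.Entry<Long, Double> b : ratings.entrySet()) {
                if (a.getKey().equals(b.getKey())) {
                    continue;
                }
                diffSum.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                       .merge(b.getKey(), a.getValue() - b.getValue(), Double::sum);
                count.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                     .merge(b.getKey(), 1, Integer::sum);
            }
        }
    }

    /** Weighted slope-one prediction of a user's rating for target; NaN if no overlap. */
    double predict(Map<Long, Double> ratings, long target) {
        Map<Long, Double> diffs = diffSum.getOrDefault(target, Map.of());
        Map<Long, Integer> counts = count.getOrDefault(target, Map.of());
        double numerator = 0;
        int denominator = 0;
        for (Map.Entry<Long, Double> r : ratings.entrySet()) {
            Integer c = counts.get(r.getKey());
            if (c == null) {
                continue; // no user rated both items
            }
            double avgDiff = diffs.get(r.getKey()) / c;
            numerator += (r.getValue() + avgDiff) * c;
            denominator += c;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }
}
```

Like the precomputed similarity matrix, this keeps one entry per co-rated item pair, so with many items it raises a similar memory question.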

At the moment there are 5 times more users than items; later this could change to 1.5 million items and 150,000 users, but first my tests must work. I tested the slope-one recommender back when Taste wasn't yet part of Mahout, and I found that the recommendations didn't work for me. Has something changed since then? ... maybe I should give it another try.


Sean Owen:

It sounds like you have a lot of items, so given the way item-based
recommenders work, it will be slow.

Using CachingItemSimilarity could help. I am surprised that a
FileDataModel isn't much faster, since it loads data in memory. That
suggests to me that the database isn't the bottleneck.

Are you using multiple threads to compute recommendations
simultaneously? You certainly can, to take advantage of the 4 cores.

Yes, I do, but inside Taste each .recommend call runs on only a single thread. Is that right?
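Whether or not a single recommend call uses more than one thread, requests for different users can run in parallel. A minimal sketch with a fixed thread pool (for example 4 threads for 4 cores) follows; `Recommend` is a hypothetical interface standing in for the recommender's recommend call, not Taste's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelRecommendations {

    /** Hypothetical stand-in for a single-threaded recommend(userId, howMany) call. */
    interface Recommend {
        List<Long> recommend(long userId, int howMany);
    }

    /** Runs one recommend call per user, spread over a fixed pool of worker threads. */
    static List<List<Long>> forUsers(Recommend recommender, List<Long> userIds,
                                     int howMany, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<Long>>> futures = new ArrayList<>();
            for (long userId : userIds) {
                futures.add(CompletableFuture.supplyAsync(
                        () -> recommender.recommend(userId, howMany), pool));
            }
            List<List<Long>> results = new ArrayList<>();
            for (CompletableFuture<List<Long>> f : futures) {
                results.add(f.join()); // waits for each call; keeps the order of userIds
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each call still runs on one thread; throughput improves because up to `threads` users are served at the same time.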


Best regards
Thomas
--
___________________________________________________________
Thomas Rewig
___________________________________________________________
