On Fri, Jul 10, 2009 at 10:03 AM, Thomas Rewig <[email protected]> wrote:
> Question 1:
> The similarity matrix uses 400MB of memory in the MySQL DB - but when
> I set the ItemCorrelation, 8GB of RAM are used to load the similarity
> matrix as a GenericItemSimilarity. Is it possible/plausible that this
> matrix uses more than 20 times more memory in RAM than in the
> database - or have I done something wrong?
I could believe this. 100,000 items means about 5,000,000,000 item-item
pairs are possible. Many are not kept, but since each one requires 30 or
so bytes of memory, I am not surprised it could take 8GB. That's really
a lot to keep in memory.

I might suggest, instead, that you not pre-compute the similarities, but
compute them as needed and cache the results (use CachingItemSimilarity -
there is a sketch at the end of this message). That way you are not
spending so much memory on pairs that may never be used, but you still
get much of the speed improvement.

> Question 2:
> How can I reduce the memory consumption of the GenericItemSimilarity?
> GenericItemSimilarity(Iterable<GenericItemSimilarity.ItemItemSimilarity> similarities, int maxToKeep)
> <http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/GenericItemSimilarity.html#GenericItemSimilarity%28java.lang.Iterable,%20int%29>
> doesn't work, because if maxToKeep is too small, the recommendations
> will be bad ...

Yeah, you are already filtering out many of the less important
correlations anyway. You could filter yet more to reduce memory
requirements, but I think it's just best not to try to store all of this
in memory. It doesn't scale well.

> 2. Speed of Recommendation: I use a MySQLJDBCDataModel - MyISAM.
> Primary key and indexes are set:
> PRIMARY KEY (user_id, item_id), INDEX (user_id), INDEX (item_id).
> A recommendation for a user takes between 0.5 and 80 seconds - I
> would like it to take just 300ms.
>
> By the way, I use a quad-core 3.2 GHz machine with 32GB of RAM to
> compute the recommendations, so maybe the DB is the bottleneck. But
> if I use a FileDataModel it is faster, though not by much.
>
> Here's a log for a user with 2000 associated items:
>
> INFO CollaborativeModel - Seconds to set ItemCorrelation: 76.187506 s
> INFO CollaborativeModel - Seconds to set Recommender: 0.025945000000000003 s
> INFO CollaborativeModel - Seconds to set CachingRecommender: 0.06511 s
> INFO CollaborativeController - SECONDS TO REFRESH THE SYSTEM: 6.450000000000001E-4 s
> INFO root - SECONDS TO GET A RECOMMENDATION FOR USER: 50.888347 s
>
> Question:
> Is there a way to increase the speed of a recommendation? (use
> InnoDB?, compute fewer items ... somehow ;-) ...?)

Your indexes are right. Are you using a connection pool? That is really
important (see the pooling sketch below). How many users do you have? If
you have relatively few users, you might use a user-based recommender
instead. Or consider a slope-one recommender (both are also sketched
below). It sounds like you have a lot of items, and given the way
item-based recommenders work, that will be slow. Using
CachingItemSimilarity could help.

I am surprised that a FileDataModel isn't much faster, since it loads the
data into memory. That suggests to me that the database isn't the
bottleneck. Are you using multiple threads to compute recommendations
simultaneously? You certainly can, to take advantage of the 4 cores (see
the last sketch below).
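To make the CachingItemSimilarity suggestion concrete, here is a minimal
sketch. The file name and the choice of Pearson correlation are just
placeholders - substitute whatever DataModel and underlying ItemSimilarity
you actually use:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class CachedSimilarityExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));
    // Each pair's similarity is computed on first use and then cached,
    // instead of precomputing all ~5 billion possible pairs up front.
    ItemSimilarity similarity =
        new CachingItemSimilarity(new PearsonCorrelationSimilarity(model), model);
    Recommender recommender = new GenericItemBasedRecommender(model, similarity);
  }
}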
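On pooling: wrap whatever DataSource you hand to MySQLJDBCDataModel in a
pooling implementation so connections are reused rather than reopened on
every query. Commons DBCP's BasicDataSource is just one option, and the
URL and credentials here are obviously placeholders:

import org.apache.commons.dbcp.BasicDataSource;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class PooledDataModelExample {
  public static void main(String[] args) throws Exception {
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName("com.mysql.jdbc.Driver");
    dataSource.setUrl("jdbc:mysql://localhost/mydb");
    dataSource.setUsername("user");
    dataSource.setPassword("password");
    dataSource.setMaxActive(8); // pooled connections, reused across queries
    // The no-arg form assumes the default taste_preferences table with
    // user_id / item_id / preference columns; other constructors let
    // you name your own table and columns.
    DataModel model = new MySQLJDBCDataModel(dataSource);
  }
}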
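The user-based and slope-one alternatives wire up roughly like this. The
neighborhood size of 50 is an arbitrary starting point, and again the
FileDataModel is a stand-in for your real data source:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class AlternativeRecommenders {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));
    // User-based: work per recommendation scales with users, not items.
    UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(50, userSimilarity, model);
    Recommender userBased =
        new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
    // Slope-one: precomputes average rating diffs, then answers fast.
    Recommender slopeOne = new SlopeOneRecommender(model);
  }
}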
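Finally, a sketch of fanning requests out across the cores with a plain
ExecutorService. Recommender implementations are safe for concurrent
reads. I'm assuming user IDs are longs as in current trunk (adapt if your
version still uses Object IDs), and 10 is an arbitrary number of
recommendations per user:

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class ParallelRecommendation {
  public static void recommendAll(final Recommender recommender, long[] userIDs)
      throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(4); // one per core
    for (final long userID : userIDs) {
      executor.submit(new Callable<List<RecommendedItem>>() {
        public List<RecommendedItem> call() throws Exception {
          return recommender.recommend(userID, 10); // top 10 for this user
        }
      });
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.DAYS); // wait for all tasks
  }
}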
