On Fri, Jul 10, 2009 at 1:18 PM, Thomas Rewig <[email protected]> wrote:
> Ok, I will test with the Caching(?)Similarity. If I understand you right,
> this will mean I:
>
>    * create a DataModel_1 (MySQLDB) in this way: aItem,
>      aItemCharacteristic, aItemValue (each aItem has 40
>      aItemCharacteristics)
>    * create a UserSimilarity so that I have the similarity of the
>      aItems (if I use ItemSimilarity I would get the similarity of the
>      aItemCharacteristic ... right?)
>    * create a CachingUserSimilarity and put DataModel_1 and the
>      UserSimilarity in there
>    * create a DataModel_2 (MySQLDB) in this way:
>      aUser, aItem, aItemPreference
>    * create the Neighborhood
>    * create a UserBasedRecommender and put the Neighborhood, the
>      DataModel_2 and the CachingUserSimilarity in there
>    * create a CachingRecommender
>    * et voilà :-) I have a working, memory-sparing recommender
>
> But I can't do that with an item-based recommender, because I have no
> ItemCorrelation (the similarity of aItemCharacteristic doesn't matter),
> is that right? So the sentence in the docs, "So, item-based recommenders
> can use pre-computed similarity values in the computations, which make
> them much faster. For large data sets, item-based recommenders are more
> appropriate", doesn't work for me. Or
Yes, all of that is true. Precomputing is reasonable -- it's storing the
results in memory that is difficult, given the size. If memory is the worry,
you could consider keeping the similarities in the database instead, and not
loading them into memory at all. There is no implementation that reads
similarities from a database table, but we could construct one; the second
sketch at the end of this message shows roughly what it would look like.

I don't see how UserSimilarity objects come into this -- you would not use
one in an item-based recommender. (The user-based pipeline you outline above
is fine, though; the first sketch below shows the wiring.)

There is a CachingItemSimilarity for ItemSimilarity classes. What you are
doing now is effectively pre-computing all similarities, all of them, and
caching them in memory ahead of time. Using CachingItemSimilarity would do
that for you, and would probably use a lot less memory, since only pairs
that are actually needed, and accessed frequently, end up in memory. It
won't be quite as fast, since it will still re-compute similarities from
time to time, but overall you will probably use far less memory for a small
decrease in performance. The third sketch below shows the wiring.

Beyond that, I could suggest more extreme modifications to the code. For
example, if you are willing to dig into the code and experiment: instead of
considering every single item for recommendation every time, pre-compute
some subset of items that are reasonably popular, and only consider
recommending those. It is not a great approach, since you sometimes want to
recommend obscure items, but it could help. The fourth sketch below shows
one way to approximate it.

You should also try the very latest code from Subversion. Just this week I
have made some pretty good improvements to the JDBC code.

Also, it sounds like you are trying to do real-time recommendations,
synchronously with a user request. That is hard, since it imposes such a
tight time limit. Consider computing recommendations asynchronously if you
can: for example, start computing them when the user logs in, and maybe by
the second page view, 5 seconds later, you are ready to recommend something.

> Yes I do, but every .recommend call is, inside Taste, only a single
> thread. Is that right?

Yes, internally there is no multi-threading; you would do it externally. The
last sketch below shows one way to do both the asynchronous and the
multi-threaded part with a plain ExecutorService.
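First sketch -- the user-based pipeline you describe, wired up. Untested and
written from memory; class names are from current trunk, the neighborhood
size is arbitrary, and this MySQLJDBCDataModel constructor uses the default
table and column names, so substitute your own schema:

import javax.sql.DataSource;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingUserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public final class UserBasedSetup {

  public static Recommender build(DataSource dataSource) throws TasteException {
    // DataModel_2 in your terms: aUser, aItem, aItemPreference
    DataModel model = new MySQLJDBCDataModel(dataSource);
    // wrap the expensive similarity in a cache
    UserSimilarity similarity =
        new CachingUserSimilarity(new PearsonCorrelationSimilarity(model), model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // cache the final recommendations as well
    return new CachingRecommender(recommender);
  }
}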
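Second sketch -- a database-backed ItemSimilarity. To be clear, nothing like
this exists in the code today; the class, the table, and its columns are all
made up for illustration. It assumes you have bulk-computed pairwise
similarities into a table beforehand, and it never holds them in memory.
Newer revisions of the ItemSimilarity interface may require implementing a
couple more methods than shown here:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import javax.sql.DataSource;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Reads pre-computed similarities from a (hypothetical) table like:
//   CREATE TABLE item_similarity (
//     item_id_a BIGINT, item_id_b BIGINT, similarity DOUBLE,
//     PRIMARY KEY (item_id_a, item_id_b)
//   );
public final class JDBCItemSimilarity implements ItemSimilarity {

  private static final String GET_SIMILARITY_SQL =
      "SELECT similarity FROM item_similarity WHERE item_id_a=? AND item_id_b=?";

  private final DataSource dataSource;

  public JDBCItemSimilarity(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  @Override
  public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
    // each pair is stored once, smaller ID first
    long a = Math.min(itemID1, itemID2);
    long b = Math.max(itemID1, itemID2);
    try (Connection conn = dataSource.getConnection();
         PreparedStatement stmt = conn.prepareStatement(GET_SIMILARITY_SQL)) {
      stmt.setLong(1, a);
      stmt.setLong(2, b);
      try (ResultSet rs = stmt.executeQuery()) {
        return rs.next() ? rs.getDouble(1) : Double.NaN; // NaN: unknown pair
      }
    } catch (SQLException e) {
      throw new TasteException(e);
    }
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    // nothing to do; the table is the source of truth
  }
}

You would trade a database round-trip per pair for near-zero heap usage;
wrapping this in CachingItemSimilarity (next sketch) gets much of the speed
back.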
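Third sketch -- the item-based equivalent of what you built by hand: wrap
whatever ItemSimilarity you use in CachingItemSimilarity, so only the
frequently-used pairs stay in memory. PearsonCorrelationSimilarity here is
just a stand-in; use whatever similarity fits your data:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public final class ItemBasedSetup {

  public static Recommender build(DataModel model) throws TasteException {
    // the cache computes a pair's similarity on first use and keeps it
    ItemSimilarity similarity =
        new CachingItemSimilarity(new PearsonCorrelationSimilarity(model), model);
    return new GenericItemBasedRecommender(model, similarity);
  }
}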
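Fourth sketch -- restricting recommendations to a pre-computed popular set
without patching the internals, by filtering through a Rescorer. This
assumes the long-ID Rescorer API (IDRescorer); older revisions use a generic
Rescorer instead, but the idea is the same:

import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.recommender.IDRescorer;

public final class PopularItemsRescorer implements IDRescorer {

  private final FastIDSet popularItemIDs; // pre-computed elsewhere

  public PopularItemsRescorer(FastIDSet popularItemIDs) {
    this.popularItemIDs = popularItemIDs;
  }

  @Override
  public double rescore(long itemID, double originalScore) {
    return originalScore; // don't change scores, only filter
  }

  @Override
  public boolean isFiltered(long itemID) {
    // anything outside the popular set is never considered
    return !popularItemIDs.contains(itemID);
  }
}

// usage:
//   List<RecommendedItem> recs =
//       recommender.recommend(userID, 10, new PopularItemsRescorer(popular));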
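Last sketch -- doing the work asynchronously and multi-threaded, outside
Taste, with a plain ExecutorService. Kick off the computation at login,
stash the Future in the session, and collect the result a page view or two
later. The pool size of 8 is a number to tune, not a recommendation:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public final class AsyncRecommender {

  private final Recommender recommender;
  // one pool serves all users; recommend() calls for different users
  // run concurrently on its worker threads
  private final ExecutorService pool = Executors.newFixedThreadPool(8);

  public AsyncRecommender(Recommender recommender) {
    this.recommender = recommender;
  }

  // call at login; keep the returned Future in the user's session
  public Future<List<RecommendedItem>> precompute(long userID) {
    return pool.submit(() -> recommender.recommend(userID, 10));
  }
}

// later, e.g. on the second page view:
//   if (future.isDone()) { show(future.get()); }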
