On Wed, Apr 22, 2009 at 3:29 PM, Mirko Gontek <[email protected]> wrote:

> Hi Sean,
> when you say that the FileDataModel originally was intended to be
> read-only I get the impression that I am on the wrong track. Maybe you
> could comment on my thoughts, this would be great help.
FileDataModel was read-only in the sense that setPreference() and removePreference() did not work; FileDataModel would only change if the file it reads changed. But that is no longer so -- you can call these methods to temporarily change the data in memory. This may make sense if you want to update your file *and* quickly update the in-memory representation without re-reading the file. (I still probably wouldn't architect it this way, and would just reload everything infrequently.)

> I would like to implement a GenericItemBasedRecommender, my test data is
> a DB with 300.000 Preferences (130.000 items, 12.000 users).

Typically, if you have many more items than users, you would prefer a user-based recommender, for performance. This is because a user-based recommender compares the user to all other users, while an item-based recommender compares an item to all other items. But an item-based recommender could be fast and appropriate if you have a very efficient source of item-item similarities -- see below.

> 1) I implement a DataModel that initially loads all data from the DB
> into memory and works with the data in memory from that point on. My
> DataModel implementation only accesses (read/write) the DB on refresh().

That's OK. In general, the idea was that DataModels do not cache any information: they are always the current, authoritative source of information. (FileDataModel is somewhat exceptional, since there is no other efficient way for it to operate than to load and store the data in memory.) This is why the JDBC data models do not store data in memory; other components cache and store things in memory. It is fine, however, to proceed the way you propose. Storing everything in memory is far faster, if you have enough memory.

> 2) For the recommender to be fast, I need pre-computed
> ItemItemSimilarities. Thus, I implement ItemSimilarity. My
> implementation keeps all ItemItemSimilarities in memory, until
> refresh(). Like above, my ItemSimilarity implementation only accesses
> (read/write) the DB on refresh().

Yep, that is appropriate.

> 3) Since I don't have a good method to calculate item similarities yet,
> I want to use the following to generate item similarities once:
>
> ItemSimilarity itemSimilarity = new GenericItemSimilarity(
>     new PearsonCorrelationSimilarity(dataModel), dataModel, maxToKeep);

That's OK. One of the main strengths of item-based recommenders is that you can meaningfully inject an external, additional notion of item similarity, to add more information that way. Here you are not adding more information than is already in the model, but it certainly works, and later you might use a different measure.

> My question is: is it good practice to keep all data in memory until
> refresh()? I mean, memory is of course limited, so memory-based
> DataModel (or ItemSimilarity) implementations are limited, right? (For
> this reason I looked to FileDataModel.)

Yes, I would recommend you use memory as much as possible. At some point you will not be able to, of course; then I think you would resort to a JDBC-based data model, which does *not* read into memory, and you might store pre-computed item-item similarities in a DB rather than in memory. This will slow things down, of course, but becomes necessary.
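To illustrate what PearsonCorrelationSimilarity computes, here is a minimal, self-contained sketch in plain Java (the class name PearsonSketch is invented for illustration; the real Mahout implementation additionally handles sparse data, capping, weighting, and so on). It computes the Pearson correlation of two equally long arrays of preference values -- for instance, the ratings two items received from the same set of users:

```java
/**
 * Illustrative sketch only, not Mahout code: Pearson correlation between
 * two equally long arrays of preference values.
 */
class PearsonSketch {

  /** Returns a value in [-1, 1], or NaN if either side has zero variance. */
  static double pearson(double[] x, double[] y) {
    int n = x.length;
    double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
      sumXX += x[i] * x[i];
      sumYY += y[i] * y[i];
      sumXY += x[i] * y[i];
    }
    // Un-normalized covariance and variances (each off by a factor of n,
    // which cancels in the ratio below).
    double covar = sumXY - sumX * sumY / n;
    double varX = sumXX - sumX * sumX / n;
    double varY = sumYY - sumY * sumY / n;
    return covar / Math.sqrt(varX * varY);
  }
}
```

A result near 1 means the two items were rated alike, near -1 means they were rated oppositely, and NaN falls out when one item's ratings have no variance.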
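And here is a rough, self-contained sketch of the "keep everything in memory until refresh()" pattern discussed above (plain Java; CachedItemSimilarity and its loader are invented names for illustration, not the actual Mahout ItemSimilarity interface). The similarities are read once from some source -- a DB query, or a one-time pre-computation -- and re-read only when refresh() is called:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Illustrative sketch only: holds all item-item similarities in memory
 * and goes back to its (possibly expensive) source only on refresh().
 * Pairs are stored once, keyed by (smaller id -> larger id -> similarity).
 */
class CachedItemSimilarity {

  private final Supplier<Map<Long, Map<Long, Double>>> loader;
  private Map<Long, Map<Long, Double>> similarities;

  CachedItemSimilarity(Supplier<Map<Long, Map<Long, Double>>> loader) {
    this.loader = loader;
    this.similarities = loader.get(); // load once up front
  }

  /** Returns the cached similarity, or NaN if the pair is unknown. */
  double itemSimilarity(long itemA, long itemB) {
    Map<Long, Double> row = similarities.get(Math.min(itemA, itemB));
    if (row == null) {
      return Double.NaN;
    }
    Double sim = row.get(Math.max(itemA, itemB));
    return sim == null ? Double.NaN : sim;
  }

  /** Re-reads everything from the underlying source (e.g. a DB query). */
  void refresh() {
    similarities = loader.get();
  }
}
```

The point of the single-map design is that lookups between refreshes never touch the DB at all, which is exactly why this is fast -- and exactly why it stops being an option once the similarity table no longer fits in memory.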
