Thank you for your fast reply!
Sean Owen:
On Fri, Jul 10, 2009 at 10:03 AM, Thomas Rewig<[email protected]> wrote:
Question 1:
The similarity matrix uses 400MB of storage in the MySQL database, but
when I set the ItemCorrelation, 8GB of RAM is used to load the
similarity matrix as a GenericItemSimilarity. Is it
possible/plausible that this matrix uses 20 times more
memory in RAM than in the database, or have I done something wrong?
I could believe this. 100,000 items means about 5,000,000,000
item-item pairs are possible. Many are not kept, but seeing as each
one requires 30 or so bytes of memory, I am not surprised that it
could take 8GB.
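The arithmetic behind that estimate can be checked with a few lines of plain Java. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the 30 bytes per pair is the rough figure quoted above, not a measured value.

```java
// Back-of-the-envelope estimate only; names here are hypothetical and the
// bytes-per-pair figure is the rough "30 or so" from the reply above.
class SimilarityMemoryEstimate {

    // Number of unordered item-item pairs: n * (n - 1) / 2.
    static long unorderedPairs(long items) {
        return items * (items - 1) / 2;
    }

    // Estimated heap use in bytes for a given number of stored pairs.
    static long estimatedBytes(long storedPairs, long bytesPerPair) {
        return storedPairs * bytesPerPair;
    }

    // How many stored pairs fit into a heap budget at a given cost per pair.
    static long pairsThatFit(long heapBytes, long bytesPerPair) {
        return heapBytes / bytesPerPair;
    }

    public static void main(String[] args) {
        long possible = unorderedPairs(100_000L);
        long eightGb = 8L * 1024 * 1024 * 1024;
        System.out.println("possible pairs:        " + possible);
        System.out.println("pairs fitting in 8GB:  " + pairsThatFit(eightGb, 30));
    }
}
```

Under these assumptions, only a few percent of all possible pairs need to be kept before the heap use reaches 8GB.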
That's really a lot to keep in memory. I might suggest, instead, that
you not pre-compute the similarities, but instead compute them as
needed and cache (use CachingItemSimilarity). That way you are not
spending so much memory on pairs that may never get used, but still
get much of the speed improvement.
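The compute-on-demand-and-cache idea can be sketched in plain Java as follows. This is not the code of Taste's CachingItemSimilarity; PairSimilarity and CachedSimilarity are hypothetical names, the similarity is assumed to be symmetric, and the cache is a simple bounded LRU map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an item-item similarity; not a Taste interface.
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

// Computes similarities lazily and keeps only the most recently used ones.
class CachedSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final Map<String, Double> cache;

    CachedSimilarity(PairSimilarity delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true turns the LinkedHashMap into an LRU structure.
        this.cache = new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized double similarity(long itemA, long itemB) {
        // Assumes symmetry, so (a, b) and (b, a) share one cache entry.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        String key = lo + ":" + hi;
        Double cached = cache.get(key);
        if (cached == null) {
            cached = delegate.similarity(lo, hi);
            cache.put(key, cached);
        }
        return cached;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Memory use is then bounded by the cache size rather than by the number of possible pairs, at the cost of recomputing pairs that have been evicted.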
At the moment, to get the similarity matrix, I do the following:
* create a DataModel (MySQLDB) in this way: aItem,
aItemCharacteristic, aItemValue (each aItem has 40
aItemCharacteristics; later there will be more)
* set a UserSimilarity - Pearson or Euclidean
* get all similarities in a multithreaded way: aCorrelation =
aUserSimilarity.userSimilarity(user1, user2); - this is stressful
for the CPU, but it is done in 4 hours - not bad for n!/(n-2)!
combinations ;-)
* save them if they correlate more than 0.95
* load them into a GenericItemSimilarity to use it in an
ItemBasedRecommender
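The steps above can be sketched without Taste in plain Java. This is only an illustration under assumptions: each item is represented as an array of characteristic values, PairwiseSimilarityJob is a hypothetical name, Pearson correlation is computed directly instead of through a UserSimilarity, and each unordered pair is visited once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: pairwise Pearson correlation over item characteristic
// vectors, computed in parallel, keeping only pairs above a threshold.
class PairwiseSimilarityJob {

    // Pearson correlation of two equal-length vectors; NaN if one is constant.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) { sumX += x[i]; sumY += y[i]; }
        double meanX = sumX / n, meanY = sumY / n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    // Returns {i, j} index pairs (i < j) whose correlation exceeds the threshold.
    static List<long[]> similarPairs(final double[][] items, final double threshold, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<long[]>>> futures = new ArrayList<>();
            for (int i = 0; i < items.length; i++) {
                final int row = i;
                // One task per row; each task compares the row with all later rows.
                futures.add(pool.submit(new Callable<List<long[]>>() {
                    @Override
                    public List<long[]> call() {
                        List<long[]> kept = new ArrayList<>();
                        for (int j = row + 1; j < items.length; j++) {
                            if (pearson(items[row], items[j]) > threshold) {
                                kept.add(new long[] {row, j});
                            }
                        }
                        return kept;
                    }
                }));
            }
            List<long[]> result = new ArrayList<>();
            for (Future<List<long[]>> f : futures) {
                result.addAll(f.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as similarPairs(vectors, 0.95, 4) would correspond to the "save them if they correlate more than 0.95" step on a 4-core machine.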
OK, I will test with the Caching(?)Similarity. If I understand you
correctly, this means I would:
* create a DataModel_1 (MySQLDB) in this way: aItem,
aItemCharacteristic, aItemValue (each aItem has 40
aItemCharacteristics)
* create a UserSimilarity so that I have the similarity of the
aItems (if I used ItemSimilarity I would get the similarity of the
aItemCharacteristic ... right?)
* create a CachingUserSimilarity and put DataModel_1 and the
UserSimilarity in there
* create a DataModel_2 (MySQLDB) in this way:
aUser, aItem, aItemPreference
* create the Neighborhood
* create a UserBasedRecommender and put the Neighborhood, the
DataModel_2 and the CachingUserSimilarity in there
* create a CachingRecommender
* et voilà :-) I have a working, memory-saving recommender
But I can't do that with an item-based recommender because I have no
ItemCorrelation (because the similarity of aItemCharacteristic doesn't
matter), is that right? So the sentence in the documentation: "So, item-based
recommenders can use pre-computed similarity values in the computations,
which make them much faster. For large data sets, item-based
recommenders are more appropriate" doesn't work for me. Or
At the moment I have a test set of 500,000 users and 100,000 items. The
item similarity is computed with Taste, but with external data.
Sean Owen:
Question:
Is there a way to increase the speed of a recommendation? (use
InnoDB?, compute fewer items ... somehow ;-)...?)
Your indexes are right. Are you using a connection pool? That is
really important.
Yes I do use a connection pool:
this.cPoolDS = new ConnectionPoolDataSource(dataSource);
this.aConnection = cPoolDS.getConnection();
Sean Owen:
How many users do you have? if you have relatively few users, you
might use a user-based recommender instead. Or, consider a slope-one
recommender.
At the moment there are 5 times more users than items - later this could
change to 1.5 million items and 150,000 users, but first my tests must work.
I tested the slope-one recommender back when Taste wasn't yet in Mahout and
found that the recommendations didn't work for me. Has something
changed since then? ... maybe I should give it another try.
Sean Owen:
It sounds like you have a lot of items, so the way item-based
recommenders work, it will be slow.
Using CachingItemSimilarity could help. I am surprised that a
FileDataModel isn't much faster, since it loads data in memory. That
suggests to me that the database isn't the bottleneck.
Are you using multiple threads to compute recommendations
simultaneously? You certainly can, to take advantage of the 4 cores.
Yes I do, but inside Taste every .recommend call runs in only a single
thread. Is that right?
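For reference, the kind of parallelism I mean looks roughly like the following sketch: several recommend calls run at the same time, while each single call stays on one thread. SimpleRecommender is a hypothetical stand-in and not the Taste Recommender interface, and the sketch assumes the recommender is safe to call from several threads.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a recommender; not the Taste interface.
interface SimpleRecommender {
    List<Long> recommend(long userId, int howMany);
}

class ParallelRecommendations {

    // Runs one recommend call per user on a fixed-size thread pool.
    // Assumes the recommender is thread-safe.
    static Map<Long, List<Long>> recommendAll(final SimpleRecommender rec,
                                              List<Long> userIds,
                                              final int howMany,
                                              int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Long, Future<List<Long>>> pending = new LinkedHashMap<>();
            for (final Long userId : userIds) {
                Callable<List<Long>> task = () -> rec.recommend(userId, howMany);
                pending.put(userId, pool.submit(task));
            }
            // Collect results in the order the users were submitted.
            Map<Long, List<Long>> results = new LinkedHashMap<>();
            for (Map.Entry<Long, Future<List<Long>>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 4 threads this keeps all 4 cores busy as long as there are enough users in the queue, but it does not make any single recommendation faster.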
best regards
Thomas
--
___________________________________________________________
Thomas Rewig
___________________________________________________________