Hello Taste-Community,
For a few weeks I have been testing mahout-taste (Apache Mahout release 0.1),
and I like it :-)!
I have built a working item-based recommender, and now I have some
questions about speed and memory;
maybe you can give me a hint about what I should improve.
1. Item correlation: I precompute all correlations for approximately
100000 items and store a pair in a MySQL database only if its correlation
exceeds a threshold, e.g. 0.95. The recommender then loads the
precomputed correlations like this:
// use the _precomputed_ item-item correlations
Collection<GenericItemSimilarity.ItemItemSimilarity> correlationMatrix =
    new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();
// open the file: one "itemID1,itemID2,similarity" line per stored pair
BufferedReader inStream = new BufferedReader(new FileReader(filePath));
try {
  String strLine;
  while ((strLine = inStream.readLine()) != null) {
    String[] splitArray = strLine.split(",");
    Item item1 = new GenericItem<String>(splitArray[0]);
    Item item2 = new GenericItem<String>(splitArray[1]);
    correlationMatrix.add(new GenericItemSimilarity.ItemItemSimilarity(
        item1, item2, Double.parseDouble(splitArray[2])));
  }
} finally {
  inStream.close(); // release the file handle even if parsing fails
}
…
// set the ItemSimilarity:
this.itemSimilarity = new GenericItemSimilarity(correlationMatrix);
…
// set the Recommender:
recommender = new GenericItemBasedRecommender(super.getModel(), itemSimilarity);
…
// set the CachingRecommender:
this.cachingRecommender = new CachingRecommender(recommender);
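The precomputation described in point 1 (compare every item pair, keep only pairs above a threshold such as 0.95, write one "itemID1,itemID2,similarity" line per pair) could look roughly like the sketch below. It is a sketch under assumptions: cosine similarity stands in for whatever correlation measure is actually used, the ratings are assumed to be in memory as item -> (user -> rating) maps, and the class and method names are hypothetical. Comparing all pairs is quadratic: for 100000 items that is about 5 × 10^9 comparisons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical offline precomputation: compares every unordered item pair once
// and keeps only pairs whose similarity exceeds a threshold (e.g. 0.95).
class PrecomputeSimilarities {

  // Cosine similarity of two sparse item vectors (userID -> rating).
  // Cosine is only a stand-in for the real correlation measure.
  static double cosine(Map<String, Double> a, Map<String, Double> b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (Map.Entry<String, Double> e : a.entrySet()) {
      double va = e.getValue();
      normA += va * va;
      Double vb = b.get(e.getKey());
      if (vb != null) {
        dot += va * vb; // only users who rated both items contribute
      }
    }
    for (double vb : b.values()) {
      normB += vb * vb;
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  // Returns "itemID1,itemID2,similarity" lines, the format the loader above reads.
  static List<String> precompute(Map<String, Map<String, Double>> itemVectors,
                                 double threshold) {
    List<String> ids = new ArrayList<String>(itemVectors.keySet());
    List<String> lines = new ArrayList<String>();
    for (int i = 0; i < ids.size(); i++) {
      for (int j = i + 1; j < ids.size(); j++) { // each unordered pair once
        double sim = cosine(itemVectors.get(ids.get(i)), itemVectors.get(ids.get(j)));
        if (sim > threshold) {
          lines.add(ids.get(i) + "," + ids.get(j) + "," + sim);
        }
      }
    }
    return lines;
  }

  // Builds a one-entry rating vector; used only for the demo below.
  static Map<String, Double> vector(String user, double rating) {
    Map<String, Double> v = new TreeMap<String, Double>();
    v.put(user, rating);
    return v;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Double>> items = new TreeMap<String, Map<String, Double>>();
    items.put("a", vector("u1", 1.0));
    items.put("b", vector("u1", 2.0));
    items.put("c", vector("u2", 3.0));
    for (String line : precompute(items, 0.95)) {
      System.out.println(line); // prints a,b,1.0
    }
  }
}
```

In practice the output lines would be written to the file (or table) that the loading code reads.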
Question 1:
The similarity matrix takes 400 MB in the MySQL database, but once I set
the ItemSimilarity, 8 GB of RAM are used to hold the matrix as a
GenericItemSimilarity. Is it possible/plausible that the matrix needs
more than 20 times as much memory in RAM as in the database, or have I
done something wrong?
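One possible contributor to the gap, offered as an assumption rather than a diagnosis: in RAM each stored pair is its own Java object referencing two Item objects with their own ID objects, plus whatever map entries GenericItemSimilarity keeps internally, and every object carries a header on top of its fields. The sketch below shows a compact alternative: items are mapped to non-negative int indices (a hypothetical step, not shown) and each pair is kept as one long key plus one float in primitive arrays, 12 bytes per slot. The class is hypothetical; using it with the recommender would need an adapter implementing ItemSimilarity, also not shown.

```java
import java.util.Arrays;

// Hypothetical open-addressing map (item pair -> similarity) backed by primitive
// arrays. Each slot costs 8 bytes (long key) + 4 bytes (float value) = 12 bytes.
class CompactSimilarityMap {
  private static final long EMPTY = -1L; // valid keys are always >= 0
  private final long[] keys;
  private final float[] values;
  private final int mask;
  private final int maxSize; // keeps the load factor at or below 0.75
  private int size;

  CompactSimilarityMap(int expectedPairs) {
    int needed = (int) Math.ceil(expectedPairs / 0.75) + 1;
    if (needed > (1 << 30)) {
      throw new IllegalArgumentException("too many pairs for one table");
    }
    int capacity = 4;
    while (capacity < needed) {
      capacity <<= 1; // power of two, so "& mask" replaces "% capacity"
    }
    keys = new long[capacity];
    Arrays.fill(keys, EMPTY);
    values = new float[capacity];
    mask = capacity - 1;
    maxSize = capacity - capacity / 4; // always leaves at least one empty slot
  }

  // Order-independent key: (a, b) and (b, a) map to the same long.
  private static long key(int itemA, int itemB) {
    if (itemA < 0 || itemB < 0) {
      throw new IllegalArgumentException("item indices must be non-negative");
    }
    long lo = Math.min(itemA, itemB);
    long hi = Math.max(itemA, itemB);
    return (lo << 32) | hi;
  }

  private int slot(long k) {
    int h = (int) (k ^ (k >>> 32)) * 0x9E3779B9; // spread the bits
    return (h ^ (h >>> 16)) & mask;
  }

  void put(int itemA, int itemB, float similarity) {
    long k = key(itemA, itemB);
    int i = slot(k);
    while (keys[i] != EMPTY && keys[i] != k) {
      i = (i + 1) & mask; // linear probing
    }
    if (keys[i] == EMPTY) {
      if (size == maxSize) {
        throw new IllegalStateException("more pairs than expected");
      }
      keys[i] = k;
      size++;
    }
    values[i] = similarity;
  }

  // Returns the stored similarity, or NaN if the pair was never stored.
  float get(int itemA, int itemB) {
    long k = key(itemA, itemB);
    for (int i = slot(k); keys[i] != EMPTY; i = (i + 1) & mask) {
      if (keys[i] == k) {
        return values[i];
      }
    }
    return Float.NaN;
  }

  int size() {
    return size;
  }

  // Bytes held by the two primitive arrays, excluding array headers.
  long payloadBytes() {
    return keys.length * 12L;
  }
}
```

Because the table is kept at most 75% full and rounded up to a power of two, the cost works out to roughly 16 to 32 bytes per stored pair; storing the similarity as a float instead of a double also halves that field.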
Question 2:
How can I reduce the memory consumption of the GenericItemSimilarity? The
constructor
GenericItemSimilarity(Iterable<GenericItemSimilarity.ItemItemSimilarity> similarities, int maxToKeep)
<http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/GenericItemSimilarity.html#GenericItemSimilarity%28java.lang.Iterable,%20int%29>
doesn't help, because if maxToKeep is too small, the recommendations
get bad ...
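As far as its javadoc describes it, maxToKeep is a single global cut-off: it keeps the highest similarities overall, so items whose best neighbours are only moderately similar can lose all of them. A different pruning rule, sketched below as a hypothetical alternative and not as a Mahout API, keeps a pair if it is among the k most similar pairs of at least one of its two items, so every item keeps up to k neighbours. The class and method names are hypothetical; the pruned list could then be wrapped in ItemItemSimilarity objects and passed to the plain GenericItemSimilarity(Iterable) constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical per-item top-k pruning of precomputed item-item similarities.
class PerItemTopK {

  static final class Pair {
    final String item1;
    final String item2;
    final double similarity;

    Pair(String item1, String item2, double similarity) {
      this.item1 = item1;
      this.item2 = item2;
      this.similarity = similarity;
    }

    @Override
    public String toString() {
      return item1 + "," + item2 + "," + similarity;
    }
  }

  // Min-heap order: the head is the weakest neighbour kept so far.
  private static final Comparator<Pair> WEAKEST_FIRST = new Comparator<Pair>() {
    public int compare(Pair a, Pair b) {
      return Double.compare(a.similarity, b.similarity);
    }
  };

  static List<Pair> prune(List<Pair> pairs, int k) {
    Map<String, PriorityQueue<Pair>> best = new HashMap<String, PriorityQueue<Pair>>();
    for (Pair p : pairs) {
      offer(best, p.item1, p, k);
      offer(best, p.item2, p, k);
    }
    // A pair may survive for both of its items; count it once (by identity).
    Set<Pair> kept = Collections.newSetFromMap(new IdentityHashMap<Pair, Boolean>());
    for (PriorityQueue<Pair> queue : best.values()) {
      kept.addAll(queue);
    }
    List<Pair> result = new ArrayList<Pair>();
    for (Pair p : pairs) {
      if (kept.contains(p)) {
        result.add(p); // preserve the input order
      }
    }
    return result;
  }

  private static void offer(Map<String, PriorityQueue<Pair>> best, String item, Pair p, int k) {
    PriorityQueue<Pair> queue = best.get(item);
    if (queue == null) {
      queue = new PriorityQueue<Pair>(k + 1, WEAKEST_FIRST);
      best.put(item, queue);
    }
    queue.add(p);
    if (queue.size() > k) {
      queue.poll(); // drop this item's weakest neighbour
    }
  }

  public static void main(String[] args) {
    List<Pair> pairs = new ArrayList<Pair>();
    pairs.add(new Pair("a", "b", 0.99));
    pairs.add(new Pair("a", "c", 0.98));
    pairs.add(new Pair("a", "d", 0.97));
    pairs.add(new Pair("c", "d", 0.96));
    // k = 1 keeps a,b / a,c / a,d; a global top-1 cut-off would keep only a,b
    System.out.println(prune(pairs, 1));
  }
}
```

The trade-off is that the total number of stored pairs is no longer fixed in advance; it is bounded by k times the number of items.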
2. Speed of recommendation: I use a MySQLJDBCDataModel with MyISAM tables.
The primary key and indexes are set:
PRIMARY KEY (user_id, item_id), INDEX (user_id), INDEX (item_id). A
recommendation for one user takes between 0.5 and 80 seconds; I
would like it to take no more than 300 ms.
By the way, I compute the recommendations on a quad-core 3.2 GHz machine
with 32 GB RAM, so maybe the database is the bottleneck. But with a
FileDataModel it is faster, though not by much.
Here's a log for a user with 2000 associated items:
INFO CollaborativeModel - Seconds to set ItemCorrelation: 76.187506 s
INFO CollaborativeModel - Seconds to set Recommender: 0.025945000000000003 s
INFO CollaborativeModel - Seconds to set CachingRecommender: 0.06511 s
INFO CollaborativeController - SECONDS TO REFRESH THE SYSTEM: 6.450000000000001E-4 s
INFO root - SECONDS TO GET A RECOMMENDATION FOR USER: 50.888347 s
Question:
Is there a way to speed up a recommendation? (Switch to InnoDB?
Compute fewer items? ... somehow ;-) ...?)
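On "compute fewer items": one way to shrink the work per request is to score only items that appear in the precomputed neighbour lists of the items the user already has, using the usual similarity-weighted average of the user's ratings. This is sketched as an assumption, not as a description of what GenericItemBasedRecommender in 0.1 does internally; whether it helps depends on how that class picks its candidate items. With a 0.95 threshold the neighbour lists may be short, so a user with 2000 items would touch only their neighbours instead of all ~100000 items. The class and method names are hypothetical, and the neighbour map is assumed to hold both directions of each stored pair.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical candidate-restricted item-based scoring: only neighbours of the
// user's rated items are ever scored.
class NeighbourhoodScorer {

  static List<Map.Entry<String, Double>> recommend(Map<String, Double> userRatings,
      Map<String, Map<String, Double>> neighbours, int howMany) {
    Map<String, Double> weightedSum = new HashMap<String, Double>();
    Map<String, Double> weightTotal = new HashMap<String, Double>();
    for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
      Map<String, Double> sims = neighbours.get(rated.getKey());
      if (sims == null) {
        continue; // no stored neighbours above the threshold
      }
      for (Map.Entry<String, Double> n : sims.entrySet()) {
        String candidate = n.getKey();
        if (userRatings.containsKey(candidate)) {
          continue; // never recommend something the user already has
        }
        double sim = n.getValue();
        add(weightedSum, candidate, sim * rated.getValue());
        add(weightTotal, candidate, Math.abs(sim));
      }
    }
    List<Map.Entry<String, Double>> scored = new ArrayList<Map.Entry<String, Double>>();
    for (Map.Entry<String, Double> e : weightedSum.entrySet()) {
      double total = weightTotal.get(e.getKey());
      if (total > 0.0) {
        // estimate = sum(sim * rating) / sum(|sim|)
        scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), e.getValue() / total));
      }
    }
    Collections.sort(scored, new Comparator<Map.Entry<String, Double>>() {
      public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
        return Double.compare(b.getValue(), a.getValue()); // highest estimate first
      }
    });
    return new ArrayList<Map.Entry<String, Double>>(
        scored.subList(0, Math.min(howMany, scored.size())));
  }

  private static void add(Map<String, Double> map, String key, double delta) {
    Double old = map.get(key);
    map.put(key, old == null ? delta : old + delta);
  }

  public static void main(String[] args) {
    Map<String, Double> user = new HashMap<String, Double>();
    user.put("x", 5.0);
    user.put("y", 1.0);
    Map<String, Map<String, Double>> neighbours = new HashMap<String, Map<String, Double>>();
    neighbours.put("x", new HashMap<String, Double>());
    neighbours.get("x").put("p", 0.5);
    neighbours.get("x").put("q", 1.0);
    neighbours.put("y", new HashMap<String, Double>());
    neighbours.get("y").put("p", 0.5);
    neighbours.get("y").put("x", 0.9); // skipped: the user already rated x
    System.out.println(recommend(user, neighbours, 10));
  }
}
```

The cost per request then grows with the total length of the user's neighbour lists rather than with the size of the catalogue.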
So if you have any idea how I could reduce the memory consumption and
increase the recommendation speed, I would be very thankful.
Best regards,
Thomas
--
___________________________________________________________
Thomas Rewig
___________________________________________________________