Well, in fact it is strange: with the same data set at 5 million lines it
produces a number of recommendation results, but when it is larger it simply
returns no results, without any memory exceptions or anything abnormal being
printed on the console. This confused me.

My data is basically of the form -1 to 1. I am looking at a list of purchased
items, i.e.
<userid> <itemid> <# times purchased>
but then I run a normalization algorithm on it so the data returned is
actually
<userid> <itemid> <value -1 to 1>
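The normalization algorithm itself isn't shown; as a rough illustration only (the actual algorithm may differ), here is a minimal sketch that min-max scales one user's raw purchase counts into [-1, 1]. The class and method names are hypothetical, not from Mahout:

```java
import java.util.HashMap;
import java.util.Map;

public class CountScaler {

    // Scale one user's raw purchase counts into [-1, 1] via min-max
    // scaling: the least-purchased item maps to -1, the most-purchased
    // to 1. If all counts are equal, everything maps to 0.
    static Map<String, Double> scale(Map<String, Integer> counts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts.values()) {
            if (c < min) min = c;
            if (c > max) max = c;
        }
        Map<String, Double> scaled = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double v = (max == min)
                ? 0.0
                : -1.0 + 2.0 * (e.getValue() - min) / (max - min);
            scaled.put(e.getKey(), v);
        }
        return scaled;
    }
}
```

The scaled values can then be written back out as `<userid> <itemid> <value>` lines for the data model.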

In terms of users vs. products, I'm looking at many more users than products
(millions vs. thousands), and the number of users is always growing too. This
is why I was thinking item-based recommenders were a good fit.

This normalized data is what I feed to Mahout. I basically modified the
GroupLens example, and that is what I have been working from. If that example
exists in the 0.2 version it might be worth my while to upgrade.



srowen wrote:
> 
> One thing I can tell you is that mahout-0.2 will be significantly
> faster and use less memory. On one particular setup I am working on
> with a client, we needed 1GB heap to hold 5M ratings in memory, and
> needed about 1 second to generate recommendations. After recent
> changes, it fits in 360M and takes about 0.3s to generate a
> recommendation.
> 
> The catch is that most APIs changed significantly. You'll have to do
> some work to adapt to the new code. It is available now from
> Subversion, and I would welcome anyone who is willing to try it out,
> as it is a big change and still quite new.
> 
> 
> Do you mean it fails when the data set gets larger, or you get a
> return value, it just has no recommendations? That latter result would
> be very puzzling. Is that what you see?
> 
> 
> Beyond this, I think your system could be 'tuned' more by selecting
> perhaps faster, more specific implementations to use. For example,
> tell me about the nature of your 'ratings' in your system. In many
> cases, it's actually better (and much faster) to completely ignore
> ratings. There is support for that in the framework.
> 
> 
> On Thu, Aug 13, 2009 at 8:03 PM, mishkinf<[email protected]> wrote:
>>
>> I have been using mahout-0.1 release version and I am able to get
>> recommendations with datasets roughly 5 million and under but when I
>> attempt
>> 10 million or so no recommendations are given to me. Has anybody had this
>> problem? I'm not sure if I am just using the wrong recommender
>> settings/recommender or if I should just switch to trunk version or
>> something. Ideas? Suggestions?
>>
>> I have tried item-item recommender, user-item recommenders.... nearest
>> neighborhood... tree clustering..
>> They all produce numerous recommendations with the smaller data sets. In
>> theory it should only get better with a larger data set.
>>
>> Currently I'm using an item-item recommender with caching item
>> similarities and a caching recommender:
>>
>> ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
>> CachingItemSimilarity cis = new CachingItemSimilarity(similarity,
>> dataModel);
>> recommender = new CachingRecommender(new
>> GenericItemBasedRecommender(dataModel, cis));
>>
>> ......
>>
>> I would like Mahout to work with 25-50 million rows of data, but as of yet
>> 5 million is the best I can do. RAM has also been an issue with larger
>> data sets.
>> --
>> View this message in context:
>> http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p24956912.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p24961458.html
Sent from the Mahout User List mailing list archive at Nabble.com.
