And just to make sure, I reduced my dataset from 6 million rows down to 1
million, the same size as the working GroupLens dataset, and it still
produced the same "1" score for all recommended items and all user IDs. So
it must be something specific to this data; either it's too sparse or
something else about it.
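
If it is sparsity, a rough check like the sketch below would show it: it just
counts ratings per user and per item in the ::-delimited file (plain Java, not
Mahout; the "ratings.dat" file name is only a placeholder). If most users or
items have only one or two ratings, there is very little overlap for any of
the recommenders to work with.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class SparsityCheck {

  public static void main(String[] args) throws Exception {
    // Counts ratings per user and per item in a userid::itemid::score file.
    // "ratings.dat" is a placeholder name for the actual data file.
    Map<String, Integer> perUser = new HashMap<String, Integer>();
    Map<String, Integer> perItem = new HashMap<String, Integer>();
    long rows = 0;
    BufferedReader in = new BufferedReader(new FileReader("ratings.dat"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] fields = line.split("::");   // userid::itemid::score
      increment(perUser, fields[0]);
      increment(perItem, fields[1]);
      rows++;
    }
    in.close();
    System.out.println("rows=" + rows
        + "  users=" + perUser.size()
        + "  items=" + perItem.size()
        + "  avg ratings/user=" + ((double) rows / perUser.size())
        + "  avg ratings/item=" + ((double) rows / perItem.size()));
  }

  private static void increment(Map<String, Integer> counts, String key) {
    Integer c = counts.get(key);
    counts.put(key, c == null ? 1 : c + 1);
  }
}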


mishkinf wrote:
> 
>  First off, let me note that I have upgraded to Mahout 0.2. At the moment
> I have a dataset of about 6 million rows, and I have no trouble getting
> results from it using the slope-one GroupLens example (I have also
> downloaded your changes and will test with much larger datasets soon). Let
> me back up, though. First I used the "1 million" GroupLens data and the
> results work perfectly: the slope-one recommender used in the default
> GroupLens code returns a list of recommendations as expected. Then I take
> my own data, which is 6 million rows in the exact same format, and feed it
> to the system, and I do get a bunch of recommendations BUT all of their
> scores are "1", and thus the same recommendations are made no matter the
> user, since I assume all recommendations share the same score.
> 
> Could this be because my dataset is very sparse, or something like that?
> The only differences I can see between the GroupLens data and mine are
> that the GroupLens user IDs start from 1 whereas my user IDs are all about
> 8 or 9 digits long, and that I don't have a timestamp column; otherwise
> the format is the same, which as far as I can tell shouldn't matter.
> 
> format:
> userid::itemid::score
> 
> If I figure out the issue, I will keep you posted.
> 
> 
> srowen wrote:
>> 
>> PS I just fixed a bug that might cause the problem you see. It resulted
>> in an infinite loop in some cases, and I could imagine that it only comes
>> up when data sets get a little larger. Try the latest from Subversion to
>> see if it helps.
>> 
>> On Thu, Aug 13, 2009 at 8:03 PM, mishkinf<[email protected]> wrote:
>>>
>>> I have been using the mahout-0.1 release version and I am able to get
>>> recommendations with datasets of roughly 5 million rows and under, but
>>> when I attempt 10 million or so, no recommendations are given to me. Has
>>> anybody had this problem? I'm not sure whether I am just using the wrong
>>> recommender or recommender settings, or whether I should switch to the
>>> trunk version or something. Ideas? Suggestions?
>>>
>>> I have tried the item-item recommender, user-based recommenders, nearest
>>> neighborhood, and tree clustering. They all produce numerous
>>> recommendations with the smaller data sets. In theory it should only get
>>> better with a larger data set.
>>>
>>> Currently I'm using the item-based recommender with a caching item
>>> similarity and a caching recommender:
>>>
>>> ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
>>> CachingItemSimilarity cis = new CachingItemSimilarity(similarity, dataModel);
>>> recommender = new CachingRecommender(new GenericItemBasedRecommender(dataModel, similarity));
>>>
>>> ......
>>>
>>> I would like to get Mahout working with 25-50 million rows of data, but
>>> as of yet 5 million is the best I can do. RAM has also been an issue with
>>> the larger data sets.
>>>
>>>
>> 
>> 
> 
> 
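
For reference, here is the item-based setup from the snippet quoted above as a
self-contained sketch. Note that in the quoted code the CachingItemSimilarity
(cis) is created but never handed to the recommender, so the uncached Pearson
similarity is what actually gets used; the sketch below wires the cached one
in. The class and method names are my own, and dataModel is assumed to be
loaded elsewhere (for example by the GroupLens example loader).

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemBasedSetup {

  // Same wiring as the quoted snippet, but the cached similarity is the one
  // passed to the recommender, so repeated Pearson computations are avoided.
  public static Recommender build(DataModel dataModel) throws TasteException {
    ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
    ItemSimilarity cachedSimilarity = new CachingItemSimilarity(similarity, dataModel);
    return new CachingRecommender(
        new GenericItemBasedRecommender(dataModel, cachedSimilarity));
  }
}

CachingRecommender caches the final recommendation lists; caching the
item-item similarities separately is what avoids recomputing the pairwise
Pearson scores over and over on a large data set.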

-- 
View this message in context: 
http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p25030090.html
Sent from the Mahout User List mailing list archive at Nabble.com.
