First off, let me note that I have upgraded to Mahout 0.2. At the moment I
have a dataset of about 6 million rows, and that dataset does produce
results using the slope-one GroupLens example (I have also downloaded your
changes and will test with much larger datasets soon). Let me back up,
though. First I used the "1 million" GroupLens data and the results are
perfect: the slope-one recommender used in the default GroupLens code
returns a sensible list of recommendations. Then I take my own data, which
is 6 million rows in exactly the same format, and run it through the same
code. I still get a list of recommendations, BUT every one of them has a
score of "1", so the same recommendations come back no matter which user I
ask for, presumably because all the candidate items share that same score.
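
For reference, the kind of setup involved boils down to something like the
sketch below. This is not the GroupLens example code verbatim -- I'm showing
a plain FileDataModel over a comma-separated copy of the ratings (the
conversion is sketched further down) and SlopeOneRecommender as in the 0.2
Taste API; the class name, file name, and user ID are just placeholders:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class SlopeOneCheck {
  public static void main(String[] args) throws Exception {
    // ratings.csv is my 6-million-row file, converted to comma-separated
    // userID,itemID,score (see the converter sketch below)
    DataModel model = new FileDataModel(new File("ratings.csv"));
    Recommender recommender = new SlopeOneRecommender(model);

    // 12345678L stands in for one of my 8-9 digit user IDs; the symptom is
    // that every RecommendedItem printed here has the same score of 1.0,
    // whichever user ID I pass in.
    List<RecommendedItem> items = recommender.recommend(12345678L, 10);
    for (RecommendedItem item : items) {
      System.out.println(item);
    }
  }
}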

Could this be because my dataset is very sparse, or something along those
lines? The only differences I can see between the GroupLens data and mine
are that the GroupLens user IDs start at 1 while my user IDs are all about
8 or 9 digits long, and that I don't have a timestamp column. Otherwise the
format is identical, and as far as I can tell neither difference should
matter.

format:
userid::itemid::score
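
Since a plain FileDataModel expects comma- or tab-separated lines rather
than "::", the conversion I mentioned above is nothing more than a one-off
rewrite along these lines (class name and file names are again just
placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

public class DelimiterConverter {
  public static void main(String[] args) throws Exception {
    // rewrite userid::itemid::score as userid,itemid,score so that
    // FileDataModel can parse it
    BufferedReader in = new BufferedReader(new FileReader("ratings.dat"));
    PrintWriter out = new PrintWriter(new FileWriter("ratings.csv"));
    String line;
    while ((line = in.readLine()) != null) {
      out.println(line.replace("::", ","));
    }
    in.close();
    out.close();
  }
}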

If I figure out the issue, I will keep you posted.


srowen wrote:
> 
> PS I just fixed a bug that might cause the problem you see. It
> resulted in an infinite loop in some cases, and I could imagine that
> it only came up when data sets get a little larger. Try the latest
> from Subversion to see if it helps.
> 
> On Thu, Aug 13, 2009 at 8:03 PM, mishkinf<[email protected]> wrote:
>>
>> I have been using the Mahout 0.1 release and I am able to get
>> recommendations with datasets of roughly 5 million rows and under, but
>> when I attempt 10 million or so, no recommendations are returned at all.
>> Has anybody had this problem? I'm not sure whether I am just using the
>> wrong recommender or recommender settings, or whether I should switch to
>> the trunk version or something. Ideas? Suggestions?
>>
>> I have tried the item-item recommender, user-based recommenders, nearest
>> neighborhoods, tree clustering... They all produce numerous
>> recommendations with the smaller data sets, and in theory the results
>> should only get better with a larger data set.
>>
>> Currently I'm using the item-item recommender with caching of both the
>> item similarities and the recommender:
>>
>> ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
>> // wrap the similarity in a cache and hand the cached version to the recommender
>> CachingItemSimilarity cis = new CachingItemSimilarity(similarity, dataModel);
>> recommender = new CachingRecommender(
>>     new GenericItemBasedRecommender(dataModel, cis));
>>
>> ......
>>
>> I would like Mahout to work with 25-50 million rows of data, but as of
>> yet 5 million is the best I can do. RAM has also been an issue with the
>> larger data sets.
>> --
>> View this message in context:
>> http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p24956912.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p25029772.html
Sent from the Mahout User List mailing list archive at Nabble.com.
