First off, let me note that I have upgraded to Mahout 0.2. At the moment I have a dataset of about 6 million rows, and that dataset does give results using the slope-one GroupLens example (I have also downloaded your changes and will test with much larger datasets soon). Let me back up, though. First I used the "1 million" GroupLens data and the results work perfectly: the slope-one recommender used in the default GroupLens code returns a sensible list of recommendations. Then I take my own data, which is 6 million rows in the exact same format, put it into the system, and I do get a bunch of recommendations, BUT every recommendation's score is "1", and so the same recommendations are made no matter which user I ask for, presumably because all candidate items end up tied at the same estimated score.
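
For reference, the path I'm exercising is essentially the stock GroupLens slope-one example, roughly like the sketch below. GroupLensDataModel here is the class from the examples module that reads the "::"-delimited ratings file; the exact constructor and recommend() signatures may differ a little between 0.1 and the current trunk, so take this as an approximation of the default example code rather than a verbatim copy:

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.example.grouplens.GroupLensDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class SlopeOneCheck {
  public static void main(String[] args) throws Exception {
    // ratings.dat: userid::itemid::score, same layout as the GroupLens file
    DataModel model = new GroupLensDataModel(new File("ratings.dat"));
    Recommender recommender = new CachingRecommender(new SlopeOneRecommender(model));
    long userID = 123456789L; // placeholder; my real user IDs are 8-9 digits
    List<RecommendedItem> items = recommender.recommend(userID, 10);
    for (RecommendedItem item : items) {
      // On my 6-million-row file, every getValue() here comes back as 1.0
      System.out.println(item.getItemID() + " -> " + item.getValue());
    }
  }
}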
Could this be because my dataset is very sparse or something? In terms of differences between the GroupLens data and mine, all I can see is that the GroupLens user IDs start from 1 whereas my user IDs are all about 8 or 9 digits long, and I don't have a timestamp column; otherwise the format is identical (userid::itemid::score), which as far as I can see shouldn't matter. If I figure out the issue I will keep you posted.

srowen wrote:
>
> PS I just fixed a bug that might cause the problem you see. It
> resulted in an infinite loop in some cases, and I could imagine that
> it only came up when data sets get a little larger. Try the latest
> from Subversion to see if it helps.
>
> On Thu, Aug 13, 2009 at 8:03 PM, mishkinf<[email protected]> wrote:
>>
>> I have been using the mahout-0.1 release version and I am able to get
>> recommendations with datasets of roughly 5 million rows and under, but
>> when I attempt 10 million or so, no recommendations are given to me.
>> Has anybody had this problem? I'm not sure whether I am just using the
>> wrong recommender settings/recommender or whether I should switch to
>> the trunk version. Ideas? Suggestions?
>>
>> I have tried the item-item recommender, user-item recommenders, nearest
>> neighborhood, tree clustering... They all produce numerous
>> recommendations with the smaller data sets. In theory it should only
>> get better with a larger data set.
>>
>> Currently I'm using the item-item recommender with a caching item
>> similarity and a caching recommender:
>>
>> ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
>> CachingItemSimilarity cis = new CachingItemSimilarity(similarity, dataModel);
>> recommender = new CachingRecommender(new GenericItemBasedRecommender(dataModel, similarity));
>>
>> ......
>>
>> I would like Mahout to work with 25-50 million rows of data, but as of
>> yet 5 million is the best I can do. RAM has also been an issue with
>> larger data sets.
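
Incidentally, re-reading my original setup quoted above: I build a CachingItemSimilarity (cis) but then pass the raw similarity into GenericItemBasedRecommender, so the cache never actually gets used. What I meant is roughly the sketch below (same caveat that exact signatures may vary between 0.1 and trunk). It doesn't change the behaviour I'm describing here, but it should at least avoid recomputing Pearson similarities, and for the bigger files bumping the JVM heap (-Xmx) will probably be necessary anyway:

// Assuming dataModel is already loaded, e.g. from a FileDataModel
// (FileDataModel expects comma-separated userID,itemID,preference lines):
// DataModel dataModel = new FileDataModel(new File("ratings.csv"));
ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
ItemSimilarity cachedSimilarity = new CachingItemSimilarity(similarity, dataModel);
// Pass the cached wrapper, not the raw similarity, so the cache is actually consulted
Recommender recommender =
    new CachingRecommender(new GenericItemBasedRecommender(dataModel, cachedSimilarity));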
