You were right! At least pretty close. My file data model was not loading the last chunk of each line, which held all the ratings! So everything just got a default value of 1, I guess. Thanks for the tip.
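The bug described above can be illustrated with a minimal Python sketch. It assumes one "userID,itemID,preference" entry per line, delimited by a comma or a tab. The function names are hypothetical and this is not Mahout's FileDataModel code. The first parser never reads the rating, so every preference silently becomes 1.0. The second parser reads the rating and raises an error when it is missing, which also rejects a double-colon-delimited line.

```python
import re

# Hypothetical sketch, not Mahout's FileDataModel: one line per preference,
# "userID,itemID,preference", delimited by a comma or a tab.
FIELD_DELIMITER = re.compile(r"[,\t]")


def parse_line_dropping_rating(line):
    """Reproduces the bug: the third field (the rating) is never read."""
    fields = FIELD_DELIMITER.split(line.strip())
    # Every preference silently falls back to 1.0, whatever the file says.
    return int(fields[0]), int(fields[1]), 1.0


def parse_line(line):
    """Reads the rating and refuses lines that do not contain one."""
    fields = FIELD_DELIMITER.split(line.strip())
    if len(fields) < 3 or not fields[2]:
        # Also triggers on "::"-delimited lines, which do not split here at all.
        raise ValueError(f"no preference value in line: {line!r}")
    # Any trailing fields (e.g. a timestamp) are ignored.
    return int(fields[0]), int(fields[1]), float(fields[2])


print(parse_line("123456789,42,4.5"))                  # (123456789, 42, 4.5)
print(parse_line_dropping_rating("123456789,42,4.5"))  # (123456789, 42, 1.0)
```

Failing loudly on a short line is a design choice. A silent default is exactly what hid the problem here, because the recommender still ran and returned results.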
Now I will try testing it with a much bigger data file and see what sorts of results I get.

srowen wrote:
>
> On Tue, Aug 18, 2009 at 6:32 PM, mishkinf<[email protected]> wrote:
>>
>> First off, let me note that I have upgraded to Mahout 0.2. At the moment I
>> have a dataset of about 6 million rows, and I have no trouble getting
>> results from it with the slope-one GroupLens example (but I have
>> downloaded your changes and will test with much larger datasets soon).
>> Let me back up, though. First I used the "1 million" GroupLens data, and
>> the results were perfect: the slope-one recommender used in the default
>> GroupLens code returns a list of recommendations. Then I took my data,
>> which is 6 million rows in exactly the same format, placed it in the
>> system, and got a bunch of recommendations, BUT all of the scores for the
>> recommendations are "1". So the same recommendations are made no matter
>> the user, since I assume all recommendations share the same score.
>
> Hmm, are you using a DataModel that discards the preference values?
> What kind of model is it? In that case, yes, you would only ever get "1"
> as an estimated preference; it's meaningless. Slope-one doesn't work
> at all in this case (well, the result is, as you say, some random result
> that is fixed for everyone).
>
> So from what you say so far, it sounds like the issue is that your
> DataModel is one that does not include preference values. The source
> GroupLens data does have them.
>
>
>> Could this be because my dataset is very sparse or something? In terms
>> of differences between the GroupLens data and mine, all I can see is
>> that the GroupLens user IDs start from 1, whereas my user IDs are all
>> about 8 or 9 digits long, and I also don't have a timestamp. Otherwise
>> it's the same format, which as far as I can see shouldn't matter.
>
> Yeah, this doesn't matter at all. What exactly is your data file like?
> It needs to be delimited by commas or tabs, that sort of thing, not
> double-colons. I'm wondering if that could somehow be the issue...
>
>

--
View this message in context: http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p25031035.html
Sent from the Mahout User List mailing list archive at Nabble.com.
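srowen's point that slope-one cannot work without preference values can be checked with a minimal weighted slope-one sketch. It is written from scratch in Python. The function name and the tiny rating set are hypothetical, and this is not Mahout's SlopeOneRecommender, which may differ in weighting and caching. With real ratings, the estimate depends on the data. With every preference forced to 1.0, every average difference is 0, so every estimate comes out as exactly 1.0.

```python
def slope_one_estimate(prefs, user, item):
    """Weighted slope-one estimate of user's preference for item.

    prefs maps user -> {item: preference}. Hypothetical helper for
    illustration only; returns None when no co-rated items exist.
    """
    numerator = 0.0
    weight = 0
    for other_item, user_pref in prefs[user].items():
        if other_item == item:
            continue
        # Average difference (item - other_item) over users who rated both.
        diffs = [p[item] - p[other_item]
                 for p in prefs.values() if item in p and other_item in p]
        if diffs:
            avg_diff = sum(diffs) / len(diffs)
            # Weight each item's prediction by how many users co-rated it.
            numerator += (user_pref + avg_diff) * len(diffs)
            weight += len(diffs)
    return numerator / weight if weight else None


ratings = {
    "a": {"x": 4, "y": 2},
    "b": {"x": 4, "y": 2, "z": 5},
    "c": {"y": 1, "z": 4},
}
# Same users and items, but every preference replaced by 1.0, which is
# effectively what a data model that drops the ratings produces.
ones = {u: {i: 1.0 for i in items} for u, items in ratings.items()}

print(slope_one_estimate(ratings, "a", "z"))  # 5.0
print(slope_one_estimate(ones, "a", "z"))     # 1.0
```

In the all-1.0 case, every candidate item ties at 1.0. The ranking then presumably comes down to tie-breaking, which fits the "same recommendations no matter the user" symptom described in the thread.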
