You were right! Or at least pretty close. My file data model was not loading
the last field of each line, which held all the ratings! So every preference
just got a default value of 1, I guess. Thanks for the tip.

Now I will try testing it with a much bigger data file and see what sorts of
results I get.
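
For anyone who hits the same problem, here is a rough sketch of the kind of
sanity check that would have caught it. It is not Mahout code: the class
RatingLineCheck and its methods are hypothetical names made up for this
example, and it assumes each line looks like userID,itemID,rating with an
optional timestamp, delimited by commas or tabs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: verifies that each input line
// actually carries a rating in its third field.
class RatingLineCheck {

    /** Returns the rating in the third field, or throws if it is missing. */
    static float parseRating(String line) {
        // Assumed layout: userID,itemID,rating[,timestamp] (commas or tabs)
        String[] fields = line.split("[,\t]");
        if (fields.length < 3 || fields[2].trim().isEmpty()) {
            throw new IllegalArgumentException("No rating field in line: " + line);
        }
        // Throws NumberFormatException if the field is not a number
        return Float.parseFloat(fields[2].trim());
    }

    /** Counts lines whose rating is missing or not a number. */
    static int countBadLines(List<String> lines) {
        int bad = 0;
        for (String line : lines) {
            try {
                parseRating(line);
            } catch (IllegalArgumentException e) {
                // NumberFormatException is a subclass, so it is caught here too
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "123456789,42,4.5",
            "123456789,43",          // rating dropped, as in my broken loader
            "987654321\t42\t3");
        System.out.println("Lines without a usable rating: " + countBadLines(sample));
    }
}
```

Running something like this over the first few thousand lines of the file
before building the model makes a silently dropped rating column show up as
a non-zero count instead of as a pile of identical scores.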


srowen wrote:
> 
> On Tue, Aug 18, 2009 at 6:32 PM, mishkinf<[email protected]> wrote:
>>
>> First off, let me note I have upgraded to Mahout 0.2. At the moment I
>> have a dataset of about 6 million rows, and I have no trouble getting
>> results from it using the slopeone grouplens example (but I have
>> downloaded your changes and I will test with much larger datasets soon).
>> Let me back up though. First I used the "1 million grouplens" data and
>> the results work perfectly. The slopeone recommender used in the default
>> grouplens code returns a list of recommendations perfectly. Then I take
>> my data, which is 6 million rows (in the same exact format), and place it
>> in the system, and I get a bunch of recommendations BUT all of the scores
>> for the recommendations are "1", and thus the same recommendations are
>> made no matter the user, since I assume all recommendations share the
>> same score.
> 
> Hmm, are you using a DataModel that discards the preference values?
> What kind of model is it? In that case, yes, you would only ever get "1"
> as an estimated preference -- it's meaningless. Slope-one doesn't work
> at all in this case (well, the result is, as you say, some arbitrary
> result that is the same for everyone).
> 
> So from what you say so far it sounds like the issue is your DataModel
> is one that does not include preference values. The source GroupLens
> data does have it.
> 
> 
>> Could this be because my dataset is very sparse or something? In terms
>> of differences between the grouplens data and mine, all I can see is
>> that the grouplens user IDs start from 1 whereas my user IDs are all
>> about 8 or 9 digits long, and I also don't have a timestamp, but
>> otherwise it is the same format... which as far as I can see shouldn't
>> matter.
> 
> Yeah, this doesn't matter at all. What exactly is your data file like?
> It needs to be delimited by commas or tabs, that sort of thing, not
> double-colons. I'm wondering if that could somehow be the issue...
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Mahout-not-giving-recommendations-with-large-data-sets-tp24956912p25031035.html
Sent from the Mahout User List mailing list archive at Nabble.com.