Re: Mahout not giving recommendations with large data sets

Sean Owen Tue, 18 Aug 2009 11:05:45 -0700

On Tue, Aug 18, 2009 at 6:32 PM, mishkinf<[email protected]> wrote:
>
>  First off let me note I have upgraded to Mahout 0.2. Well at the moment I
> have a dataset of about 6 million right now and I have no trouble with that
> data set giving results using the slopeone grouplens example (but I have
> downloaded your changes and I will test with much larger datasets soon). Let
> me back up though. First I used the "1 million grouplens" data and the
> results work perfectly.  The slopeone recommender used in the default
> grouplens code returns a list of recommendations perfectly. Then I take my
> data which is 6 million rows (in the same exact format) and place it in the
> system and I get a bunch of recommendations BUT all of the scores for the
> recommendations are "1" and thus the same recommendations are made no matter
> the user since i assume all recommendations share the same score.


Hmm, are you using a DataModel that discards the preference values?
What kind of model is it? If so, then yes, you would only ever get "1"
as an estimated preference, and it's meaningless. Slope-one doesn't work
at all in this case (well, the result is, as you say, some arbitrary
ranking that is the same for everyone).

So from what you say so far, it sounds like the issue is that your
DataModel is one that does not include preference values. The source
GroupLens data does include them.
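
To illustrate why every estimate collapses to "1", here is a minimal
sketch of weighted slope-one in plain Python. It is not Mahout's
implementation; the function name slope_one_estimate and the toy data
are made up for illustration. When every stored preference is 1.0, all
item-to-item differences are 0, so every estimate is exactly 1.0.

```python
def slope_one_estimate(prefs, user, target):
    """Weighted slope-one estimate of `user`'s preference for `target`.

    `prefs` maps user -> {item: preference value}. Hypothetical helper
    for illustration only; not Mahout's code.
    """
    numerator = 0.0
    denominator = 0
    for item, value in prefs[user].items():
        if item == target:
            continue
        # Differences (target - item) over users who rated both items.
        diffs = [p[target] - p[item]
                 for p in prefs.values()
                 if item in p and target in p]
        if not diffs:
            continue
        avg_diff = sum(diffs) / len(diffs)
        # Weight each item's prediction by its number of co-raters.
        numerator += (value + avg_diff) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None


# Toy data with real preference values (made up).
rated = {
    1: {"A": 5.0, "B": 3.0},
    2: {"A": 4.0, "B": 2.0, "C": 5.0},
    3: {"B": 1.0, "C": 3.0},
}

# Same user-item pairs, but with the preference values discarded.
boolean = {u: {i: 1.0 for i in items} for u, items in rated.items()}

estimate_with_values = slope_one_estimate(rated, 1, "C")
estimate_without_values = slope_one_estimate(boolean, 1, "C")
```

With the values discarded, estimate_without_values is 1.0 for any user
and any item that has co-raters, so the ranking carries no information.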


> Could this be because my dataset is very sparse or something? I mean in
> terms of difference between the grouplens data and mine all I can see is
> that the grouplens data starts from the number 1 for users whereas my
> userids are all about 8 or 9 digits long and I also don't have a timestamp
> but otherwise same format... which as far as I see shouldn't matter.

Yeah, this doesn't matter at all. What exactly does your data file look
like? It needs to be delimited by commas or tabs, that sort of thing, not
double-colons. I'm wondering if that could somehow be the issue...
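
If the file does turn out to use double-colons (as the raw GroupLens 1M
ratings file does), a small conversion step like the sketch below would
produce comma-delimited lines. The function name to_comma_delimited and
the sample line are assumptions for illustration, and this assumes no
field itself contains "::" or a comma.

```python
def to_comma_delimited(lines):
    """Yield each non-empty 'a::b::c' line rewritten as 'a,b,c'.

    Hypothetical helper; assumes fields contain neither '::' nor ','.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield ",".join(line.split("::"))


# Made-up sample in userID::itemID::preference form (no timestamp).
sample = ["123456789::42::4\n", "\n", "987654321::7::2\n"]
converted = list(to_comma_delimited(sample))
```

To convert a whole file, pass an open file object as `lines` and write
each yielded string followed by a newline to the output file.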
