Yes, an item-based recommender is better here -- slope-one is also reasonable. Slope-one needs more precomputation at startup but is likely faster at runtime. Also, try LogLikelihoodSimilarity. If you find it actually gives better results, that's good news: it doesn't use rating values at all, so you could then drop that part of your data.
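To show why no rating values are needed, here is a minimal plain-Java sketch of a log-likelihood item similarity. It is not the LogLikelihoodSimilarity source. It assumes the usual formulation of Dunning's log-likelihood ratio (G^2) on a 2x2 table counting users who interacted with both items, only one, or neither. It also assumes the score is squashed into [0, 1) with 1 - 1/(1 + LLR). The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Mahout source): log-likelihood similarity between
 * two items. Only WHICH users interacted with each item matters; rating
 * values never enter the computation.
 */
final class LogLikelihoodSketch {

    private LogLikelihoodSketch() {}

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a list of counts. */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /** G^2 statistic for a 2x2 table of co-occurrence counts. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by floating-point round-off.
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /** Similarity in [0, 1) computed from the sets of users of two items. */
    static double similarity(Set<Long> usersOfA, Set<Long> usersOfB, long numUsers) {
        Set<Long> both = new HashSet<>(usersOfA);
        both.retainAll(usersOfB);
        long k11 = both.size();                 // users with A and B
        long k12 = usersOfA.size() - k11;       // users with A only
        long k21 = usersOfB.size() - k11;       // users with B only
        long k22 = numUsers - k11 - k12 - k21;  // users with neither
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

For example, two items bought by the same 3 of 6 users give an LLR of 12 ln 2 (about 8.3) and a similarity of about 0.89, whatever the purchase counts or normalized values were.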
Yes, the GroupLens example is still there and has been updated for the new changes. I am still quite puzzled that you get no recommendations at all. Are you doing any sampling anywhere in the code? That is, if you use a smaller and smaller fraction of the data as it grows in order to scale, at some point you may be omitting so much data that it becomes too sparse for many similarities to be computed.

On Thu, Aug 13, 2009 at 11:56 PM, mishkinf wrote:
>
> Well, in fact it is strange, because with the same data set, when it is 5
> million lines it produces a number of recommendations, but when it is larger
> it simply returns no results. It does not run into memory exceptions, and
> nothing abnormal is printed on the console. This confused me.
>
> My data is basically in the range -1 to 1. I am looking at a list of purchased
> items, i.e.
> <userid> <itemid> <# times purchased>
> but then I run a normalization algorithm on it, so the data returned is
> actually
> <userid> <itemid> <value -1 to 1>
>
> In terms of users vs. products, I'm looking at many more users than
> products (millions vs. thousands), and the number of users keeps growing.
> This is why I was thinking item-based recommenders were a good fit.
>
> This normalized data is what I feed to Mahout. I basically modified the
> GroupLens example and have been working from that. If that example
> exists in the 0.2 version, it might be worth my while to upgrade.
>
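On the sampling question: one rough way to test the sparsity explanation is to measure what fraction of item-item similarities are even defined, on the full data and on whatever subset the code actually uses. The sketch below is a generic Pearson correlation over co-rating users, not Mahout's implementation. It assumes a similarity is undefined (NaN) when two items share fewer than two users or when either item's values don't vary. The class and method names are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical diagnostic (not Mahout code): Pearson correlation between two
 * items, computed only over users who have a value for both. Returns NaN when
 * the correlation is undefined.
 */
final class OverlapDiagnostic {

    private OverlapDiagnostic() {}

    /** Each map is userID -> normalized value (e.g. in [-1, 1]) for one item. */
    static double pearson(Map<Long, Double> ratingsA, Map<Long, Double> ratingsB) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : ratingsA.entrySet()) {
            Double b = ratingsB.get(e.getKey());
            if (b == null) {
                continue; // this user has no value for item B
            }
            double a = e.getValue();
            n++;
            sumA += a;
            sumB += b;
            sumAA += a * a;
            sumBB += b * b;
            sumAB += a * b;
        }
        if (n < 2) {
            return Double.NaN; // too little overlap to correlate
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        if (!(denom > 1e-12)) {
            return Double.NaN; // no variance (or round-off): undefined
        }
        return cov / denom;
    }

    /** Fraction of distinct item pairs whose similarity is defined. */
    static double definedFraction(Map<Long, Map<Long, Double>> ratingsByItem) {
        Long[] items = ratingsByItem.keySet().toArray(new Long[0]);
        long pairs = 0;
        long defined = 0;
        for (int i = 0; i < items.length; i++) {
            for (int j = i + 1; j < items.length; j++) {
                pairs++;
                double s = pearson(ratingsByItem.get(items[i]), ratingsByItem.get(items[j]));
                if (!Double.isNaN(s)) {
                    defined++;
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) defined / pairs;
    }
}
```

The pairwise loop is quadratic in the number of items, so with thousands of products it is cheaper to run it on a random subset of items. If the defined fraction collapses as the data grows, that would support the sparsity explanation.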
