Yeah, that's what it's about -- skipping some randomly-selected input, which changes what goes into the calculations and potentially changes the output. It's about speed. It doesn't 'permanently' omit data -- if I threw out half the users, sure, things would be deterministic then, but half the users would be pretty unhappy... similar reasoning means we have to keep all the data and selectively ignore parts of it in different contexts.
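[Editor's note: to make the mechanism concrete, here is a minimal sketch of what random sampling at query time can look like. It is not Taste's actual implementation; `SamplingSketch`, `sample`, and `samplingRate` are made-up names for illustration. The point is that each pass keeps each tuple with some probability, so the full data set is retained but different runs may see different subsets.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only -- not Taste's actual code. A samplingRate skips
// randomly-selected preference tuples at computation time; nothing is
// deleted from the underlying data model.
public class SamplingSketch {
  public static <T> List<T> sample(List<T> prefs, double samplingRate, Random random) {
    List<T> sampled = new ArrayList<>();
    for (T pref : prefs) {
      // Each tuple is independently kept with probability samplingRate,
      // which is why two runs over the same data can differ.
      if (random.nextDouble() < samplingRate) {
        sampled.add(pref);
      }
    }
    return sampled;
  }
}
```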
Unless the variation is problematic, I wouldn't raise the sampling rate. It's only if the results get too weird that you might consider it. If you get no recommendations but have reason to believe you should, you could always query again too (see the sketch below).

PS first name is Sean...

On Wed, Jun 3, 2009 at 9:00 PM, Otis Gospodnetic <[email protected]> wrote:
>
> I see. I thought sampling rate was only about providing a way to skip some
> input records (user, item, preference tuples) to lower memory requirements
> and increase speed. I didn't realize it could affect recommendation
> computation...
>
> 3) is definitely needed, at least in my case, and that's what I do. Big
> time. :)
>
> 2) is also good to know - if different sets of recommended items all look
> good (i.e. really do feel like good recommendations) to users, this adds
> variety, and I feel that can be a good thing, at least in my current domain.
>
> So I suppose I simply can't have the sampling rate too low. Thanks Owen.
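[Editor's note: a minimal sketch of the "query again" suggestion above. Since the sampling is random, an empty result from one call may be non-empty on the next. `queryWithRetry` is a hypothetical helper, and `query` stands in for whatever recommend(...) call your recommender exposes; this is not Taste API code.]

```java
import java.util.List;
import java.util.function.Supplier;

// Illustrative retry helper: re-issue the query a bounded number of
// times if the sampled computation happens to return nothing.
public final class RetrySketch {
  static <T> List<T> queryWithRetry(Supplier<List<T>> query, int maxAttempts) {
    List<T> results = query.get();
    for (int attempt = 1; attempt < maxAttempts && results.isEmpty(); attempt++) {
      // Each call re-samples, so a retry can legitimately succeed.
      results = query.get();
    }
    return results;
  }
}
```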
