I see. I thought the sampling rate was only a way to skip some input records (user, item, preference tuples) to lower memory requirements and increase speed. I didn't realize it could change the computed recommendations themselves...
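Here is a minimal Java sketch showing how randomly skipping input can change downstream results. It assumes, hypothetically, that a sampling rate below 1.0 keeps each candidate independently with that probability. `SamplingSketch` and `sample` are illustrative names, not part of Taste, and Taste's actual sampling code may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only, NOT Taste's code. It assumes a samplingRate
// below 1.0 keeps each candidate independently with that probability.
final class SamplingSketch {

  // Keeps each candidate with probability samplingRate, drawing from the given RNG.
  static List<String> sample(List<String> candidates, double samplingRate, Random rng) {
    List<String> kept = new ArrayList<>();
    for (String c : candidates) {
      // nextDouble() is uniform in [0.0, 1.0), so a rate of 1.0 keeps everything.
      if (rng.nextDouble() < samplingRate) {
        kept.add(c);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Made-up candidate neighbors for target user u4.
    List<String> users = List.of("u1", "u2", "u3", "u5", "u6", "u7");
    // Unseeded RNG: two calls may keep different users, so any neighborhood built
    // from the sample (and the recommendations derived from it) can differ.
    Random shared = new Random();
    System.out.println(sample(users, 0.8, shared));
    System.out.println(sample(users, 0.8, shared));
    // At 1.0 nothing is skipped.
    System.out.println(sample(users, 1.0, shared)); // prints [u1, u2, u3, u5, u6, u7]
  }
}
```

The two 0.8-rate calls in `main` may print different subsets from run to run. At a rate of 1.0 no candidate can be skipped, so the sample is always the full list.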
3) is definitely needed, at least in my case, and that's what I do. Big time. :)

2) is also good to know: if the different sets of recommended items all look good to users (i.e. really do feel like good recommendations), this adds variety, and I feel that can be a good thing, at least in my current domain.

So I suppose I simply can't set the sampling rate too low.

Thanks Owen.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 3, 2009 3:45:18 PM
> Subject: Re: Inconsistent recommendations
>
> If you are using any of the 'samplingRate' parameters, then down in
> the code it is using a random number generator to select some subset
> of things to look at. That means you could get different results, due
> to different neighborhoods, etc. on each request.
>
> Is it bad behavior? Well:
>
> 1) If sampling rates aren't too low, the results shouldn't be very
> different, even if they are not identical. So one conclusion could be
> sampling is having too large an effect and the rate needs to go up
>
> 2) The assumption is that any of the slightly different results you
> may get are about equally 'good' anyway
>
> 3) I suppose I think of computing recommendation as a
> relatively-speaking infrequent event. You might compute them once a
> day or hour. Or you compute on the fly and cache it, either externally
> or in the framework. So, it shouldn't be the case that the same
> recommendations are computed over and over in a row, where the
> differences might become noticeable, in an application, to a user
>
>
> Is it possible to guarantee the same recommendation, even when using
> sampling, if the data doesn't change? It wouldn't be too hard to always
> use a local RNG and always seed it the same way, no. It would be a
> performance hit.
>
> My first reaction though is #3 -- cache. Is that a feasible response?
>
>
> Sean
>
>
>
> On Wed, Jun 3, 2009 at 8:29 PM, Otis Gospodnetic wrote:
> > Hello,
> >
> > I haven't debugged this yet, but I was playing with sampling rate in Taste
> > and noticed a weird behaviour where the recommender doesn't give consistent
> > results -- when it gives them they are always the same, but sometimes it
> > doesn't give them. For example:
> >
> >
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'   -- no recommendations from this call!
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'   -- no recommendations from this call!
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> >
> > Another way to see this is if I use different sampling rates and collect
> > output, like this:
> > $ for x in `seq 1 1000`; do curl --silent 'http://localhost:8080/re/recommend?userID=u4&howMany=10'; done > (output file here)
> >
> > I get this:
> >
> > -rw-r--r-- 1 otis otis 5994 2009-06-03 15:24 out-1-sr0.8
> > -rw-r--r-- 1 otis otis 5988 2009-06-03 15:24 out-2-sr0.8   -- different outputs!
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-1-sr0.9
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-2-sr0.9
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-1-sr0.99
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-2-sr0.99
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:20 out-1-sr1.0
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:21 out-2-sr1.0
> >
> > If this worked consistently, the outputs should be identical, no?
> >
> > This doesn't look normal...bug?
> > I'm attaching my sample input (but ML software may strip it).
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
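Sean's two suggestions, a local RNG that is always seeded the same way and caching computed recommendations, can be sketched in Java as follows. This is a hypothetical illustration, not Taste's API: `StableRecommendations`, its methods, and seeding from `userID.hashCode()` are all assumptions. `distinctResults` mimics Otis's repeated-curl experiment by counting how many different lists come back over many calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch, not Taste's API. Sampling a small item list stands in
// for whatever sampled computation produces the real recommendations.
final class StableRecommendations {

  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
  private final List<String> items;
  private final double samplingRate;

  StableRecommendations(List<String> items, double samplingRate) {
    this.items = List.copyOf(items);
    this.samplingRate = samplingRate;
  }

  // Keeps each item with probability samplingRate, drawing from the given RNG.
  private List<String> sampleWith(Random rng) {
    List<String> kept = new ArrayList<>();
    for (String item : items) {
      if (rng.nextDouble() < samplingRate) {
        kept.add(item);
      }
    }
    return List.copyOf(kept);
  }

  // Option A: a local RNG seeded the same way on every call. Seeding from the
  // user ID is an assumption; any seed fixed per user gives that user the same sample.
  List<String> recommendSeeded(String userID) {
    return sampleWith(new Random(userID.hashCode()));
  }

  // Option B: sample once with any RNG, then serve repeat requests from a cache.
  // Entries are never invalidated here, so this only holds while data is unchanged.
  List<String> recommendCached(String userID, Random shared) {
    return cache.computeIfAbsent(userID, id -> sampleWith(shared));
  }

  // Counts distinct results over repeated calls, like the repeated-curl experiment.
  static int distinctResults(int runs, Supplier<List<String>> call) {
    Set<List<String>> seen = new HashSet<>();
    for (int i = 0; i < runs; i++) {
      seen.add(call.get());
    }
    return seen.size();
  }

  public static void main(String[] args) {
    StableRecommendations r =
        new StableRecommendations(List.of("a1", "a2", "a3", "a4"), 0.8);
    Random shared = new Random();
    System.out.println(distinctResults(1000, () -> r.recommendSeeded("u4")));         // prints 1
    System.out.println(distinctResults(1000, () -> r.recommendCached("u4", shared))); // prints 1
  }
}
```

Option A recomputes the sample on every request, so it keeps paying the sampling cost (Sean expects a performance hit from this approach). Option B is his first suggestion and also avoids recomputation, but a real cache would need invalidating whenever the underlying data changes, which this sketch does not do.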
