If you are using any of the 'samplingRate' parameters, then the code underneath uses a random number generator to select a subset of the data to look at. That means you could get different results on each request, due to different neighborhoods being sampled, etc.
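
To make the effect concrete, here is a minimal sketch of rate-based sampling. It is not Taste's actual code: the class `SamplingSketch` and its methods are hypothetical names made up for illustration, and the candidates are plain integers. It only shows why an unseeded RNG can pick a different subset on every call, while an RNG seeded identically picks the same subset as long as the data doesn't change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; this is not how Taste implements sampling.
class SamplingSketch {

    // Keeps each candidate independently with probability samplingRate,
    // drawing from the supplied RNG.
    static List<Integer> sample(List<Integer> candidates, double samplingRate, Random rng) {
        List<Integer> kept = new ArrayList<>();
        for (Integer candidate : candidates) {
            if (rng.nextDouble() < samplingRate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    // Builds candidates 0..n-1 and samples them with a local RNG that is
    // always seeded the same way, so the result is repeatable.
    static List<Integer> sampleWithSeed(int n, double samplingRate, long seed) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            candidates.add(i);
        }
        return sample(candidates, samplingRate, new Random(seed));
    }

    public static void main(String[] args) {
        // Same seed and same data: the two subsets are always equal.
        List<Integer> first = sampleWithSeed(20, 0.8, 42L);
        List<Integer> second = sampleWithSeed(20, 0.8, 42L);
        System.out.println("seeded runs equal: " + first.equals(second));

        // Unseeded RNGs: the two subsets may or may not be equal.
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            candidates.add(i);
        }
        List<Integer> a = sample(candidates, 0.8, new Random());
        List<Integer> b = sample(candidates, 0.8, new Random());
        System.out.println("unseeded runs equal: " + a.equals(b));
    }
}
```

With a rate of 1.0 every candidate is kept, so nothing varies between calls; the lower the rate, the more room there is for two unseeded runs to disagree.
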

Is it bad behavior? Well:

1) If sampling rates aren't too low, the results shouldn't be very different, even if they are not identical. So one conclusion could be that sampling is having too large an effect and the rate needs to go up.
2) The assumption is that any of the slightly different results you may get are about equally 'good' anyway.
3) I think of computing recommendations as a relatively infrequent event. You might compute them once a day or once an hour. Or you compute on the fly and cache the result, either externally or in the framework. So it shouldn't be the case that the same recommendations are computed over and over in a row, where the differences might become noticeable to a user of an application.

Is it possible to guarantee the same recommendation, even when using sampling, if the data doesn't change? It wouldn't be too hard to always use a local RNG and always seed it the same way, no. It would be a performance hit.

My first reaction though is #3 -- cache. Is that a feasible response?

Sean

On Wed, Jun 3, 2009 at 8:29 PM, Otis Gospodnetic <[email protected]> wrote:
> Hello,
>
> I haven't debugged this yet, but I was playing with sampling rate in Taste
> and noticed a weird behaviour where the recommender doesn't give consistent
> results -- when it gives them they are always the same, but sometimes it
> doesn't give them. For example:
>
>
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10' -- no
> recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10' -- no
> recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
>
> Another way to see this is if I use different sampling rates and collect
> output, like this:
> $ for x in `seq 1 1000`; do curl --silent
> 'http://localhost:8080/re/recommend?userID=u4&howMany=10'; done > (output
> file here)
>
> I get this:
>
> -rw-r--r-- 1 otis otis 5994 2009-06-03 15:24 out-1-sr0.8
> -rw-r--r-- 1 otis otis 5988 2009-06-03 15:24 out-2-sr0.8 -- different
> outputs!
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-1-sr0.9
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-2-sr0.9
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-1-sr0.99
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-2-sr0.99
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:20 out-1-sr1.0
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:21 out-2-sr1.0
>
> If this worked consistently, the outputs should be identical, no?
>
> This doesn't look normal...bug?
> I'm attaching my sample input (but ML software may strip it).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
