Off-line evaluation of recommendations is a really difficult problem. The problem is one of reject inference: your test data was sampled using one recommendation engine, which biases all of your data in favor of engines like that one. A very different engine might produce very different recommendations that would actually be much better. This is especially true in the case of binary input (the only important case for most applications).
The better evaluation method is to run multiple parametrized versions of the recommender and do an efficient parameter search to find which engines are better. The parameterization has to include the UI, because the UI can have such a huge effect. Unfortunately, this isn't feasible for wild new approaches where you have thousands or millions of potential engine configurations, but it is still the bread and butter of evaluation for real systems. I have found that simple inspection suffices for rough-cut evaluation, and automated multivariate testing is the way to judge fine-grained distinctions.

As a measure of how big an effect the UI can have, I recently had a system that gave 10 results per page. Here are some (unscaled) click rates by search rank:

  Rank   Click Rate
     0          853
     1          415
     2          238
     3          184
     4          170
     5          167
     6          133
     7          125
     8          121
     9          150
    10            0
    11            2
    12            2
    13            2
    14            2
    15            4
    16            2
    17            0
    18            0
    19            0
    20            3

The extraordinary thing about these results is that, apart from the first three or so ranks, the click rate is essentially constant down to the 10th result. Then it is clear that *nobody* clicks through to the next page. Based on this, I would expect as much as a 50% increase in total clicks just from presenting 20 results instead of 10. I have almost NEVER seen an algorithmic change that would make such a large difference.

On Tue, Jul 28, 2009 at 7:10 AM, Claudia Grieco <[email protected]> wrote:
> Thanks a lot :) I was wondering what those IR classes were for XD
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Tuesday, July 28, 2009 3:52 PM
> To: [email protected]
> Subject: Re: The best evaluator for recommendations in binary data sets
>
> No, really, those types of evaluation do not apply to your case. They
> evaluate how closely the estimated preference values match the real ones.
> But in your case you have no preference values (or they're implicitly
> all '1.0' or something), so that comparison is meaningless.
>
> What you are likely interested in is something related but different:
> statistics like precision and recall. That is, you are concerned with
> whether the recommender recommends many of the items the user is actually
> associated with. For example, you might take away three of the user's
> items and see whether the recommender recommends those three back.
>
> Look at GenericRecommenderIRStatsEvaluator instead. It can compute
> precision and recall figures, which is more what you want.
>
> On Tue, Jul 28, 2009 at 2:35 PM, Claudia Grieco <[email protected]> wrote:
> > Hi guys,
> >
> > I have created a user-based recommender which operates on a binary data
> > set (a user has either bought or not bought a product).
> >
> > I'm using BooleanTanimotoCoefficient, BooleanUserGenericUserBased, and so
> > on.
> >
> > Is using AverageAbsoluteDifferenceRecommenderEvaluator to evaluate the
> > recommender a good idea?

--
Ted Dunning, CTO DeepDyve
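
A quick back-of-the-envelope check of the 50% figure above, as a sketch in plain Java: it assumes (this is an assumption, not measured data) that ranks 10-19, if shown on the same page, would draw clicks at roughly the flat tail rate of ranks 4-9 in the table.

// Back-of-the-envelope projection for showing 20 results instead of 10.
public class SecondPageEstimate {
  public static void main(String[] args) {
    // Observed (unscaled) click counts for ranks 0-9 from the table above.
    int[] firstPage = {853, 415, 238, 184, 170, 167, 133, 125, 121, 150};

    int firstPageTotal = 0;
    for (int clicks : firstPage) {
      firstPageTotal += clicks;
    }

    // Assumed per-rank rate for ranks 10-19: the mean of the flat tail
    // (ranks 4-9). This is an assumption rather than measured data.
    double tailRate = (170 + 167 + 133 + 125 + 121 + 150) / 6.0;
    double projectedExtra = 10 * tailRate;

    System.out.printf("first page total:       %d%n", firstPageTotal);    // 2556
    System.out.printf("projected extra clicks: %.0f%n", projectedExtra);  // ~1443
    System.out.printf("relative gain:          %.0f%%%n",
        100.0 * projectedExtra / firstPageTotal);                         // ~56%
  }
}

Under that assumption the second page adds roughly 1,400 clicks on top of about 2,550, i.e. on the order of a 50-60% gain, consistent with the estimate above.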
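
For the precision/recall evaluation Sean points to in the quoted thread, here is a minimal sketch against the Mahout Taste API. The class names follow later Mahout releases and may not match the Boolean* variants named above; the data file name, the neighborhood size of 25, and the "at 3" cutoff are illustrative assumptions, not anything from the thread.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BooleanPrecisionRecall {
  public static void main(String[] args) throws Exception {
    // Boolean purchase data: lines like "userID,itemID" (file name is hypothetical).
    DataModel model = new FileDataModel(new File("purchases.csv"));

    // Build the same kind of recommender being evaluated: Tanimoto similarity
    // over boolean preferences with a user-based neighborhood.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new TanimotoCoefficientSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, dataModel);
        return new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // Withhold some of each user's items, recommend, and score the overlap.
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(
        builder, null, model, null,
        3,                                                   // precision/recall "at 3"
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, // let it pick a relevance threshold
        1.0);                                                // evaluate all users

    System.out.println("precision: " + stats.getPrecision());
    System.out.println("recall:    " + stats.getRecall());
  }
}

Roughly, the evaluator withholds each user's relevant items, builds the recommender on the remaining data, asks for top-N recommendations, and scores how many withheld items come back, which is the same take-some-away-and-see-what-returns idea described in the quoted message.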
