Re: 3.0.5 rescoring

Doc Schneider Mon, 21 Nov 2005 22:01:54 -0800

Theo Van Dinter wrote:

> On Mon, Nov 21, 2005 at 08:38:05PM -0800, Justin Mason wrote:
>
>> well, it's more than that.  with a small number of corpora, the
>> scores will be over-optimised for those people.   It's a tricky
>> problem....
>
> I've actually been thinking about this a bit.  Our normal mass-check runs
> are heavily weighted towards a small number of people already.  For 3.1,
> we used 9 people's logs.  It totalled 1766844 messages (bmenschel's
> wasn't included apparently).  Breaking it down:
>
> Percent Provider
> ------- ----------
> 33.93   jm
> 31.00   theo
> 9.35    daf
> 7.68    rod
> 6.05    parkerm
> 5.62    bzoetekouw
> 5.11    quinlan
> 1.20    cthielen
> 0.07    misak
>
> So basically Justin is 34%, I'm 31%, and everyone else combined is 35%.
> So in reality, the scores are far more tuned for Justin and myself than
> any other single person.
>
> This is something I've been trying to think about wrt doing weekly score
> generations for use by sa-update, but no real solution has come to mind yet.
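The split quoted above (Justin 34%, Theo 31%, everyone else 35%) can be checked, and the skew put into a single number, with a short sketch. It assumes only the percentages from the table and the 1766844-message total; because the percentages are rounded, the per-provider message counts are estimates. The "effective number of contributors" (the inverse Simpson index, 1 / sum of squared shares) is a measure picked here for illustration. It is not something the SpamAssassin rescoring tools compute, and all function names are hypothetical.

```python
# Per-provider shares (percent) from the 3.1 mass-check breakdown above.
SHARES = {
    "jm": 33.93,
    "theo": 31.00,
    "daf": 9.35,
    "rod": 7.68,
    "parkerm": 6.05,
    "bzoetekouw": 5.62,
    "quinlan": 5.11,
    "cthielen": 1.20,
    "misak": 0.07,
}
TOTAL_MESSAGES = 1766844  # total logged messages quoted above


def approx_counts(shares, total):
    """Approximate messages per provider (hypothetical helper).
    The percentages are rounded, so these are estimates, not real log sizes."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


def top_vs_rest(shares, top=("jm", "theo")):
    """Return (combined share of the `top` providers, share of everyone else)."""
    top_sum = sum(shares[name] for name in top)
    rest_sum = sum(pct for name, pct in shares.items() if name not in top)
    return top_sum, rest_sum


def effective_contributors(shares):
    """Inverse Simpson index 1 / sum(p_i^2): how many equal-sized corpora
    would give the same concentration. Shares are renormalised first
    because the rounded table sums to 100.01 rather than 100."""
    total = sum(shares.values())
    return 1.0 / sum((pct / total) ** 2 for pct in shares.values())


if __name__ == "__main__":
    top_sum, rest_sum = top_vs_rest(SHARES)
    print(f"jm + theo: {top_sum:.2f}%, everyone else: {rest_sum:.2f}%")
    eff = effective_contributors(SHARES)
    print(f"effective number of contributors: {eff:.1f} of {len(SHARES)}")
    for name, count in approx_counts(SHARES, TOTAL_MESSAGES).items():
        print(f"{name:12s} ~{count}")
```

With the table's numbers the effective count comes out at about 4.2. In other words, the 9 corpora pull on the scores roughly as 4 equal-sized corpora would, which is one way of stating the over-optimisation concern raised above.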

Odd, I'm not in there. I should be, as doc (for my rsync corpus). Maybe I need to send my whole corpus again.


-Doc
