Re: 3.0.5 rescoring

Theo Van Dinter Mon, 21 Nov 2005 21:44:43 -0800

On Mon, Nov 21, 2005 at 08:38:05PM -0800, Justin Mason wrote:
> well, it's more than that.  with a small number of corpora, the
> scores will be over-optimised for those people.   It's a tricky
> problem....


I've actually been thinking about this a bit.  Our normal mass-check runs
are already heavily weighted towards a small number of people.  For 3.1,
we used 9 people's logs, totalling 1766844 messages (bmenschel's logs
apparently weren't included).  Breaking it down:

Percent Provider
------- ----------
33.93   jm
31.00   theo
9.35    daf
7.68    rod
6.05    parkerm
5.62    bzoetekouw
5.11    quinlan
1.20    cthielen
0.07    misak

So basically Justin is 34%, I'm 31%, and everyone else combined is 35%.
So in reality, the scores are tuned far more for Justin and me than for
any other single person.
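
To put a number on that skew, here is a minimal Python sketch using only the
percentages from the table above.  The function names are made up for
illustration and are not part of mass-check or any SpamAssassin tooling.
The "effective contributors" figure is the inverse Herfindahl index, which
is just one possible way to summarise concentration, not something the
rescoring process actually uses.

```python
# Percent of the 1766844 messages per provider, copied from the table above.
TOTAL_MESSAGES = 1766844
shares = {
    "jm": 33.93, "theo": 31.00, "daf": 9.35, "rod": 7.68, "parkerm": 6.05,
    "bzoetekouw": 5.62, "quinlan": 5.11, "cthielen": 1.20, "misak": 0.07,
}

def top_share(shares, n):
    """Combined percentage held by the n largest providers."""
    return sum(sorted(shares.values(), reverse=True)[:n])

def effective_contributors(shares):
    """Inverse Herfindahl index: 1 / sum of squared fractions.

    Equals the number of providers when all shares are equal, and
    approaches 1 as a single provider dominates.
    """
    total = sum(shares.values())  # ~100; normalise to absorb rounding
    return 1.0 / sum((p / total) ** 2 for p in shares.values())

def approx_counts(shares, total_messages):
    """Approximate per-provider message counts, rounded from the percentages."""
    return {name: round(total_messages * p / 100) for name, p in shares.items()}

if __name__ == "__main__":
    print("top 2 share: %.2f%%" % top_share(shares, 2))
    print("effective contributors: %.2f" % effective_contributors(shares))
    for name, count in approx_counts(shares, TOTAL_MESSAGES).items():
        print("%-12s %8d" % (name, count))
```

By this measure the 9 logs behave like a little over 4 equally sized
corpora.  The per-provider counts are only approximations, since they are
reconstructed from percentages rounded to two decimal places.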

This is something I've been trying to think about wrt doing weekly score
generations for use by sa-update, but no real solution has come to mind yet.

-- 
Randomly Generated Tagline:
"We don't make mistakes ... We just have happy little accidents."
                      - The Joy of Bikini Waxing (HardCore TV)
