Theo Van Dinter wrote:
On Mon, Nov 21, 2005 at 08:38:05PM -0800, Justin Mason wrote:
Well, it's more than that. With a small number of corpora, the
scores will be over-optimised for those people. It's a tricky
problem...
I've actually been thinking about this a bit. Our normal mass-check runs
are heavily weighted towards a small number of people already. For 3.1,
we used 9 people's logs. It totalled 1766844 messages (bmenschel's
wasn't included apparently). Breaking it down:
Percent  Provider
-------  ----------
  33.93  jm
  31.00  theo
   9.35  daf
   7.68  rod
   6.05  parkerm
   5.62  bzoetekouw
   5.11  quinlan
   1.20  cthielen
   0.07  misak
So basically Justin is 34%, I'm 31%, and everyone else combined is 35%.
So in reality, the scores are far more tuned for Justin and myself than
for any other single person.
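
The arithmetic behind that summary can be checked with a short sketch. The percentages and the message total are copied from the breakdown above; the function names (approx_counts, combined_share) are made up for illustration and are not part of mass-check or any SpamAssassin tool. Because the percentages are rounded to two places, the per-person message counts are only estimates.

```python
# Percent share of the 3.1 mass-check logs per contributor (from the table above).
SHARES = {
    "jm": 33.93, "theo": 31.00, "daf": 9.35, "rod": 7.68, "parkerm": 6.05,
    "bzoetekouw": 5.62, "quinlan": 5.11, "cthielen": 1.20, "misak": 0.07,
}
TOTAL_MESSAGES = 1766844  # total quoted in the message


def approx_counts(shares, total):
    """Estimate per-contributor message counts from the rounded percentages."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


def combined_share(shares, names):
    """Sum the percentage share of the named contributors."""
    return sum(shares[name] for name in names)


if __name__ == "__main__":
    counts = approx_counts(SHARES, TOTAL_MESSAGES)
    for name, pct in sorted(SHARES.items(), key=lambda kv: -kv[1]):
        print(f"{pct:6.2f}  {name:<12} ~{counts[name]} messages")

    top_two = ("jm", "theo")
    others = [name for name in SHARES if name not in top_two]
    print(f"jm + theo:     {combined_share(SHARES, top_two):.2f}%")
    print(f"everyone else: {combined_share(SHARES, others):.2f}%")
```

The two largest corpora come to roughly 65% between them, which is the imbalance described above.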
This is something I've been trying to think about wrt doing weekly score
generations for use by sa-update, but no real solution has come to mind yet.
We seriously need to improve the documentation and tools to make it easier
for people to understand mass-check and take part in it. At our company we
almost need to cripple SpamAssassin in our Asian office because of the
false-positive (FP) levels. We need better representation in mass-checks,
especially from non-Western users.
For example, I am trying to get a few native Japanese employees at my
office to participate because of the total lack of Asian representation
currently in mass-check. They misunderstood the sorting directions at
first, so I need to train them myself to make sure they do a good job of it.
Warren Togami