On Tue, Apr 18, 2006 at 06:46:31PM +0100, Justin Mason wrote:
> > I may be a bit biased, but it takes all of 5 minutes to get setup,
>
> Hmm. Given that about 75% of the submitters felt the need to reinvent
> their own scripts, I suspect it is not quite as simple as all that ;)

That's because our documentation still sucks about how to do it, and the scripts weren't really available until recently. There was the whole svn versus rsync thing, the mkrules bit, etc.

> That's definitely something that can be taken care of at the other end --
> in the worst case, we just need to not use submissions from people who
> display an inability to take care of their corpora.

How do we tell that someone isn't taking care of their corpus? What does "not able to take care" mean? I've had bounce messages and such in mine, others have had misfiled phishing mails, etc.

Part of the goal of having more inclusion is to find people who have messages unlike our standard mails, which means that their results are likely to be different than the previously "normal" results, so some kind of automated "look for odd results" system isn't going to work either.

People will have to be diligent about what messages are in their corpus, someone will have to go checking that rule results/ham hits are valid, etc. Bringing more people, especially those who may not want to invest the time to do it fully, into the mix doesn't necessarily help us out. I'd almost rather we shift this around and make a "SpamAssassin Corpora", have all of us focus on making that the best it can be, and use that for mass-checks, etc.

I also don't think more people doing mass-check solves our bigger issues, such as us having very few people doing rules, and a large number of high accuracy/low hit-rate rules takes way too many resources to run. But anyway, these other things are separate so I'll stop ranting. :)

> and who's to say the SoC student might not decide to carry on working on
> the project? ;) see http://tirania.org/blog/archive/2006/Apr-13.html :

True. It was under the "possible problem" listing. ;)

> All the same, I think the general idea is a good one.

I'm not saying we shouldn't do it, I'm just concerned about some of the details.
Anything we can do to make things easier and get more people involved is good as far as I'm concerned.

-- 
Randomly Generated Tagline:
I played poker with tarot cards: got a flush and five people died...
