> Well, you got that kind of backwards.
> The daily mass-check is to evaluate the SA rules' performance and
> accuracy, and to generate frequent re-scoring based on recent spam. For
> that, the rules already need to be part of the SA rule-set, so to speak,
> or at least under evaluation.
Thank you for clarifying that; I had really gotten it backwards.

> Again, the mass-check is done for re-scoring of the live rule-set and
> publishing new rules, pushed to the users via sa-update. The current
> infrastructure and workflow probably is not well-suited for
> experimentation, but that depends on the nature of your automated rule
> generation. Also, depending on the nature and amount of rules, it might
> impose a considerably increased load to the contributors.

Well, I could just evaluate my rules on my own corpora (once I can find a
good ham corpus) and then submit to SA whichever rules work well. Since one
of my ideas is to generate a ton of low-support but high-confidence rules,
that would probably increase the load on the contributors considerably, as
you said.

Once again, thanks for making things clearer,
Marco Túlio Ribeiro

2010/10/15 Karsten Bräckelmann <[email protected]>

> Would it be possible to drop the HTML and use text/plain mail? :)
>
> On Fri, 2010-10-15 at 18:21 -0300, Marco Ribeiro wrote:
> > I'm sorry I wasn't clear. I am looking for downloadable ham corpora in
> > order to try to develop a way to find new rules in an automatic or
> > semi-automatic way.
>
> > After I generate new rules, I would need to test their accuracy
> > somehow, the mass check seems to be a good way. So I guess my question
> > about the mass check is whether or not my rules will be tested on
> > others' corpora as well as on my own corpus.
>
> Well, you got that kind of backwards.
>
> The daily mass-check is to evaluate the SA rules' performance and
> accuracy, and to generate frequent re-scoring based on recent spam. For
> that, the rules already need to be part of the SA rule-set, so to speak,
> or at least under evaluation.
>
> The rules used for the mass-check run are in SVN. To commit rules there,
> one needs to be a committer to the SA project first.
>
> Again, the mass-check is done for re-scoring of the live rule-set and
> publishing new rules, pushed to the users via sa-update. The current
> infrastructure and workflow probably is not well-suited for
> experimentation, but that depends on the nature of your automated rule
> generation. Also, depending on the nature and amount of rules, it might
> impose a considerably increased load to the contributors.
>
> Anyway, without knowing some clear details first, we cannot even know if
> it might be possible.
>
>
> > I read that, but I wasn't sure whether or not it was a warning against
> > using others' corpora for means other than evaluating rules. Thanks
> > for the clarification and for the quick reply.
>
> Well, not absolutely sure, but I believe most mass-check contributors
> are running it locally on their machines, and just upload the logs.
>
>
> --
> char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
>
