On Tue, 2009-10-20 at 11:43 +0100, Justin Mason wrote:
> 2009/10/19 Karsten Bräckelmann <[email protected]>:
> > Checking the latest ruleqa results, there are some ham hits I believe
> > should not exist. Calling the respective corpus owners to the phone. :)
> >
> > bb-jm -- Justin, is this really ham?
> >  bbmass/uploadedcorpora/jm/ham/pub.20070118/96
> 
> a quoted spam sent to the users list.  deleted.

Quoted?  How exactly -- the rule that identified it is a header rule.

> > bb-trec_enron -- No idea who/what that is, and the logs are broken.
> > However, I still believe the hits on KB_RATWARE_OUTLOOK_08 and even
> > worse KB_RATWARE_MSGID are false. Who can validate those?
> 
> they're false.  it's a corpus with generated (synthetic) headers from
> the TREC Enron corpus, only useful for body hits.

Yay, back to zero FPs for my Ratware Message-Id and Boundary (Outlook
variant, sadly) then. That's how it should be. ;)

Does this also have an impact on the GA re-scoring?


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
