On Sat, 2011-03-12 at 20:02 -0500, [email protected] wrote:
> > Moreover, I'd guess the corpus is NOT sufficiently large, by far -- even
> 
> It would certainly be nice to have more data.  

My reference to the recent thread was an invitation to read it.

Well, to put this into perspective for you -- IIRC we're struggling
with a self-enforced limit of something in the ballpark of 500k spam
for the mass-check, aren't we? Within 2 months.

Now, can you say "double that", 1 million?

A DAY. Which makes it roughly 100 times what we have. In sheer
mass. No processing [1] involved yet.
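
To spell out the arithmetic, here is a rough sketch. The 500k and
1 million figures are the ballpark numbers above; taking "2 months" as
60 days is an assumption, and the names are made up for illustration.

```python
# Rough volume comparison, using the ballpark figures from this mail.
# Assumption: "2 months" is taken as a 60-day window.
MASS_CHECK_SPAM = 500_000   # approximate mass-check spam limit (IIRC)
WINDOW_DAYS = 60            # assumed length of the two-month window
FEED_PER_DAY = 1_000_000    # the "1 million a day" figure


def volume_ratio(corpus=MASS_CHECK_SPAM, days=WINDOW_DAYS, feed=FEED_PER_DAY):
    """Feed volume over the whole window, divided by the mass-check corpus."""
    return feed * days / corpus


print(volume_ratio())
```

Under that assumption the factor comes out at 120, so "roughly 100
times" is the right order of magnitude.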


[1] Let alone daily processing, rather than some smart feeding to a DB,
    pruning old data, and stuff.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
