In an attempt to have this brainstorming kind of discussion in a better
place than Bugzilla...

> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078
> 
> --- Comment #28 from Darxus <[email protected]> 2011-10-27 19:20:51 UTC ---
> (In reply to comment #27)
> > Unfortunately, I found that I had customers who do things like write in
> > Greek occasionally and things like that get slammed.
> 
> Damn.  Good info to have though.
> 
> How about creating something like rules that detect these character sets after
> decoding, enabled via ok_locales?
> 
> Stuff like:
> 
> header RUSSIAN_SUBJECT Subject =~ /(АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ){2}/i

How would that be different from "write in Greek occasionally"?

This is way too restrictive anyway. Basically, you are disallowing any
Cyrillic word -- including a person's name, just as a quick example.

What would be needed is code that measures the non-Western chars in
*relation* to the Western chars, plus a minimum threshold before
triggering, to avoid scoring a mail with a perfectly valid short English
body and a long-ish $foreign language signature.
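
To make that a bit more concrete, here is a rough sketch of the idea in
Python (not SpamAssassin's Perl, and not an existing SpamAssassin API).
It counts letters by script using Unicode character names and only
triggers when both a minimum amount of foreign text and a minimum ratio
are reached. The function names, the 200-letter minimum, and the 0.5
ratio are assumptions picked for illustration, not tuned values.

```python
import unicodedata


def script_ratio(text, script_prefix="CYRILLIC"):
    """Count letters of the given script and Latin letters in text.

    Returns a (foreign, latin) tuple. Digits, punctuation, whitespace,
    and letters of any other script are ignored.
    """
    foreign = latin = 0
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith(script_prefix):
            foreign += 1
        elif name.startswith("LATIN"):
            latin += 1
    return foreign, latin


def looks_foreign(text, script_prefix="CYRILLIC",
                  min_letters=200, min_ratio=0.5):
    """True only if the text is substantially written in the given script.

    min_letters and min_ratio are hypothetical thresholds: the first keeps
    a short foreign signature or name from triggering, the second keeps a
    long foreign passage inside a mostly Western mail from triggering.
    """
    foreign, latin = script_ratio(text, script_prefix)
    total = foreign + latin
    if total == 0 or foreign < min_letters:
        return False
    return foreign / total >= min_ratio
```

With these assumed thresholds, a short English body followed by a
Cyrillic signature stays below min_letters and is not flagged, while a
mail that is mostly Cyrillic is. Passing script_prefix="GREEK" would
apply the same check to the Greek case from comment #27.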


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
