In an attempt to move this brainstorming kind of discussion to a better place than Bugzilla...

> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078
>
> --- Comment #28 from Darxus <[email protected]> 2011-10-27 19:20:51 UTC ---
> (In reply to comment #27)
> > Unfortunately, I found that I had customers who do things like write in
> > Greek occasionally and things like that get slammed.
>
> Damn. Good info to have though.
>
> How about creating something like rules that detect these character sets
> after decoding, enabled via ok_locales?
>
> Stuff like:
>
> header RUSSIAN_SUBJECT Subject =~ /(АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ){2}/i

How would that be different from "write in Greek occasionally"?

This is way too restrictive anyway. Basically, you are dis-allowing any
Cyrillic word -- including a person's name, just as a quick example.

What would be needed is code to identify the non-western chars in
*relation* to western chars. And a minimum limit before triggering, to
avoid scoring a mail with a perfectly valid short English body, and a
long-ish $foreign language signature.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
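
To make the "relation plus minimum limit" idea above concrete, here is a rough sketch in Python. It is not SpamAssassin code and not an existing plugin; the function names (script_of, foreign_ratio, should_trigger) and the thresholds (min_chars=40, min_ratio=0.5) are made up for illustration, and it assumes the text has already been decoded to Unicode.

```python
import unicodedata


def script_of(ch):
    """Return a coarse script label for a letter, or None for non-letters."""
    if not ch.isalpha():
        return None
    # Unicode character names begin with the script name,
    # e.g. "CYRILLIC CAPITAL LETTER A" or "LATIN SMALL LETTER B".
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def foreign_ratio(text, script="CYRILLIC"):
    """Return (letters in `script`, their share of all letters in `text`)."""
    total = hits = 0
    for ch in text:
        label = script_of(ch)
        if label is None:
            continue  # digits, punctuation and whitespace are ignored
        total += 1
        if label == script:
            hits += 1
    return hits, (hits / total if total else 0.0)


def should_trigger(text, script="CYRILLIC", min_chars=40, min_ratio=0.5):
    """Hypothetical rule: fire only if the script is both frequent in
    absolute terms and dominant relative to all other letters.
    Both thresholds are arbitrary assumptions, not tuned values."""
    hits, ratio = foreign_ratio(text, script)
    return hits >= min_chars and ratio >= min_ratio


if __name__ == "__main__":
    print(should_trigger("Hello, regards, Иван Петров"))
    print(should_trigger("Привет " * 20))
```

With these assumed thresholds, a person's name in an otherwise English mail stays below the minimum count, and an English body with a long-ish Cyrillic signature stays below the ratio, so neither would fire.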
