On 10/27, Karsten Bräckelmann wrote:
> > header RUSSIAN_SUBJECT Subject =~ /(АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ){2}/i
>
> How would that be different from "write in Greek occasionally"?
I guessed poorly at what exactly that would look like. Comparing the
Russian and Greek alphabets on Wikipedia now, though, they sit in entirely
separate ranges of characters: Russian (Cyrillic) is U+04xx and Greek is U+03xx.
'A' (English), 'А' (Russian), and 'Α' (Greek) are all different characters.
Windows-1253 is the Greek character set.
So I'm curious how koi8-r or Windows-1251 matched Greek.
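
One guess at how that could happen, sketched in Python: a rule that looks at
raw bytes instead of decoded characters cannot tell these charsets apart,
because the same byte value is a different letter in each of them. That is
only my assumption about the mechanism, nothing confirmed in this thread, and
the byte 0xC1 is just an arbitrary example.

```python
import unicodedata

# The three look-alike capital letters are three distinct code points.
for ch in ("A", "\u0410", "\u0391"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# One raw byte, decoded under each legacy 8-bit charset.
raw = b"\xc1"
decoded = {cs: raw.decode(cs) for cs in ("koi8-r", "cp1251", "cp1253")}
for cs, ch in decoded.items():
    print(f"{cs}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

So a byte-level pattern written with Cyrillic in mind would also fire on
Greek mail unless the text is decoded using the declared charset first.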
> This is way too restrictive anyway. Basically, you are dis-allowing any
> Cyrillic word -- including a person's name, just as a quick example.
>
> What would be needed is code to identify the non-western chars in
> *relation* to western chars. And a minimum limit before triggering, to
> avoid scoring a mail with a perfectly valid short English body, and a
> long-ish $foreign language signature.
Yeah, I figured that's where we'd end up. Any suggestions on specific
thresholds?
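
To make the question concrete, here is a rough sketch of how I read that
suggestion. It is Python rather than a SpamAssassin plugin, it assumes the
body has already been decoded to Unicode, and the names script_counts and
mostly_cyrillic as well as both default thresholds (40 letters, 50%) are
placeholders I made up, not recommendations.

```python
import unicodedata

def script_counts(text):
    """Count letters per script, using the first word of the Unicode name."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return counts

def mostly_cyrillic(text, min_letters=40, min_ratio=0.5):
    """True only if Cyrillic letters pass both an absolute and a relative limit.

    min_letters and min_ratio are hypothetical placeholder values.
    """
    counts = script_counts(text)
    cyrillic = counts.get("CYRILLIC", 0)
    total = sum(counts.values())
    if total == 0:
        return False
    return cyrillic >= min_letters and cyrillic / total >= min_ratio
```

The absolute minimum is there so that a short English body with a long-ish
Russian signature does not trigger, and the ratio is there so that a long
English mail quoting a few Russian words does not trigger either.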
"there is already a test for the majority of characters in the body
being high-bit"
What test is that?
--
"A ship in a port is safe, but that's not what ships are built for."
-Grace Murray Hopper
http://www.ChaosReigns.com