Re: [Bug 4078] ok_locales not working on windows-* charsets

darxus Thu, 27 Oct 2011 14:24:12 -0700

On 10/27, Karsten Bräckelmann wrote:
> > header RUSSIAN_SUBJECT Subject =~ /[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ]{2}/i
> 
> How would that be different from "write in Greek occasionally"?


I guessed poorly at what exactly that looked like.  Although, comparing the
Russian and Greek alphabets on Wikipedia now, they occupy entirely separate
ranges of code points: Cyrillic is U+04xx and Greek is U+03xx.
'A' (English), 'А' (Russian), and 'Α' (Greek) are all different characters.
Windows-1253 is the Greek character set.
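
A quick way to check the claim about separate ranges, sketched in Python 3
using only the standard library (the helper name describe is made up for
illustration).  It prints the code point of each look-alike letter and shows
which single byte represents it in the legacy charsets mentioned in this
thread:

```python
import unicodedata

# Three look-alike capital letters from three different scripts.
letters = {"Latin": "A", "Cyrillic": "\u0410", "Greek": "\u0391"}

def describe(ch):
    """Return (code point, Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for script, ch in letters.items():
    print(script, *describe(ch))

# The same letters as single bytes in the legacy 8-bit charsets.
cyrillic_a_cp1251 = "\u0410".encode("cp1251")   # Windows-1251
cyrillic_a_koi8r = "\u0410".encode("koi8_r")    # KOI8-R
greek_alpha_cp1253 = "\u0391".encode("cp1253")  # Windows-1253
print(cyrillic_a_cp1251, cyrillic_a_koi8r, greek_alpha_cp1253)
```

The letters are distinct once decoded, but each legacy charset reuses the
same high-bit byte values for its own alphabet, so raw bytes can only be
interpreted correctly when the declared charset is known.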

So I'm curious how koi8-r or Windows-1251 matched Greek.  

> This is way too restrictive anyway. Basically, you are dis-allowing any
> Cyrillic word -- including a person's name, just as a quick example.
> 
> What would be needed is code to identify the non-western chars in
> *relation* to western chars. And a minimum limit before triggering, to
> avoid scoring a mail with a perfectly valid short English body, and a
> long-ish $foreign language signature.

Yeah, I figured that's where we'd end up.  Any suggestions on specific
thresholds?
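
To make the "relation plus minimum limit" idea concrete, here is a minimal
sketch in Python 3.  It is not SpamAssassin code; the function names and the
values of min_chars and min_ratio are placeholders picked for illustration,
not suggested thresholds:

```python
def cyrillic_stats(text):
    """Count code points in the Cyrillic block (U+0400-U+04FF) and ASCII letters."""
    cyrillic = sum(1 for ch in text if "\u0400" <= ch <= "\u04ff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return cyrillic, latin

def mostly_cyrillic(text, min_chars=20, min_ratio=0.5):
    """True when Cyrillic characters are both numerous and dominant.

    min_chars and min_ratio are hypothetical placeholder values.
    """
    cyrillic, latin = cyrillic_stats(text)
    letters = cyrillic + latin
    # The minimum count keeps a single name or short phrase from triggering.
    if cyrillic < min_chars or letters == 0:
        return False
    # The ratio keeps a long English body with a foreign signature from triggering.
    return cyrillic / letters >= min_ratio
```

With these placeholder values, a message containing only a Cyrillic name, or
an English body followed by a longer Cyrillic signature, does not trigger,
while a message written mostly in Cyrillic does.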


"there is already a test for the majority of characters in the body
being high-bit"  

What test is that?

-- 
"A ship in a port is safe, but that's not what ships are built for."
-Grace Murray Hopper
http://www.ChaosReigns.com
