https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
--- Comment #5 from Mark Martinec <[email protected]> 2011-05-04 19:56:33 UTC ---
> If you mean..
> $word = lc($word) if $word =~ /[a-zA-ZöäåÖÄÅ]{4}/;

Yes, something like that.

> I'm fine with that if it handles the special chars? But isn't that locale
> dependent? Doesn't seem to work for me.

It is locale dependent, I believe. Another case for Bug 3062.
It should be documented somewhere that SpamAssassin should be run
under a C locale.

> I guess more of the special chars would need to be handled in any case,
> I just went with the finnish ones and it worked for me well..

I only tried it with my installed version of perl under a C locale.
It seems that lc() handles such characters well, but the string must
first be decoded into the correct character set. It does not work on
raw octets, as lc() has no idea that these can be interpreted as
ISO Latin1.

Ok, so this is probably too much of a change for a minor release,
so I am backing off my suggestion.

What will happen with 8-bit characters in the source code? So far there
is no such case, as far as I can tell. Maybe these should be encoded
in the source as \ooo or \x{hh} to stay on the safe side (not depending
on a locale). Other common uppercase letters with diacritics from Latin1
should be included in the set, I suppose.
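To illustrate the raw-octets point above, here is a minimal sketch, not
the SpamAssassin code. It assumes a perl with the core Encode module, no
"use locale" and no "unicode_strings" feature in effect. The variable
names are made up for the example, and the non-ASCII letters are written
as \x{hh} escapes so the source itself stays pure ASCII. Results may
differ under other perl versions or pragmas.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a string as space-separated hex byte values.
sub hex_dump { return join ' ', map { sprintf '%02X', ord } split //, shift }

# ISO Latin1 octets for the uppercase Finnish letters from the quoted
# character class (O-diaeresis, A-diaeresis, A-ring), as \x{hh} escapes.
my $octets = "\x{D6}\x{C4}\x{C5}";

# lc() applied directly to the raw octets: perl is not told that these
# bytes are Latin1 letters.
my $lc_raw = lc($octets);

# Decode to characters first, lowercase, then encode back to Latin1.
my $chars      = decode('ISO-8859-1', $octets);
my $lc_decoded = encode('ISO-8859-1', lc($chars));

print "raw octets: ", hex_dump($lc_raw),     "\n";
print "decoded:    ", hex_dump($lc_decoded), "\n";

# The same escapes can be used in the character class from the quoted
# one-liner, so the pattern does not depend on the source file encoding.
my $word = "TEST";
$word = lc($word)
  if $word =~ /[a-zA-Z\x{F6}\x{E4}\x{E5}\x{D6}\x{C4}\x{C5}]{4}/;
print "word:       $word\n";
```

Comparing the two hex dumps shows whether lc() changed the non-ASCII
bytes in each case on a given installation.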
