https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
--- Comment #9 from Henrik Krohns <[email protected]> 2011-05-06 07:00:21 UTC ---
(In reply to comment #8)
> > Too much technical debate for 3.3.2 consideration. Retargeting to 3.4.0.
>
> How about just doing a plain lc for now, which will at least
> handle all-ascii text such as English:
>
> - $word = "\000" . $word . "\000";
> + $word = "\000" . lc($word) . "\000";
>
> and leave the bug open for a better solution in 3.4 ?

I'm currently trying out a "proper" character set instead. IMO lc is too vague
and too locale-dependent.

    $word =~ tr/A-Z\xc0-\xd6\xd8-\xde/a-z\xe0-\xf6\xf8-\xfe/
        if $word =~ /[A-Z]/
        && $word =~ /[a-zA-Z\xc0-\xd6\xd8-\xde\xe0-\xf6\xf8-\xfe]{4}/;

This table covers all the accented Latin letters. The following loop prints
each uppercase code point (decimal, hex, character) next to its lowercase
counterpart:

    foreach (192..214, 216..222) {
        printf "%s %x %s %s %x %s\n",
            $_, $_, chr($_), $_ + 32, $_ + 32, chr($_ + 32);
    }

I'm also quite certain that lowering textcat_acceptable_score to 1.02 is the
right thing to do.

I'm currently building a small corpus of different languages, including a
separate FP corpus. I should have some results soon.
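A minimal, self-contained sketch of the transliteration approach above, wrapped in a subroutine so it can be tried in isolation. The name latin1_fold is hypothetical (the comment only shows the bare tr/// statement), and the sketch assumes $word holds Latin-1 code points (a byte string or an already-decoded string), not raw UTF-8 bytes. The ranges skip 0xD7 (multiplication sign) and 0xDF (sharp s, which has no uppercase form in Latin-1); every other capital maps to the code point 32 (0x20) higher, which is what the printf loop displays.

```perl
use strict;
use warnings;

# Hypothetical helper: locale-independent lowercasing of ASCII A-Z and the
# Latin-1 capitals 0xC0-0xD6 and 0xD8-0xDE. Everything else, including
# 0xD7 and 0xDF, is left untouched.
sub latin1_fold {
    my ($word) = @_;
    # Same guard as the statement above: only fold when the word contains an
    # ASCII capital and a run of at least four Latin letters.
    $word =~ tr/A-Z\xc0-\xd6\xd8-\xde/a-z\xe0-\xf6\xf8-\xfe/
        if $word =~ /[A-Z]/
        && $word =~ /[a-zA-Z\xc0-\xd6\xd8-\xde\xe0-\xf6\xf8-\xfe]{4}/;
    return $word;
}

# "ÉCOLE" written with an explicit Latin-1 code point for the E-acute.
my $folded = latin1_fold("\xC9COLE");
printf "%vX\n", $folded;   # dotted hex code points of the result
```

Plugged into the tokenizer line quoted from comment #8, it would take the place of lc: $word = "\000" . latin1_fold($word) . "\000";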
