https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229

--- Comment #14 from Henrik Krohns <[email protected]> 2011-05-06 20:36:50 UTC ---
(In reply to comment #11)
> Henrik, did you try lc() after setting the utf8 flag?
> 
> $word = Encode::decode_utf8($word); # set the flag

I think that's trying to be too clever. I believe the textcat database also
has some UTF-8 signatures.

About the attached files:

- I used an acceptable score of 1.02 for both, since it gives more accurate
results
- Full tr/A-Z\xc0-\xd6\xd8-\xde/a-z\xe0-\xf6\xf8-\xfe/ is used in the new
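
For illustration, the transliteration in the second item can be sketched as a
small standalone helper. This is only a sketch, not the code from the attached
files: the name lc_latin1 and the sample string are made up here, and it
assumes the input is a Latin-1 string. The ranges skip \xd7 and \xf7 (the
multiplication and division signs) and leave \xdf (sharp s) alone, since it
has no single-character uppercase form in Latin-1.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the patch).
# Lowercases ASCII A-Z plus the Latin-1 uppercase letters, without
# depending on the locale or on lc().
sub lc_latin1 {
    my ($s) = @_;
    # Both sides list 56 characters: 26 + 23 + 7.
    $s =~ tr/A-Z\xc0-\xd6\xd8-\xde/a-z\xe0-\xf6\xf8-\xfe/;
    return $s;
}

# Sample input in Latin-1: "CAF", E-acute, space, multiplication sign, " 2"
my $in  = "CAF\xc9 \xd7 2";
my $out = lc_latin1($in);

# Letters are lowercased; the multiplication sign (\xd7) is left as-is.
print $out eq "caf\xe9 \xd7 2" ? "ok\n" : "mismatch\n";
```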

I think this is safe tuning; it reduces many of the "bunch of languages" and
"all caps look like Japanese" results.
