https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590

--- Comment #32 from Warren Togami <[email protected]> 2010-01-18 08:26:41 UTC ---
(In reply to comment #28)
> Not surprisingly it affects Bayes, but only as slightly as the rules. Probably
> tokens containing highbits etc. It's simple to test with sa-learn and
> comparing dumps.

I would imagine that treating multi-byte characters as individual bytes
might bite us in ways similar to Bug 6183.  Various control characters, or
characters considered non-word characters, can occur as the second byte,
screwing up tokenization.
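
To illustrate the concern, here is a minimal sketch in Python (not
SpamAssassin's actual Perl tokenizer). The function names are hypothetical.
It assumes a tokenizer that splits on non-word characters, and it simulates
"one byte = one character" by decoding UTF-8 bytes as Latin-1.

```python
import re

WORD = re.compile(r"\w+")  # Unicode-aware: letters, digits, underscore


def tokenize_chars(text):
    """Tokenize properly decoded text (each character is one unit)."""
    return WORD.findall(text)


def tokenize_bytewise(text, encoding="utf-8"):
    """Tokenize as if every byte of the encoded text were its own character.

    Decoding as Latin-1 maps each byte 0x00-0xFF to one code point, which
    mimics a tokenizer that is unaware of multi-byte sequences.
    """
    raw = text.encode(encoding)
    return WORD.findall(raw.decode("latin-1"))


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9"  # "naïve café"
    print(tokenize_chars(sample))     # ['naïve', 'café']
    # 'ï' is bytes C3 AF: 0xC3 reads as a letter, 0xAF as a symbol.
    # 'é' is bytes C3 A9: 0xC3 reads as a letter, 0xA9 as a symbol.
    print(tokenize_bytewise(sample))  # ['naÃ', 've', 'cafÃ']

    # 'É' is bytes C3 89: the second byte lands in the C1 control range,
    # so the word is split right after its first byte.
    print(tokenize_bytewise("\u00c9ric"))
```

In this sketch the second byte of each two-byte sequence is not a word
character, so a single word is cut into fragments. Fragments like these are
the kind of token that would differ between Bayes dumps.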

-- 
Configure bugmail: 
https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
