https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7130
Mark Martinec <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Mark Martinec <[email protected]> ---
In summary: improved tokenization of UTF-8-encoded text (natively or due to
normalize-charset) at some processing cost, which is relatively minor
compared with the overall bayes tokenization CPU usage.

Closing.

(If at some time in the future we decide to switch the internal text
representation to Unicode (utf8 flag on), then this 'manual' handling of
UTF-8 encoding bytes will go away.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
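
To illustrate what "manual" handling of UTF-8 encoding bytes can mean for a tokenizer that works on raw octets, here is a minimal sketch in Python. It is not the SpamAssassin code (which is Perl), and the function name split_utf8_bytes is hypothetical. It only assumes the standard UTF-8 lead-byte layout from RFC 3629: the lead byte tells how many bytes the character occupies, and each following byte must match 10xxxxxx.

```python
from typing import List


def split_utf8_bytes(data: bytes) -> List[bytes]:
    """Split raw UTF-8 octets into per-character byte sequences without decoding.

    Hypothetical helper for illustration only. Malformed or truncated
    sequences are emitted one byte at a time instead of raising an error.
    """
    chars: List[bytes] = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            length = 1              # ASCII
        elif 0xC2 <= lead <= 0xDF:
            length = 2              # 2-byte sequence
        elif 0xE0 <= lead <= 0xEF:
            length = 3              # 3-byte sequence
        elif 0xF0 <= lead <= 0xF4:
            length = 4              # 4-byte sequence
        else:
            length = 1              # stray continuation or invalid lead byte
        seq = data[i:i + length]
        # Every byte after the lead must be a continuation byte (10xxxxxx);
        # otherwise fall back to treating the lead byte on its own.
        if len(seq) < length or any((b & 0xC0) != 0x80 for b in seq[1:]):
            seq = data[i:i + 1]
        chars.append(seq)
        i += len(seq)
    return chars


if __name__ == "__main__":
    sample = "naïve 日本語".encode("utf-8")
    print(split_utf8_bytes(sample))
```

With an internal Unicode (character) representation, the same split is simply iteration over the decoded string, which is the simplification the comment anticipates. This sketch does not reject overlong or surrogate encodings; it only groups bytes.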
