https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7130

Mark Martinec <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Mark Martinec <[email protected]> ---
In summary: tokenization of UTF-8-encoded text (whether the text was
natively UTF-8 or was converted by normalize-charset) is improved, at some
processing expense which is relatively minor compared with the overall
Bayes tokenization CPU usage.  Closing.

(If at some time in the future we decide to switch the internal text
representation to Unicode (utf8 flag on), then this 'manual' handling of
UTF-8 encoded bytes will go away.)
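
As a rough illustration of the difference between the two approaches, here
is a minimal Python sketch. It is not SpamAssassin's Perl code, and the
function names split_utf8_bytes and split_decoded are hypothetical. The
first function finds character boundaries 'manually' by inspecting UTF-8
lead bytes; the second decodes to a Unicode string first, so no byte-level
handling is needed.

```python
from typing import List


def split_utf8_bytes(data: bytes) -> List[bytes]:
    """Split a UTF-8 byte string into per-character byte sequences by hand.

    Hypothetical helper for illustration only: the sequence length is taken
    from the lead byte, and no validation of continuation bytes is done.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1          # ASCII
        elif 0xC0 <= lead < 0xE0:
            n = 2          # two-byte sequence
        elif 0xE0 <= lead < 0xF0:
            n = 3          # three-byte sequence
        elif 0xF0 <= lead < 0xF8:
            n = 4          # four-byte sequence
        else:
            n = 1          # stray continuation or invalid byte: keep it alone
        chars.append(data[i:i + n])
        i += n
    return chars


def split_decoded(data: bytes) -> List[str]:
    """Decode to a Unicode string first; characters then split trivially."""
    return list(data.decode("utf-8"))


sample = "naïve 日本".encode("utf-8")
by_hand = split_utf8_bytes(sample)
decoded = split_decoded(sample)
# For valid UTF-8 input both approaches find the same character boundaries.
assert by_hand == [c.encode("utf-8") for c in decoded]
```

For valid input the two give the same character boundaries; the byte-level
version simply has to carry the encoding rules itself.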

-- 
You are receiving this mail because:
You are the assignee for the bug.
