https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8272

--- Comment #8 from Sidney Markowitz <sid...@sidney.com> ---
(In reply to Bill Cole from comment #7)

I'm on my phone and can't easily look up any discussion history, but I guess
the logic goes this way: If it all decodes as UTF-8 with not even one error, it
is likely to be plain ASCII or UTF-8 no matter what the charset is listed as in
the email, and the decode will be correct. If it is not decodable as strict
UTF-8 then the next most reasonable choice is the charset that is specified.
After that is a charset that is detected using the encoding detector. It is
only if all of those fail that a last resort is tried. The current last resort
is the Windows-1252 charset. I think that's because it never fails to decode to
something even if it is nonsense characters. But UTF-8 with no fail on errors
should work as a last resort, and even makes sense to use when charset UTF-8
has been specified.
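
To make the order above concrete, here is a rough sketch in Python. It is not
the SpamAssassin code. The function name decode_body and its parameters are
made up for illustration, and the "detected" argument stands in for whatever
the encoding detector would return. The last resort shown is the lenient UTF-8
decode suggested above, not the current Windows-1252 behavior.

```python
from typing import Optional, Tuple


def decode_body(raw: bytes,
                declared: Optional[str] = None,
                detected: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (text, label of the charset that was used)."""
    # Step 1: strict UTF-8. If there is not even one error, trust it
    # regardless of the charset listed in the email.
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Steps 2 and 3: the declared charset, then the detected one.
    # LookupError covers charset names Python does not know.
    for charset in (declared, detected):
        if not charset:
            continue
        try:
            return raw.decode(charset, errors="strict"), charset
        except (UnicodeDecodeError, LookupError):
            continue

    # Step 4 (assumed last resort): UTF-8 that does not fail on errors.
    # Undecodable bytes become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lenient)"
```

With this ordering, a body declared as UTF-8 that contains a stray invalid
byte still ends up decoded as UTF-8 with replacement characters, which is the
case where a Windows-1252 last resort would produce nonsense characters.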

-- 
You are receiving this mail because:
You are the assignee for the bug.