https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8272
--- Comment #6 from Sidney Markowitz <sid...@sidney.com> ---
There is a further problem revealed by this test case after I strip out the
http://user:pass@ prefix. The text/html base64 section declares a UTF-8
charset, but when the base64-decoded result is then charset-decoded in
Node.pm _normalize(), the decode fails because the data contains a byte that
is not valid UTF-8:

  dbg: message: failed decoding as charset UTF-8, declared UTF-8
       (UTF-8 "\xB1" does not map to Unicode)
  dbg: message: decoded as last-resort charset Windows-1252, declared UTF-8

When the string is then decoded as Windows-1252, every multi-byte UTF-8
character in the URL turns into a mess of single-byte non-ASCII characters,
destroying the URL parsing.

The smallest change I can think of to fix this is in the code that now says

  elsif ($tried_utf8 && $chset eq 'UTF-8') {
    # was already tried initially, no point doing again
  }

I propose changing it to try decoding with UTF-8 again, but this time
non-strictly, i.e. without the FB_CROAK flag, so the decode can succeed even
when a few non-UTF-8 bytes are mis-decoded; a rough sketch follows below.

What do people think? Any alternative suggestions?
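Here is a minimal sketch of what I mean, illustrating the fallback idea
rather than a drop-in patch for _normalize(); the subroutine name and the
$data variable are hypothetical, and only Encode's standard decode() and
FB_* constants are assumed:

  use strict;
  use warnings;
  use Encode qw(decode FB_CROAK FB_DEFAULT);

  # Try a strict UTF-8 decode first; if that croaks on an invalid byte
  # (like the \xB1 above), retry leniently so the bad byte becomes U+FFFD
  # while every valid multi-byte UTF-8 sequence survives intact.
  sub decode_utf8_lenient {
      my ($data) = @_;
      my $text = eval { decode('UTF-8', $data, FB_CROAK) };
      return $text if defined $text;
      return decode('UTF-8', $data, FB_DEFAULT);  # never croaks
  }

The point of the lenient retry is that it only mangles the actually invalid
bytes, whereas the Windows-1252 last resort reinterprets every byte
individually and so shreds the multi-byte characters in the URLs.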