https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8272
--- Comment #9 from Sidney Markowitz <sid...@sidney.com> ---
After looking at the comment thread in bug 7126, it seems the logic is pretty much as I described. The "last resort" is settling for garbage in, garbage out: decoding as Windows-1252 in a mode that can't fail, so that it produces something even if that something is garbage.

If the data has an explicitly declared charset of UTF-8, I think it is more likely that it really is UTF-8 with a small number of errors than that it really is Windows-1252. Decoding it as UTF-8 without failing on errors would turn only the bad bytes (plus up to 3 more bytes per bad byte) into garbage. Decoding the same string as Windows-1252 would turn every multibyte character into garbage.

So I propose that, before reaching the "last resort", we add a step: if the charset is declared as UTF-8, decode as UTF-8 without the FB_CROAK flag.

I see from bug 7126 that at that time Mark Martinec had the best understanding of the issues and had run tests of the decoding results on many mails. Mark, that was 9 years ago, but do you by chance have any thoughts to weigh in on this?
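To illustrate the proposed ordering, here is a minimal sketch in Python. It is not SpamAssassin's code, which is Perl and uses the Encode module. Python's errors="strict" stands in for FB_CROAK, and errors="replace" (substituting U+FFFD) stands in for decoding without FB_CROAK. The helper name decode_body and the simplified three-step order are hypothetical, and any other heuristics the real code applies are omitted.

```python
from typing import Optional


def decode_body(raw: bytes, declared_charset: Optional[str]) -> str:
    """Hypothetical helper: decode a MIME part's bytes to text.

    Python codec error modes are used as a stand-in for Perl Encode:
    errors="strict" ~ FB_CROAK, errors="replace" ~ no FB_CROAK.
    """
    charset = (declared_charset or "").strip().lower()

    # 1. Try the declared charset strictly; any error means "not clean".
    if charset:
        try:
            return raw.decode(charset, errors="strict")
        except (LookupError, UnicodeDecodeError):
            pass  # unknown charset name or malformed data

    # 2. Proposed step: if the part claims UTF-8, trust the claim and
    #    decode leniently, so only the malformed bytes become U+FFFD.
    if charset in ("utf-8", "utf8"):
        return raw.decode("utf-8", errors="replace")

    # 3. Last resort: a decode that can't fail, as Windows-1252
    #    (garbage in, garbage out). errors="replace" covers the byte
    #    values that Python's cp1252 codec leaves undefined.
    return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    clean = "Café, naïve résumé".encode("utf-8")
    damaged = clean[:5] + b"\xff" + clean[5:]  # one stray invalid byte

    lenient = decode_body(damaged, "UTF-8")  # proposed UTF-8 step
    fallback = decode_body(damaged, None)    # Windows-1252 last resort

    # Lenient UTF-8: only the bad byte is lost.
    print(lenient.count("\ufffd"))  # 1
    # Windows-1252: every 2-byte UTF-8 character became "Ã" + another char.
    print(fallback.count("\u00c3"))  # 4
```

With one bad byte, the lenient UTF-8 path loses a single character, while the Windows-1252 path mangles all four accented letters. That is the asymmetry the proposal relies on.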