https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7520
--- Comment #5 from Giovanni Bechis <[email protected]> ---
Created attachment 5500
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5500&action=edit
fix when utf8_mode does not work

If utf8_mode does not work, as in this case, we should decode the data as UTF-8 ourselves before parsing it.

From the HTML::Parser man page:
---------------------------------------------------------------------
Parsing of undecoded UTF-8 will give garbage when decoding entities

(W) The first chunk parsed appears to contain undecoded UTF-8 and one or more argspecs that decode entities are used for the callback handlers.

The result of decoding will be a mix of encoded and decoded characters for any entities that expand to characters with code above 127. This is not a good thing.

The recommended solution is to apply Encode::decode_utf8() on the data before feeding it to the $p->parse().
---------------------------------------------------------------------
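The "mix of encoded and decoded characters" can be reproduced outside Perl. The following is a rough Python analogue, not the attached patch. `html.unescape` stands in for HTML::Parser's entity decoding. Reading the raw bytes as Latin-1 mimics a parser that sees undecoded UTF-8 octets. `bytes.decode("utf-8")` plays the role of `Encode::decode_utf8()`. The two function names are hypothetical.

```python
import html


def decode_entities_naively(raw: bytes) -> str:
    # Treat each byte as one character (Latin-1), like a parser fed
    # undecoded UTF-8 octets, then expand entities. Literal non-ASCII
    # text stays as mojibake while entities expand to real characters.
    return html.unescape(raw.decode("latin-1"))


def decode_entities_correctly(raw: bytes) -> str:
    # Decode UTF-8 first (analogue of Encode::decode_utf8), then expand
    # entities, so literal text and entities end up consistent.
    return html.unescape(raw.decode("utf-8"))


raw = "café &eacute;".encode("utf-8")
print(decode_entities_naively(raw))    # cafÃ© é
print(decode_entities_correctly(raw))  # café é
```

In the naive case, the literal "é" comes out as the two characters "Ã©", while the `&eacute;` entity expands to a single "é". This is the inconsistency the warning describes. Decoding first makes both forms agree.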
