https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7520

--- Comment #5 from Giovanni Bechis <[email protected]> ---
Created attachment 5500
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5500&action=edit
fix when utf8_mode does not work

If utf8_mode does not work, as in this case, we should decode the data from UTF-8 ourselves before parsing it.
From the HTML::Parser man page:
---------------------------------------------------------------------
       Parsing of undecoded UTF-8 will give garbage when decoding entities
           (W) The first chunk parsed appears to contain undecoded UTF-8 and
           one or more argspecs that decode entities are used for the callback
           handlers.

           The result of decoding will be a mix of encoded and decoded
           characters for any entities that expand to characters with code
           above 127.  This is not a good thing.
           The recommended solution is to apply Encode::decode_utf8() on the
           data before feeding it to the $p->parse().
---------------------------------------------------------------------
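
A minimal sketch of the approach the man page recommends, not the attached patch itself: the raw octets are decoded with Encode::decode_utf8() and only then handed to $p->parse(). The helper name html_text_from_utf8_bytes and the sample input are hypothetical, and the sketch assumes the input really is UTF-8.

```perl
use strict;
use warnings;
use Encode ();
use HTML::Parser ();

# Hypothetical helper (not from the patch): return the entity-decoded
# text of an HTML fragment supplied as raw UTF-8 octets.
sub html_text_from_utf8_bytes {
    my ($bytes) = @_;

    # Turn the octets into Perl characters before parsing, so literal
    # text and decoded entities end up in the same representation.
    my $chars = Encode::decode_utf8($bytes);

    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' is the argspec that decodes entities in text events.
        # Text may arrive in several chunks, so append each one.
        text_h => [ sub { $text .= $_[0] }, 'dtext' ],
    );
    $p->parse($chars);
    $p->eof;

    return $text;
}

# Sample input (an assumption for illustration): an e-grave encoded as
# UTF-8 octets, followed by an entity that expands above code point 127.
my $bytes = "caff\xC3\xA8 &euro;5";
my $text  = html_text_from_utf8_bytes($bytes);
```

With the data decoded up front, both the literal non-ASCII character and the expanded entity should come back as ordinary Perl characters, rather than the mix of encoded and decoded characters the warning describes.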
