https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7249

Mark Martinec <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0

--- Comment #7 from Mark Martinec <[email protected]> ---
> These words are also missing from Bayes tokens, although the code
> path there is different: header decoding goes through decoding in
> Message::Node::_decode_header, which intentionally avoids decoding
> MIME-words in Content-* header fields. The reason is probably in
> RFC 2047, which explicitly excluded the use of MIME-words there,
> although a later RFC 2184 introduced such encodings.
> 
> Will see what can be done with __decode_header() and _normalize()
> to get such names decoded.

Enhanced _decode_header() and _normalize(), committed below.

> Interestingly some time in the far past it seems to have been decided
> that Encode::decode("MIME-Header",...) may not be the best choice,
> but have implemented own decoding (Mail::SpamAssassin::Util::qp_decode,
> Mail::SpamAssassin::Util::base64_decode, __decode_header). Not sure
> what was the rationale, possibly some bug in the Encode::MIME::Header
> back then. Seems suboptimal now to use two different decoding
> implementations for decoding of the same header field in two places.

Checked the changelog of Encode::MIME::Header. Seems that most of the
problematic cases were fixed with a version of Encode that came with
perl 5.10.1, although there are still some unresolved issues, like
incorrectly discarding whitespace on header folding. Our code does it
better, especially with the new code just committed.
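
To illustrate the folding issue: RFC 2047 says that whitespace between
two adjacent encoded-words is ignored, while whitespace between an
encoded-word and ordinary text is significant. A minimal Python sketch of
that rule follows. It is not the code in Node.pm, and the names EW,
unfold and drop_ws_between_encoded_words are made up for this example.

```python
import re

# Rough shape of an RFC 2047 encoded-word; '*' is allowed in the first
# part so that an RFC 2231 language suffix also matches.
EW = r"=\?[A-Za-z0-9_*-]+\?[bBqQ]\?[^?\s]*\?="


def unfold(value: str) -> str:
    """Undo header folding: drop a line break that precedes whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", value)


def drop_ws_between_encoded_words(value: str) -> str:
    """Remove whitespace only where it separates two encoded-words."""
    return re.sub(rf"({EW})\s+(?={EW})", r"\1", value)


folded = "=?utf-8?Q?a?=\r\n =?utf-8?Q?b?= plain text"
cleaned = drop_ws_between_encoded_words(unfold(folded))
```

The space before "plain text" survives, since it does not sit between
two encoded-words.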

> I made some modifications to my copy of Message::Node.pm to better
> deal with it: just mangle the split character instead of giving up
> on UTF-8 decoding entirely and falling back to Windows 1250, which
> yields true gibberish. Needs some more testing.
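
A rough illustration of the trade-off described above, written in Python
rather than the Perl of Message::Node.pm, and not taken from the commit.
It assumes a UTF-8 string whose bytes were cut in the middle of a
two-byte character, as happens when a sender splits a multibyte
character across two encoded-words. The helper name decode_lenient and
the sample word are hypothetical.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, turning malformed bytes into U+FFFD."""
    return raw.decode("utf-8", errors="replace")


original = "Příloha"
octets = original.encode("utf-8")

# Cut after the first byte of the two-byte character at index 1.
first, second = octets[:2], octets[2:]

# Per-chunk lenient decoding damages only the split character.
mangled = decode_lenient(first) + decode_lenient(second)

# Reinterpreting everything as Windows-1250 also corrupts the intact
# multibyte character that follows the split one.
fallback = octets.decode("cp1250")
```

Here only the split character is lost in mangled, while fallback also
garbles the unsplit multibyte character.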

trunk:
  _decode_header:
    deal with invalid splicing of multibyte characters in encoded-words,
    allow language info in encoded words (RFC 2231),
    decode Content-* for benefit of Bayes;
  renamed __decode_header to _decode_mime_encoded_word
Sending lib/Mail/SpamAssassin/Message/Node.pm
Committed revision 1707593.
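
For reference, RFC 2231 extends the encoded-word syntax to
=?charset*language?encoding?text?=. The Python sketch below shows one way
to accept the optional language part. It is an assumption-laden
illustration, not a port of _decode_mime_encoded_word: the regular
expression, the function name decode_encoded_word and the ASCII fallback
for unknown charsets are all choices made for this example.

```python
import base64
import binascii
import re

# charset, optional "*language", encoding letter, payload (printable
# ASCII without '?' or space).
ENCODED_WORD = re.compile(
    r"=\?([A-Za-z0-9_-]+)(?:\*([A-Za-z0-9-]+))?\?([bBqQ])\?([!->@-~]*)\?="
)


def decode_encoded_word(word: str):
    """Return (text, language) for one encoded-word, or None if invalid."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset, language, encoding, payload = m.groups()
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(payload)
        else:
            # header=True makes '_' decode to a space, as in "Q" encoding.
            raw = binascii.a2b_qp(payload, header=True)
    except ValueError:
        return None
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to ASCII (an arbitrary choice).
        text = raw.decode("ascii", errors="replace")
    return text, language


result = decode_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?=")
```

The language tag is returned alongside the text so that a caller can
ignore it, which is all a tokenizer needs.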

-- 
You are receiving this mail because:
You are the assignee for the bug.
