https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

            Bug ID: 7253
           Summary: X-Spam-Report incorrectly mime-encodes multiline
                    report in header, violating RFC 2047
           Product: Spamassassin
           Version: 3.4.1
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: [email protected]
          Reporter: [email protected]

With
  report_safe 0

and a rule with a non-ASCII description, e.g.:

  header L_TEST_REPORT_ENCODING From =~ /./
  score  L_TEST_REPORT_ENCODING 0.01
  describe L_TEST_REPORT_ENCODING  En-tête contient caractères

the resulting multiline X-Spam-Report header field, as inserted
into spam messages, is incorrectly encoded: the whole multiline
header field is wrapped in a single encoded-word, whitespace is
left unencoded inside that encoded-word, and the encoded-word
spans multiple lines:

X-Spam-Report: =?UTF-8?Q?
  *  100 USER_IN_BLACKLIST From: address is in the user's black-list
  *  0.0 L_TEST_REPORT_ENCODING En-t=c3=aate contient caract=c3=a8res
  * -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5%
  *      [score: 0.0137]
  *  0.3 TXREP TXREP: Score normalizing based on sender's reputation?=

This is wrong on multiple counts. RFC 2047 is explicit:


   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters.

[...]

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence
      =?iso-8859-1?q?this is some text?=
   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.
      =?iso-8859-1?q?this=20is=20some=20text?=

[...]

   Only a subset of the printable ASCII characters may be used in
   'encoded-text'.  Space and tab characters are not allowed, so that
   the beginning and end of an 'encoded-word' are obvious.
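
The two constraints quoted above (at most 75 characters, no whitespace
inside) can be checked mechanically. The following is only a minimal
sketch in Python, not SpamAssassin code: is_valid_encoded_word() and
ENCODED_WORD are hypothetical names, and the pattern is deliberately
looser than the full RFC 2047 grammar for 'charset' and 'encoded-text'.

```python
import re

# Hypothetical helper, not part of SpamAssassin. The pattern is a loose
# approximation of the RFC 2047 grammar: it only insists that neither
# "?" nor whitespace appears inside the charset or the encoded-text.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")

def is_valid_encoded_word(token: str) -> bool:
    """Return True if token is one encoded-word of at most 75 characters
    that contains no whitespace."""
    return len(token) <= 75 and ENCODED_WORD.fullmatch(token) is not None
```

Under these assumptions, the RFC's own counterexample
"=?iso-8859-1?q?this is some text?=" is rejected, while
"=?iso-8859-1?q?this=20is=20some=20text?=" is accepted. The
X-Spam-Report value shown earlier fails both checks.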


The culprit is MS::PerMsgStatus::qp_encode_header().
It should encode each line individually (when necessary),
should encode whitespace within the encoded-word(s), and
should keep each encoded-word within 75 characters.
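
The per-line behaviour suggested above could look roughly like the
sketch below. It is written in Python purely for illustration and is
not the SpamAssassin implementation (which is Perl); encode_line(),
encode_report(), q_encode_char() and the SAFE set are hypothetical
names. It assumes UTF-8 input, leaves the leading indentation of each
continuation line unencoded so that header folding stays intact, and
does not account for the header field name on the first physical line.

```python
PREFIX, SUFFIX = "=?UTF-8?Q?", "?="
MAX_WORD = 75  # RFC 2047 limit on the length of one encoded-word

# Characters this sketch emits literally; everything else, including
# SPACE, "_", "=" and "?", is hex-encoded (over-encoding is permitted).
SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/")

def q_encode_char(ch: str) -> str:
    """Q-encode a single character as one or more =XX byte escapes."""
    if ch in SAFE:
        return ch
    return "".join("=%02X" % b for b in ch.encode("utf-8"))

def encode_line(line: str) -> str:
    """Encode one report line; printable-ASCII lines pass through."""
    if all(c == "\t" or 32 <= ord(c) < 127 for c in line):
        return line
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]  # kept literal for folding
    budget = MAX_WORD - len(PREFIX) - len(SUFFIX)
    words, current = [], ""
    for ch in body:
        piece = q_encode_char(ch)  # whole character, never split
        if len(current) + len(piece) > budget:
            words.append(PREFIX + current + SUFFIX)
            current = ""
        current += piece
    words.append(PREFIX + current + SUFFIX)
    # Adjacent encoded-words are separated by folding whitespace, which
    # a decoder ignores when joining them back together.
    return indent + ("\n" + (indent or " ")).join(words)

def encode_report(report: str) -> str:
    """Apply encode_line() to each line of a multiline report."""
    return "\n".join(encode_line(l) for l in report.split("\n"))
```

With this approach only the L_TEST_REPORT_ENCODING line of the example
report would be turned into encoded-words; the pure-ASCII lines would
be left untouched.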
