On 19Nov2016 19:58, Kevin J. McCarthy <ke...@8t8.us> wrote:
On Sun, Nov 20, 2016 at 10:08:13AM +1100, Cameron Simpson wrote:
On 19Nov2016 13:13, Kevin J. McCarthy <ke...@8t8.us> wrote:
> Should mutt be rfc-2047 encoding/decoding the references
> header?

No. RFC2047 tokens need to be whitespace delimited from the surrounding
text.  No whitespace is permitted inside the "<" and ">" markers which
enclose a message-id:

Thank you for your detailed analysis, Cameron.  I will take a deeper
look at this soon.  Another piece of information is that they sent a
reply through the Fastmail web interface, which sent this:

 References: =?utf-8?Q?=3C201611170549=2EQ3WT?=
   =?utf-8?Q?foMB=C3=83=C2=BEngguang=2Ewu=40i?=
   =?utf-8?Q?ntel=2Ecom=3E=20=3C1479410?= =?utf-8?Q?777-6702-1-git-sen?=
   =?utf-8?Q?d-email-manuel=2Esch?= =?utf-8?Q?oelling=40gmx=2Ede=3E?=

 <https://gist.github.com/andrey-utkin/c9cf4f2dc282cf257a2552b06ede49d5>

If this is legal, then mutt needs to be decoding the References before
trying to parse out the ids, because I believe it will just choke on
this.

Wow. I would have thought that was illegal.

Regarding the discussion below, the TL;DR is that I think mutt should decode these if that is feasible, but should write _unencoded_ versions of these headers and of any headers derived from them. In particular, is it easy to make mutt's header ingestion code go "strict parse, but if that fails, decode with RFC2047 and try a second time"? Probably on a per-header basis.
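To make the "strict parse, then decode and retry" idea concrete, here is a rough sketch in Python rather than mutt's C. It is not mutt's code: the function names and the deliberately loose message-id pattern are my own assumptions for illustration, and the only library calls are `decode_header` and `make_header` from the standard `email.header` module.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header, make_header

# Assumption: a loose stand-in for msg-id, not the full RFC 5322 grammar.
# It only insists on "<", no whitespace or nested angle brackets, then ">".
MSGID_RE = re.compile(r"<[^<>\s]+>")


def strict_parse_ids(value):
    """Return the message-ids if value is purely a whitespace-separated
    list of <id> tokens, otherwise None."""
    tokens = value.split()
    if tokens and all(MSGID_RE.fullmatch(t) for t in tokens):
        return tokens
    return None


def parse_references(raw):
    """Strict parse first; if that fails, RFC 2047 decode and try once more."""
    ids = strict_parse_ids(raw)
    if ids is not None:
        return ids
    try:
        decoded = str(make_header(decode_header(raw)))
    except (HeaderParseError, UnicodeError, LookupError):
        # Undecodable input: give up rather than guess.
        return []
    return strict_parse_ids(decoded) or []


# The Fastmail header quoted above, folded across lines as it was sent.
raw = (
    "=?utf-8?Q?=3C201611170549=2EQ3WT?=\n"
    "   =?utf-8?Q?foMB=C3=83=C2=BEngguang=2Ewu=40i?=\n"
    "   =?utf-8?Q?ntel=2Ecom=3E=20=3C1479410?= =?utf-8?Q?777-6702-1-git-sen?=\n"
    "   =?utf-8?Q?d-email-manuel=2Esch?= =?utf-8?Q?oelling=40gmx=2Ede=3E?="
)
ids = parse_references(raw)
```

Because the second pass only runs when the strict pass fails, well-formed References headers are never touched by the decoder, and whatever gets written back out is the plain unencoded id list.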

Regarding the standards:

RFC2047 doesn't actually enumerate specific headers, but section 5 has a list of permitted and forbidden places for "encoded-words" (which the above are). I'm going to quote the bits I think are pertinent, but please read it to see if I'm missing anything:

An 'encoded-word' may appear in a message header or body part header according to the following rules:

(1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822) in any Subject or Comments header field, any extension message header field, or any MIME body part field for which the field body is defined as '*text'. An 'encoded-word' may also appear in any user-defined ("X-") message or body part header field.

Message-IDs are not "text" in RFC822 and its modern form RFC5322. So I'd say (1) does not permit this. A 'text' token is defined as:

  text            =   %d1-9 /            ; Characters excluding CR
                      %d11 /             ;  and LF
                      %d12 /
                      %d14-127

(1) _does_ say "any MIME body part field for which the field body is defined as '*text'". But '*text' means zero or more 'text' tokens, and Message-ID: et al are not MIME fields.

(2) An 'encoded-word' may appear within a 'comment' delimited by "(" and ")", i.e., wherever a 'ctext' is allowed. More precisely, the RFC 822 ABNF definition for 'comment' is amended as follows:

 comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"

This doesn't cover Message-IDs.

(3) As a replacement for a 'word' entity within a 'phrase', for example, one that precedes an address in a From, To, or Cc header. The ABNF definition for 'phrase' from RFC 822 thus becomes:

 phrase = 1*( encoded-word / word )

But a 'phrase' is just one or more 'word's, and Message-IDs are not 'word's. RFC2047 goes on to say that _any_ other use is forbidden, and tries to be really clear about that:

These are the ONLY locations where an 'encoded-word' may appear. In particular:

+ An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.

+ An 'encoded-word' MUST NOT appear within a 'quoted-string'.

+ An 'encoded-word' MUST NOT be used in a Received header field.

+ An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.

So I think Fastmail are playing fast and loose, and while mutt should try to cope, it sure as hell should never _emit_ this nonsense!
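For the "never emit" half, a minimal sketch of an outgoing-header guard, again in Python for illustration only. The helper name and the set of headers are my assumptions, not anything in mutt; the pattern just matches the general `=?charset?encoding?text?=` shape of an encoded-word.

```python
import re

# General shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD_RE = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

# Assumption: the headers treated here as pure message-id lists.
MSGID_HEADERS = {"message-id", "in-reply-to", "references"}


def safe_to_emit(name, value):
    """Return False if a message-id header body contains an encoded-word."""
    if name.lower() in MSGID_HEADERS:
        return ENCODED_WORD_RE.search(value) is None
    # Other headers (Subject etc.) are outside the scope of this check.
    return True
```

A check like this would sit just before the headers are written, so that ids recovered from a decoded References header are re-emitted as plain `<id>` tokens.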

Cheers,
Cameron Simpson <c...@zip.com.au>
