I am trying to move this thread to email-...@python.org since the underlying issue is in the email package. Further, as of Mailman 2.1.12 we no longer install a Mailman-specific version of the email package, so it really has to be addressed in the email package itself.
Stephen J. Turnbull wrote:
> Mark Sapiro writes:
>
> > I think there is a minor bug in decode_header() in that it won't
> > recognize a RFC 2047 encoded word in a comment if the encoded word is
> > not separated by whitespace from the ")" that terminates the comment.
> > However, this is the only place where an encoded word need not be
> > followed by whitespace or the end of the header.
>
> Indeed that's a bug. I gather that you're saying that this bug is not
> the cause of the OP's problem, though?

Correct.

> > The Subject: header above is non-compliant in two respects. It is too
> > long. [...] However, decode_header will accept it anyway and do
> > the right thing.
>
> As it should, according to the Postel Principle. Anyway, IIRC the
> length limit is a SHOULD NOT, not a MUST NOT, right?

The RFC (8|28|53)22 limits are MUST BE <= 998 and SHOULD BE <= 78. RFC
2047 seems to want to impose stricter limits on encoded words, but
unfortunately does not use the defined terms MUST and SHOULD. Section 2
says in part:

   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters. If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.

so it is not clear whether these are 'recommendations' or
'requirements'. In any case, email.header.decode_header() is not
enforcing any limits so we are being generous in what we accept in this
respect.

> > real problem is item (1) in section 5 of the RFC says in part:
> >
> > Ordinary ASCII text and 'encoded-word's may appear together in the
> > same header field. However, an 'encoded-word' that appears in a
> > header field defined as '*text' MUST be separated from any adjacent
> > 'encoded-word' or 'text' by 'linear-white-space'.
> >
> > The header above does not comply with this.
>
> Agreed, but I think that by default[1] email should try to parse this
> header as the user intended it. It's not like encoded-words are that
> easy to confuse with intended text; it's unlikely that changing
> 'linear-white-space' above to 'linear-white-space or specials' would
> harm anyone.

I fully agree. There is a regexp (ecre) in email/header.py that ends
with the lookahead assertion "(?=[ \t]|$)". Even in "strict mode", I
think the lookahead needs to accept ")" as well as space and tab, but I
think by default, it should just be removed.

> > This is a problem with the MUA (mail client) that encoded the Subject:
> > header in the first place.
>
> Agreed, but I think following the Postel Principle here is likely to
> do less harm than adhering strictly to the RFC.

I agree here too, and note that some MUAs (all three I tried including
mutt and Thunderbird) decode the original header as intended.

> That said, I'm not in a position to contribute code, and this is a
> pretty invasive change, so the user is unlikely to see a version of
> Mailman that handles this any time soon. They are likely to have more
> luck switching clients.
>
> Footnotes:
> [1] Ie, there should be an option to be strict.
>
>

--
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9
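
P.S. To make the lookahead point concrete, here is a minimal sketch of the three behaviours discussed above. The pattern is a simplified stand-in written for illustration, not a copy of ecre from email/header.py; the names ENCODED_WORD, STRICT, COMMENT_OK, LENIENT and encoded_words() are hypothetical, and the two sample headers are made up.

```python
import re

# Simplified encoded-word pattern (an assumption for illustration, not the
# real ecre): =?charset?encoding?encoded-text?=
ENCODED_WORD = (
    r'=\?(?P<charset>[^?]*?)'   # charset, up to the next "?"
    r'\?(?P<encoding>[qb])'     # "q" or "b"
    r'\?(?P<encoded>.*?)'       # encoded text, non-greedy
    r'\?='                      # closing "?="
)

FLAGS = re.IGNORECASE | re.MULTILINE

# Lookahead as quoted in the message: space, tab, or end of line must follow.
STRICT = re.compile(ENCODED_WORD + r'(?=[ \t]|$)', FLAGS)

# "Strict mode" variant that also accepts the ")" that closes a comment.
COMMENT_OK = re.compile(ENCODED_WORD + r'(?=[ \t)]|$)', FLAGS)

# Default proposed in the message: no lookahead at all.
LENIENT = re.compile(ENCODED_WORD, FLAGS)


def encoded_words(pattern, header):
    """Return the raw encoded words that `pattern` finds in `header`."""
    return [m.group(0) for m in pattern.finditer(header)]


# Encoded word inside a comment, immediately followed by ")".
in_comment = '(=?iso-8859-1?q?Andr=E9?=)'
# Encoded word run together with ordinary text (non-compliant *text).
glued = '=?utf-8?q?caf=C3=A9?=menu'

for name, pattern in (('strict', STRICT),
                      ('comment-ok', COMMENT_OK),
                      ('lenient', LENIENT)):
    print(name, encoded_words(pattern, in_comment),
          encoded_words(pattern, glued))
```

With these inputs, the strict pattern finds neither encoded word, the ")" variant finds only the one in the comment, and the pattern without a lookahead finds both, which is the behaviour the MUAs mentioned above appear to implement.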