Tokio Kikuchi writes:

> Barry Warsaw wrote:
> > To me, this is a flaw in the rfc because there's no way to /avoid/
> > whitespace between unencoded and encoded parts!
>
> Well, it looks to me that RFC2047 prohibits [deleting whitespace] at
> least in header text.

That's my understanding as well, for the reasons Tokio gave. If you
want "unicode(h)" ==> "helloworld", you need to encode the whole
string. (Giving 'hello' the charset 'utf-8' would be a hackish way of
doing this.)

> The problem in email.header module is it can not distinguish between
> the structured and unstructured (text only) headers.

Yikes! I didn't think of it that way before, but now that you mention
it, my spine is freezing.

> The Header class may have a member function like 'add_comment',
> IMHO.

IMHO, the Header class should be abstract, and there should be
subclasses that handle dates, lists of addresses, lists of
message-ids, etc., as appropriate to header fields structured in each
particular way. Only those object handlers appropriate to a given
field would be exposed. StarTextHeader would be the unstructured
derivative of the (implicitly structured) Header class.

Barry again:

> > I really have no way of knowing what the intention of the rfc is here,
> > so perhaps we need a flag on the Header class (or in the .append()
> > method) to specify which interpretation the user wants.

I really don't think that users should be allowed to "specify
interpretation". RFC 2047 is a "transfer encoding". Users should
never need to deal with that kind of thing, and it is dangerous to
allow them to do so. Users (including software clients of the
package, of course) should simply hand objects and text to the Header
class, which formats them according to 2822, 2047, and the definition
of each field's structure.
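The whitespace point raised at the top of this reply can be checked with a small round trip. This is only a sketch, and it assumes a current Python 3 email package, where str(h) plays the role of the unicode(h) mentioned above. The helper name roundtrip is made up for the illustration. The mixed header should come back with a space between the two parts, while the header encoded as a single utf-8 string should come back as "helloworld".

```python
from email.header import Header, decode_header, make_header


def roundtrip(header):
    """Encode a Header per RFC 2047, then decode the wire form back to text."""
    wire = header.encode()
    return str(make_header(decode_header(wire)))


# Mixed case: an unencoded chunk followed by an encoded-word.
mixed = Header("hello")            # no charset given, so us-ascii
mixed.append("world", "utf-8")     # this chunk becomes an encoded-word

# Whole string handed over as one utf-8 chunk: one encoded-word, no boundary.
whole = Header("helloworld", "utf-8")

print(repr(mixed.encode()), "->", repr(roundtrip(mixed)))
print(repr(whole.encode()), "->", repr(roundtrip(whole)))
```

Note that the whole string is passed as one chunk here; appending 'hello' and 'world' as two separate utf-8 chunks is not assumed to behave the same way.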
Nevertheless, given that RFC 2047 (and 2822, for that matter) is
explicitly intended to allow headers to be human-readably formatted
but still machine-parsable, the user should be allowed to express
*preferences* for the formatting. For example, qp_preference_function
would be a function of the header contents such that if it returns
true, QP encoding should be used, otherwise BASE64. But the decision
to use encoded words would not be a user choice.

There might be a preserve_whitespace_literally preference, in which
case the whole header would have to be RFC 2047 encoded. But in the
case of structured headers (e.g., address lists), you can't simply
BASE64 the whole thing, only the *text* components! And the email
package needs to be free to deal with structured headers
appropriately (for example, breaking very long addresses to try to
keep line length to a reasonable level). It may not be feasible to
respect user preferences in all cases.

Maybe there could be an escape to allow Sufficiently Smart Users to
format headers "by hand", but its use should be discouraged in favor
of a structured Header subclass that DTRTs.

> email.header module is not a word processor.

Good slogan!
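To make the proposal above concrete, here is a rough sketch of how such a hierarchy might look. Nothing in it exists in the email package: AbstractHeader, AddressListHeader, format_value and mostly_ascii are hypothetical names, and only StarTextHeader and qp_preference_function come from the text above. It uses the real email.charset.Charset and email.utils.formataddr, assumes utf-8 as the output charset, and ignores line folding entirely.

```python
from abc import ABC, abstractmethod
from email.charset import BASE64, QP, Charset
from email.utils import formataddr


def mostly_ascii(text):
    """Example qp_preference_function: prefer QP when text is mostly ASCII."""
    return sum(ch.isascii() for ch in text) * 2 >= len(text)


class AbstractHeader(ABC):
    """Hypothetical base class; not part of the email package."""

    def __init__(self, name, qp_preference_function=mostly_ascii):
        self.name = name
        self.qp_preference_function = qp_preference_function

    def _encode_text(self, text):
        # Whether to use encoded-words is decided here, never by the caller.
        if text.isascii():
            return text
        charset = Charset("utf-8")  # assumed output charset for this sketch
        # The caller's preference only selects between Q and B encoding.
        use_qp = self.qp_preference_function(text)
        charset.header_encoding = QP if use_qp else BASE64
        return charset.header_encode(text)

    @abstractmethod
    def format_value(self):
        """Return the field body as ASCII text (no folding in this sketch)."""

    def format(self):
        return f"{self.name}: {self.format_value()}"


class StarTextHeader(AbstractHeader):
    """Unstructured (*text) field such as Subject."""

    def __init__(self, name, text, **prefs):
        super().__init__(name, **prefs)
        self.text = text

    def format_value(self):
        return self._encode_text(self.text)


class AddressListHeader(AbstractHeader):
    """Structured field such as To or Cc: (display name, addr-spec) pairs."""

    def __init__(self, name, addresses, **prefs):
        super().__init__(name, **prefs)
        self.addresses = list(addresses)

    def format_value(self):
        parts = []
        for display_name, addr_spec in self.addresses:
            if display_name.isascii():
                parts.append(formataddr((display_name, addr_spec)))
            else:
                # Encode only the text component; the addr-spec stays literal.
                encoded = self._encode_text(display_name)
                parts.append(f"{encoded} <{addr_spec}>")
        return ", ".join(parts)


# Made-up addresses, for illustration only.
to = AddressListHeader("To", [("Alice Example", "alice@example.org"),
                              ("José Ñandú", "jose@example.org")])
print(to.format())
```

The caller hands over objects and text plus, optionally, a preference function; it never sees or chooses the RFC 2047 machinery. A real implementation would also have to fold long lines and handle the other structured field types (dates, message-id lists, and so on).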
