Tokio Kikuchi writes:

> Barry Warsaw wrote:
>
> > On Mar 27, 2007, at 3:06 AM, Tokio Kikuchi wrote:
> >
> >> In my opinion (may not be true to RFC2822 in detail), ascii strings in
> >> header object should be strip()ped and separated by FWS (including
> >> '\r\n ' or '\r\n\t').
> >
> > I actually think we should be doing the opposite, namely preserving any
> > FWS in the existing text and /not/ substituting continuation_ws for it
> > when we re-break the headers.  This is the only way to maintain
> > idempotency short of saving the original header intact (but then memory
> > usage doubles).

Idempotency is a test, not a requirement.  The requirement is "first,
do no harm".  Ie, if you process the header, the result should be as
much "like" the original as possible.  This is not actually
implementable (different people will have different opinions about
what that means, except only *really different* people will have the
opinion that idempotency is undesirable<wink>), but the email package
should make it possible for people to get pretty close without
rewriting the package.

> > continuation_ws should be used only when we're forced
> > to break at a non-existing FWS location, e.g. if we've split a non-ascii
> > header or at a non-whitespace header-specific syntactic break.  In the
> > case of RFC 2047 headers, the FWS gets consumed anyway so it isn't
> > idempotentially (?!) significant.

Only in RFC 2047 conformant MUAs.  IMHO, RFC 2047 conformance is a
requirement, but it's not sufficient.  There are too many MUAs out
there that do not correctly handle headers folded between encoded
words (eg, Kyle Jones's VM).  I don't know if you *should* care, but I
think that RFC 2047 is (unfortunately) insufficient grounds for
refusing to care at this stage.  AFAICS the implication is that you
need to make a judicious choice of the default for continuation_ws.

> Well, this will surely break my contribution on Mailman 2.2
> CookHeaders.py where unifying the code for subject prefix munging for
> both ascii and rfc2047. :-(

I don't see why it should, although there might be technical reasons
why it would.  What I want, and what I think Barry is proposing, is
simply that the email package never does anything to disturb FWS by
default.  If you munge a header (even as trivially as removing a "Re:"
prefix), you must accept responsibility for formatting the result.  At
that point, I see no reason why the email package shouldn't help you
"reflow" a header if that's desirable in your application---but the
application should have to request that explicitly.
It shouldn't be implicit in the setting of continuation_ws.

> Maybe we should add an option for email.header.Header(), like
> idempotent=True/False. ;-)

I think it would be better to add an option, or even a hook function,
for formatting.  For example, I often use a docstring-like convention
for long subject headers, where the gist is in the first line, and the
rest is formatted nicely (ie, indented to align with the initial
character of the first line of the subject).  It would be nice if that
kind of thing could be done with an application-supplied function (of
course email could provide a number of common ones itself).
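
To make the hook idea concrete, here is a rough sketch of what an
application-supplied formatter might look like.  None of these names
exist in the email package: format_header(), preserve_fws() and
docstring_style() are hypothetical, and the (name, value, width)
signature is only an assumption for illustration.  The default hook
leaves existing folding alone; reflowing happens only when the
application passes a formatter explicitly.

```python
import textwrap


def preserve_fws(name, value, width=78):
    """Hypothetical default hook: leave any existing folding untouched."""
    return f"{name}: {value}"


def docstring_style(name, value, width=78):
    """Hypothetical hook: gist on the first line, the rest wrapped and
    indented to align with the first character of the header value."""
    indent = " " * (len(name) + 2)  # width of "Name: "
    gist, _, rest = value.partition("\n")
    lines = [f"{name}: {gist.strip()}"]
    body = " ".join(rest.split())  # collapse old folding in the remainder
    if body:
        lines.extend(
            textwrap.wrap(
                body,
                width=width,
                initial_indent=indent,
                subsequent_indent=indent,
                break_long_words=False,  # never split a token to fit
                break_on_hyphens=False,
            )
        )
    # Every continuation line starts with whitespace, so the result is
    # still a single folded header.
    return "\r\n".join(lines)


def format_header(name, value, formatter=preserve_fws, width=78):
    """Hypothetical entry point: reflow only if the caller asks for it."""
    return formatter(name, value, width)
```

An application that wants the docstring-like layout would call
format_header("Subject", text, formatter=docstring_style); everyone
else gets the header back with its whitespace as it was.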
