Re: [Email-SIG] API for Header objects [was: Dropping bytes "support" in json]

Glenn Linderman Thu, 16 Apr 2009 13:44:23 -0700

On approximately 4/16/2009 6:02 AM, came the following characters from the keyboard of Steven D'Aprano:

On Thu, 16 Apr 2009 10:39:52 am Tony Nelson wrote:
I don't want there to be any "str(msg['tag'])" or "bytes(msg['tag'])"
at all, so there would be no loss of consistency.
That's ... different.
If the data for a header field is not properly a string,
But it always is. Even badly formatted emails with corrupt headers containing binary characters are strings -- they're just byte (non-Unicode) strings containing binary characters. Your mail server might not accept it as part of a valid header, but it's a valid byte string.

Wire format email headers are composed of a subset of ASCII text. There should be a way to obtain them, either as bytes, or via the trivial str conversion of those bytes to Unicode. Even corrupt headers containing binary characters should be obtainable that way. There are no header encoding or decoding algorithms that cannot be reworked to function properly on either the raw_bytes or raw_str version of a header, since the numeric values and sequence of all binary octets would be preserved via both raw_bytes and raw_str. *The key is to know what is in hand.* For both raw_bytes and raw_str, all characters would be in the range 0 - 0xFF. This is simple transliteration, not interpretation or parsing. A non-corrupt header would have a smaller range, 0x20 - 0x7F. Any header should be obtainable or settable in this form, using either bytes or str parameters/results. Yes, it should be possible to create corrupt headers in this manner. Useful mostly for testing, or for idempotency (which I also call GIGO).
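A minimal sketch of the transliteration described above, assuming Latin-1 as the mapping between octets and code points (it maps each byte value 0 - 0xFF to the code point with the same number). The helper names are hypothetical and are not part of the email package:

```python
# Hypothetical helpers illustrating raw_bytes <-> raw_str transliteration.
# Assumption: Latin-1 is used, because it maps octet N to code point N.

def raw_bytes_to_raw_str(raw: bytes) -> str:
    """Transliterate octets to a str without interpreting them."""
    return raw.decode("latin-1")


def raw_str_to_raw_bytes(raw: str) -> bytes:
    """Inverse mapping; raises UnicodeEncodeError for code points above 0xFF."""
    return raw.encode("latin-1")


# A deliberately corrupt header containing binary octets.
corrupt = b"Subject: caf\xe9 \x00\xff"
as_str = raw_bytes_to_raw_str(corrupt)

# Numeric values and sequence are preserved in both directions (GIGO).
assert [ord(ch) for ch in as_str] == list(corrupt)
assert raw_str_to_raw_bytes(as_str) == corrupt
```

Because nothing is parsed or decoded here, the round trip works for corrupt headers as well as valid ones.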

However, obtaining headers in that way should be "hard", but only in the sense of having to type more because it is part of a lower level interface, not the primary APIs... like msg['tag'].raw_bytes or msg['tag'].raw_str... because it is actually the easiest way (implementation-wise) to obtain a copy of the data... but that copy may not be as useful as one might like.

str(msg['tag']) or msg['tag'].str (or some such spelling[s]) should always produce a displayable form of the header. If it is a known, standardized header that may contain data that was encoded for transmission, such encodings should be reversed, and Unicode characters outside the range of U+0020 - U+007F may be included. Remember the goal here is "displayable". So if the encoding is bad for a standard header, or a standard header is corrupt, or a non-standard header contains what is apparently binary gibberish, and non-displayable Unicode control characters are generated, they should be escaped as 7 ASCII characters representing a Unicode code point "\U+0017". All such display strings must always have "\" converted to "\\" so that there is no ambiguity when interpreting strings that may contain text that looks like one of the escape strings.
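The escaping rule above could be sketched roughly as follows. The function name is hypothetical, and treating every character in a Unicode "C" (control, format, unassigned, etc.) category as non-displayable is an assumption made for illustration, not something the proposal specifies:

```python
import unicodedata


def displayable(text: str) -> str:
    """Return a display form of an already-decoded header value.

    Assumption: any character whose Unicode general category starts
    with "C" counts as non-displayable.
    """
    out = []
    for ch in text:
        if ch == "\\":
            # Double every backslash so real text can never be mistaken
            # for one of the escape sequences.
            out.append("\\\\")
        elif unicodedata.category(ch).startswith("C"):
            # Four hex digits gives the 7-character form "\U+0017";
            # code points above U+FFFF would need more digits.
            out.append("\\U+%04X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# A control character is escaped; a literal backslash is doubled.
assert displayable("a\x17b") == "a\\U+0017b"
assert displayable("\\U+0017") == "\\\\U+0017"
```

Doubling the backslash first is what keeps the two cases above distinguishable in the output.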

Known standard headers should have additional APIs (these already exist for the most useful ones) to obtain the interesting subcomponents (encodings, names, addresses, MIME types, etc.). These should have str parameters and results interfaces only, and specification of an encoding can be optional, defaulting to UTF-8 (or possibly defaulting to a Message-level encoding specification, which in turn may default to UTF-8), overridable in some of the APIs via optional parameters (some, because overloaded assignment APIs may not have room for such overrides, not having optional parameters).
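For illustration only, here is how some of those subcomponents can be pulled out as str values with the Python 3 standard library's email package. This shows existing calls, not the API shape proposed in this message, and the sample message is made up:

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.utils import parseaddr

# Made-up sample message with an RFC 2047 encoded display name.
raw = (
    "From: =?utf-8?q?J=C3=BCrgen?= <juergen@example.com>\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# Split the address header into display name and address (both str).
name, addr = parseaddr(msg["From"])

# Reverse the transmission encoding of the display name.
display_name = str(make_header(decode_header(name)))

# MIME type and charset, also returned as str.
content_type = msg.get_content_type()
charset = msg.get_content_charset()
```

Each result is a str, which matches the "str parameters and results only" idea for the higher-level accessors.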


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
