I'd like to try to summarize what I understand Barry to be saying (which, in this case, also reflects my understanding of what is needed), and see if I'm anywhere close to on target :)

In the following discussion, 'text' refers to Unicode data, and 'bytes' refers to, well, bytes. (I chose to use 'text' instead of 'string' to avoid confusion.)
The email package consists of two major conceptual pieces: the API and the internal data model.

The API needs facilities for accepting data in either text or bytes format, and this data is used to generate a model of the input message (a Message). Likewise, the API needs facilities for serializing a Message as either bytes or text. The API also provides ways to build up a Message from pieces, to extract information from a Message in pieces, and to modify a Message; here too, input and output as both text and bytes must be supported.

The data model used by the email package is an "implementation detail", and we should not spend effort at this stage trying to optimize it for anything except memory requirements with respect to potentially large sub-objects. Even there, it is more a matter of providing ways to deal with potentially large sub-objects than a true optimization. In general, correctness and robustness are much more important than speed.

The data model will need to be a practical hybrid of the input data, possibly transformed in some cases, and various sorts of meta-data. The current email package already works this way.

An important characteristic of the model is that it be idempotent whenever sensible; that is, if a given byte stream is used to create a Message or sub-object, serializing that Message or sub-object as bytes should return the original byte stream whenever sensible (i.e., when the data is not pathologically malformed). Likewise, if a text stream is used to create a Message or sub-object, serializing it as text should produce, whenever sensible, the original text stream. In particular, well-formed (per RFC) message data should always be stored and produced idempotently.
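To make the round-trip property concrete, here is a minimal sketch using the email package as it exists in current Python 3 (email.message_from_bytes, email.message_from_string, Message.as_bytes and Message.as_string). It illustrates the idea only and is not a statement of what the redesigned API should look like; the sample message and its addresses are made up.

```python
import email

# Hypothetical sample message; the addresses and content are made up.
raw_bytes = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: Hello\n"
    b"\n"
    b"Body line one.\n"
    b"Body line two.\n"
)

# Bytes in, bytes out: parse a byte stream, then serialize it as bytes.
msg_from_bytes = email.message_from_bytes(raw_bytes)
bytes_round_trip = msg_from_bytes.as_bytes()

# Text in, text out: parse the same message as text, then serialize as text.
raw_text = raw_bytes.decode("ascii")
msg_from_text = email.message_from_string(raw_text)
text_round_trip = msg_from_text.as_string()

# For simple, well-formed input both comparisons are expected to hold.
print(bytes_round_trip == raw_bytes)
print(text_round_trip == raw_text)
```

The example deliberately sticks to short ASCII headers and a plain body; long, folded, or encoded headers are exactly where "whenever sensible" starts to matter.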
An important property of the API is that neither the parser that transforms an input stream into a Message nor Message serialization should raise exceptions, except in the face of errors that leave no way to produce a valid Message or serialization. Instead, a defects list is maintained and exposed through the API. In the face of some defects it may not be sensible to maintain idempotency.

The APIs that manipulate the data model, either for piecewise construction or for transformations, may raise exceptions, and in most cases _should_ raise exceptions when encountering invalid data or operations.

Also, as an additional note to those thinking about use cases, I'd like to point out something I know well and which Barry reminded me about recently: parts of the email package (e.g., MIME and RFC822-style header parsing) are used, or can be used, by systems other than those handling email. The particular cases I have run into myself are working with non-email data files that follow RFC822 rules, and handling data from NNTP (which, granted, is almost email...but only almost). In the former case you usually have text input and output, mediated by the encoding of the file(s) on disk. In the latter case you have all the problems of email plus a few more. Further, in the standard library the http package, urllib, the cgi module, and pydoc are all clients of the email package.

--David (RDM)
_______________________________________________
Email-SIG mailing list
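A small sketch of the parser-versus-manipulation distinction described above, again using the email package as it exists in current Python 3 rather than any proposed design. The malformed input is a made-up example: it declares a multipart type but supplies no boundary parameter. The exact set of defects recorded is an implementation detail, so the code only prints their class names.

```python
import email
from email.message import Message

# Hypothetical malformed input: a multipart type with no boundary parameter.
malformed = (
    b"From: alice@example.com\n"
    b"Content-Type: multipart/mixed\n"
    b"\n"
    b"This body cannot be split into parts.\n"
)

# Parsing does not raise; problems are recorded on the Message instead.
msg = email.message_from_bytes(malformed)
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)

# A manipulation API, by contrast, refuses an invalid operation:
# the payload here is a plain string, so attaching a sub-part fails.
try:
    msg.attach(Message())
    attach_error = None
except TypeError as exc:
    attach_error = exc
print(repr(attach_error))
```

The parser keeps going and leaves it to the caller to inspect msg.defects, while attach() raises immediately because the operation makes no sense on a non-multipart payload.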
