On Oct 10, 2009, at 9:59 AM, Stephen J. Turnbull wrote:
> Both. I *believe* (but it needs to be checked) that in a correctly formed multipart MIME object (message or part), any internal structure is context-free within the MIME boundaries. If that is so, then individual parts of the object can be stored in raw form and parsed lazily.
I too /think/ that's correct. There are some MIME content-types that cause parts to be related (e.g. multipart/alternative and multipart/related), but those are all operating at a higher level.
In practice it probably makes sense to parse all the headers right away. Content-Type has the most bearing on parsing the rest of the stuff, so by that time you already need to parse parameters to e.g. get the boundary. Early on I claimed that headers were so manageable in practice that we could implement an ordered-dictionary with duplicates as a simple list, with linear searching and nobody would notice. I think nobody has noticed ;).
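To illustrate the "simple list with linear searching" idea, here is a minimal sketch. The class name `HeaderList` and its methods are hypothetical, made up for this example; this is not the email package's actual code, just one way such a structure could look.

```python
class HeaderList:
    """Ordered collection of headers that allows duplicates (hypothetical)."""

    def __init__(self):
        # (name, value) pairs, kept in the order they were added.
        self._items = []

    def add(self, name, value):
        self._items.append((name, value))

    def get(self, name, default=None):
        # Linear scan; header names are compared case-insensitively.
        wanted = name.lower()
        for key, value in self._items:
            if key.lower() == wanted:
                return value
        return default

    def get_all(self, name):
        # Every value for a repeated header, in original order.
        wanted = name.lower()
        return [value for key, value in self._items if key.lower() == wanted]


headers = HeaderList()
headers.add("Received", "from a.example by b.example")
headers.add("Content-Type", 'multipart/mixed; boundary="XYZ"')
headers.add("Received", "from b.example by c.example")

print(headers.get("content-type"))
print(headers.get_all("received"))
```

A typical message carries only a few dozen headers, so the scan stays cheap, and the list keeps both order and duplicates without extra machinery.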
Lazy parsing of the body does make sense. You only need to parse enough to find end boundaries, or recurse into parsing an embedded part. This is how the parser currently works anyway.
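A rough sketch of that approach, under some assumptions: `split_on_boundary` and `LazyPart` are hypothetical names invented here, the boundary is taken as already extracted from Content-Type, and the scan is simplified (it ignores trailing whitespace on delimiter lines and leaves the line break before each delimiter attached to the part). Only `email.message_from_string` is a real library call.

```python
import email


def split_on_boundary(body, boundary):
    """Return the raw text of each part found between delimiter lines."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    parts = []
    current = None  # None while we are in the preamble or epilogue
    for line in body.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if stripped == closing:
            if current is not None:
                parts.append("".join(current))
            current = None
            break  # everything after the closing delimiter is epilogue
        if stripped == delimiter:
            if current is not None:
                parts.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        # No closing delimiter was seen; keep what we collected.
        parts.append("".join(current))
    return parts


class LazyPart:
    """Holds a part in raw form and parses it only on first access."""

    def __init__(self, raw):
        self.raw = raw
        self._parsed = None

    @property
    def message(self):
        if self._parsed is None:
            self._parsed = email.message_from_string(self.raw)
        return self._parsed


body = (
    "preamble\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
    "epilogue\n"
)
parts = [LazyPart(raw) for raw in split_on_boundary(body, "XYZ")]
print(parts[1].message.get_content_type())
```

Only the part that is actually accessed gets parsed; the other stays as a raw string. An embedded multipart would use a different boundary, so its delimiter lines pass through the outer scan untouched and can be split later by recursing into that part.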
-Barry
_______________________________________________
Email-SIG mailing list
Email-SIG@python.org
Your options: http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com