Re: [Email-SIG] fixing the current email module

Glenn Linderman Thu, 08 Oct 2009 14:59:53 -0700

On approximately 10/8/2009 6:00 AM, came the following characters from the keyboard of Barry Warsaw:

On Oct 8, 2009, at 3:29 AM, Glenn Linderman wrote:
The application options are to drop the attachment, or pass through the corrupted bytes, and let the next application try to make sense of it.
Exactly, and it's not for the email package to say which is right.
Here's a use case: I've got a Message that was parsed from wire input and I want to mangle the Subject heading to add the list prefix. I know exactly what charset the prefix is in because that's data I control. When I ask for the original Subject value, I'm handed an instance that I can use to try to figure out how to add the prefix.
First thing I'll ask it is "are you a single chunk in my prefix charset (or compatible)?" If so, I can probably just prepend my prefix onto the value. If not, "are you composed of multiple valid chunks in different charsets?" If so, I know that I need to encode my prefix, but I can still prepend it to the header value (hopefully using the same API, and I don't care that the implementation could not use string concatenation).
If not, then what? Maybe I don't care if some of the chunk charsets aren't known because I can still use the right encode+prepend strategy. But if the header is a gobbledegook of 8-bit bytes? I'm pretty sure I want to be able to ask the API if that's the case rather than get an exception. The thing I'm not so sure about is what happens if my application is just naive enough to just ask for the header as a unicode and that conversion can't be made. I /think/ it should raise an exception in that case. But then when I ask for the header value as a mass of bytes, that should succeed and return me the raw input.

So for this use case, it is known that all headers are ASCII. So the operation of prepending a list prefix should not care whether the Subject: value is valid or not... it can simply prepend the list prefix, followed by SP, to the existing, raw header that already exists.

The only remaining issue is line length limits, so maybe it has to use CR LF TAB instead of space, sometimes.

OK, so if the prefix is not ASCII, it gets separately encoded, including a trailing SP, and then prepended to the value followed by SP or CR LF TAB depending on the line length limit.

So to prepend into a text header, you shouldn't need to decode the undecodable... there should be a prepend (and possibly also an append) operation provided by the API, so that applications can tweak headers without decoding. This allows useful behavior even if new methods of encoding are invented that are not yet understood by a particular version of the email library.
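
To make the prepend idea concrete, here is a rough sketch of what such an operation might look like. The names encode_prefix and prepend_prefix, the 78-character limit, and the choice of UTF-8 with base64 for a non-ASCII prefix are all assumptions for illustration, not an existing email package API. The point is that the raw header bytes are never decoded.

```python
import base64

MAX_LINE = 78  # assumed limit: the recommended line length in RFC 5322


def encode_prefix(prefix: str) -> str:
    """Return the prefix as ASCII text (hypothetical helper).

    A non-ASCII prefix becomes one RFC 2047 encoded word that also
    carries a trailing SP, as suggested in the message above.
    """
    try:
        prefix.encode("ascii")
        return prefix
    except UnicodeEncodeError:
        payload = base64.b64encode((prefix + " ").encode("utf-8"))
        return "=?utf-8?b?" + payload.decode("ascii") + "?="


def prepend_prefix(raw_value: bytes, prefix: str, name: str = "Subject") -> bytes:
    """Prepend prefix to a raw header value without decoding the value."""
    token = encode_prefix(prefix).encode("ascii")
    first_line = raw_value.split(b"\r\n", 1)[0]
    # Length of the first physical line if a plain SP joins the two parts:
    # "Name: " + token + SP + first line of the existing value.
    length = len(name) + 2 + len(token) + 1 + len(first_line)
    separator = b" " if length <= MAX_LINE else b"\r\n\t"
    return token + separator + raw_value
```

With this sketch, prepend_prefix(b"Hello", "[list]") yields b"[list] Hello", and the same call works unchanged when the existing value is undecodable 8-bit junk, since the value is only ever treated as bytes.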

Asking for the header value (or whole header) in Unicode should decode the chunks that are understandable and decodable, and leave the chunks that are not understandable as ASCII-converted-to-Unicode-but-still-possibly-weirdly-encoded ... I think that is what the RFCs encourage.
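
A minimal sketch of that lenient decoding, under some stated assumptions: header_to_text is a made-up name, the regular expression is a simplified match for RFC 2047 encoded words, stray 8-bit bytes are turned into U+FFFD (my choice, not something the RFCs dictate), and the rule about dropping whitespace between adjacent encoded words is ignored for brevity.

```python
import base64
import quopri
import re

# Simplified pattern for an RFC 2047 encoded word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode_word(match):
    """Decode one encoded word, or return it untouched if that fails."""
    charset, encoding, payload = match.groups()
    try:
        if encoding.lower() == "b":
            raw = base64.b64decode(payload, validate=True)
        else:
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        return raw.decode(charset)
    except (LookupError, ValueError):
        # Unknown charset or malformed payload: keep the chunk as ASCII text.
        return match.group(0)


def header_to_text(raw_value: bytes) -> str:
    """Decode what can be decoded; leave the rest as still-encoded text."""
    text = raw_value.decode("ascii", errors="replace")
    return ENCODED_WORD.sub(_decode_word, text)
```

A chunk in a charset this code does not know, such as =?x-unknown?q?abc?=, comes back verbatim, so a later version of the library (or the next application) can still make sense of it.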

Asking for a header as bytes should return the wire data, if it is available, or an encoding of real data as wire data (like generate would do). There is no Unicode that cannot be encoded to wire format, IIUC, usually via a variety of heuristics once non-ASCII characters are included, that may produce a variety of differing results, all of which should decode back to the original data.
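
As an illustration of the bytes side, something along these lines would do. The function names header_as_bytes and wire_to_text are hypothetical, and picking UTF-8 for non-ASCII text is just one of the possible heuristics; Header, decode_header and make_header are the existing email.header calls.

```python
from email.header import Header, decode_header, make_header
from typing import Optional


def header_as_bytes(raw: Optional[bytes], text: Optional[str]) -> bytes:
    """Return wire data for a header value (hypothetical helper).

    If the original wire bytes are available they are returned untouched;
    otherwise the text is encoded, using RFC 2047 only when needed.
    """
    if raw is not None:
        return raw
    if text is None:
        raise ValueError("need either raw wire data or text")
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Assumed heuristic: encode everything non-ASCII as UTF-8.
        return Header(text, "utf-8").encode().encode("ascii")


def wire_to_text(wire: bytes) -> str:
    """Decode ASCII wire data back to text, to check the round trip."""
    return str(make_header(decode_header(wire.decode("ascii"))))
```

Whatever encoding the heuristic picks, wire_to_text(header_as_bytes(None, text)) should give back the original text, which is the only property that really matters here.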


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
