Re: [Email-SIG] fixing the current email module

Glenn Linderman Fri, 09 Oct 2009 20:46:43 -0700

On approximately 10/9/2009 6:25 PM, came the following characters from the keyboard of R. David Murray:

On Fri, 9 Oct 2009 at 17:54, Glenn Linderman wrote:
On approximately 10/9/2009 4:20 PM, came the following characters from the keyboard of R. David Murray:
 On Fri, 9 Oct 2009 at 13:26, Glenn Linderman wrote:
> On approximately 10/9/2009 8:10 AM, came the following characters from the keyboard of Stephen J. Turnbull:
> >   Glenn Linderman writes:
> > > > > produce a defect report, but then simply converted to Unicode as if
> > > > > it were Latin-1 (since there is no other knowledge available that
> > > > > could produce a better conversion).
> > > >
> > > >   No, that is already corruption.  Most clients will assume that string
> > > >   is valid as a header, because it's valid as a string.
> > >
> > > Sure it is corruption.  That's why there is a defect report.  But
> > > the conversion technique is appropriate, per the Postel principle.
> >
> > Actually, I would say you are emitting leniently, in violation of the
> > Postel principle.
>
> You can say that, but I don't have to believe it.  I'm talking about
> accepting; the message has arrived, it is here, the client is trying to
> look at it, and I'm talking about ways the client can look at
> not-quite-perfect data, knowing that it is not quite perfect, but still
> being able to see it.  I'm not at all talking about emitting data.  You
> seem to be calling the email package helping the client to accept
> not-quite-perfect data, as a form of emitting data.  It is not.
IMO, the appropriate way for the email package to provide the API you
are talking about is to provide the client with a way to get at the raw
byte string, which I think everyone agrees on.  If the client wants to
decode it as if it were latin-1 to process it, it can then do that.
That certainly works, but it isn't very helpful... that forces the
client application to reproduce the logic to parse the header value
and decode the parts that can be decoded successfully, and that is
exactly the sort of thing Stephen was complaining about when he
thought I was suggesting that to be a requirement (but he was
confused about what I was suggesting).
I wasn't clear, sorry :). The current API has a 'decode_header' function,
which doesn't do the byte-to-unicode decode (yeah, there's another naming
problem here...we have two types of decoding and only one word for both)
but instead returns (bytes, charset) tuples.  This piece of the API is
broken in python3, and I don't think it is the right API going forward,
but that _kind_ of API is what I meant by 'getting at the raw byte
string':  the byte string that failed the bytes-to-unicode decoding,
not the entire header (though there will also be a way to get that if
you need it, I presume.)
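
A minimal sketch of the two kinds of decoding distinguished above, assuming
present-day Python 3 (whose behaviour postdates this thread): decode_header
undoes only the RFC 2047 transfer encoding and hands back (bytes, charset)
pairs, leaving the bytes-to-unicode step to the caller.

```python
from email.header import decode_header

# First kind of "decoding": undo the RFC 2047 (Q or base64) transfer
# encoding. For an encoded word the result is still bytes, plus the charset.
parts = decode_header("=?iso-8859-1?q?p=F6stal?=")
print(parts)  # [(b'p\xf6stal', 'iso-8859-1')]

# Second kind of "decoding": bytes -> str, using the declared charset.
# This is the step that can fail and that the caller has to handle.
raw, charset = parts[0]
text = raw.decode(charset)
print(text)  # pöstal
```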

Yeah, that'd be better.

Of course, when returning Unicode strings, there would be no particular
need to identify the various charsets in which the header was
transmitted, except for invertibility and error handling, unless the
client wanted to track that for some reason.

If the goal is to preserve invertibility, then maybe tuples like (str,
charset, defect) would be better.... where defect would be None for good
data, but if defect were "non-ASCII", then you'd know the str was
converted as if it were charset [Latin-1 in my book, but if the email
package had rules or the API had parameters for how to deal with
non-ASCII stuff, some other charset could be specified, perhaps, but if
that fails it might still have to fall back to Latin-1]; if defect were
"ASCII", then you'd know that the str looked like an encoded word, but
couldn't be decoded because the charset wasn't recognized, or the
decoding via that charset failed, so the encoded word was supplied.
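
The (str, charset, defect) idea could be sketched roughly as follows. This is
only an illustration under stated assumptions: decode_header_value, the
helper names, and the regular expression are hypothetical and not part of
the email package, and the defect labels are the two strings suggested above.

```python
import base64
import quopri
import re

# Hypothetical, simplified pattern for an RFC 2047 encoded word:
# =?charset?encoding?payload?=
ENCODED_WORD = re.compile(rb"=\?([A-Za-z0-9_.-]+)\?([bBqQ])\?([!->@-~]*)\?=")


def _decode_run(raw):
    """Decode a run of bytes that is not an encoded word."""
    try:
        return (raw.decode("ascii"), "ascii", None)
    except UnicodeDecodeError:
        # Raw 8-bit data: record the defect, decode as if it were Latin-1.
        return (raw.decode("latin-1"), "latin-1", "non-ASCII")


def _decode_word(match):
    """Decode one encoded word, or hand it back untouched with a defect."""
    charset = match.group(1).decode("ascii")
    payload = match.group(3)
    try:
        if match.group(2) in b"bB":
            data = base64.b64decode(payload, validate=True)
        else:
            # header=True makes "_" decode to a space, as the Q encoding needs.
            data = quopri.decodestring(payload, header=True)
        return (data.decode(charset), charset, None)
    except (LookupError, ValueError):
        # Unknown charset or failed decoding: supply the encoded word itself.
        return (match.group(0).decode("ascii"), "ascii", "ASCII")


def decode_header_value(raw):
    """Return a list of (str, charset, defect) tuples for raw header bytes.

    Simplification: whitespace between adjacent encoded words is kept as an
    ASCII run rather than dropped, so that nothing from the input is lost.
    """
    parts = []
    pos = 0
    for match in ENCODED_WORD.finditer(raw):
        if match.start() > pos:
            parts.append(_decode_run(raw[pos:match.start()]))
        parts.append(_decode_word(match))
        pos = match.end()
    if pos < len(raw):
        parts.append(_decode_run(raw[pos:]))
    return parts
```

Calling decode_header_value(b"Caf\xe9 =?iso-8859-1?q?p=F6stal?=") would give
one tuple carrying the "non-ASCII" defect for the raw 8-bit run, followed by
one clean tuple for the encoded word.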

Correspondingly, a header value could be set by supplying such a list,
even with defect values as described above, to permit invertibility, and
passing on what was obtained, so that if there are overriding local
conventions (yep, such things used to be used, and maybe still are in
some areas), that the data would be preserved as best as possible, and
so that the email package could support creation of messages according
to the local conventions.
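
Going the other way might look something like the sketch below.
encode_header_value is a hypothetical name, not an email package API. The
sketch assumes that good non-ASCII data is re-emitted as a base64 encoded
word, since the tuple does not record whether Q or base64 was used
originally, and it ignores the RFC 2047 line-length limit. Parts carrying a
defect are written back byte-for-byte.

```python
import base64


def encode_header_value(parts):
    """Rebuild raw header bytes from (str, charset, defect) tuples."""
    out = []
    for text, charset, defect in parts:
        if defect is None and charset.lower() not in ("ascii", "us-ascii"):
            # Good non-ASCII data: emit it as a base64 encoded word.
            payload = base64.b64encode(text.encode(charset)).decode("ascii")
            out.append(f"=?{charset}?b?{payload}?=".encode("ascii"))
        else:
            # Plain ASCII runs, undecodable encoded words ("ASCII" defect)
            # and raw 8-bit data ("non-ASCII" defect) are passed through
            # unchanged by encoding with the charset they were decoded with.
            out.append(text.encode(charset))
    return b"".join(out)
```

With this shape, a client that only reads a header and writes it back passes
defective parts through untouched, which is the "passing on what was
obtained" behaviour described above.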

I'd hope that a separate tuple would be used for each encoded-word, or,
if charset ASCII and defect None, then it would describe a run of ASCII
between encoded words. Yes, an encoded word can be encoded in ASCII for
rare use (if the input word looks like an encoded word), so that would
cause a sequence of charset ASCII, defect None tuples, but otherwise a
plain ASCII header value would have a single entry in the list of tuples.


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
