Re: [Email-SIG] API thoughts

Glenn Linderman Tue, 01 Mar 2011 13:59:04 -0800

On 3/1/2011 12:40 PM, R. David Murray wrote:

> This is a long email, for which my apologies.  I hope you all will
> manage to find some time to read it and provide feedback, as it speaks
> to fundamental design issues.


Indeed.  Good to discuss before designing with ready-mix.

> Everything else is an implementation detail :)


Agreed.

> We propose to create a new API to make all of this easier for
> the application programmer.


YES!!

> [*] There are current real-world use cases for this:  there are nntp
>      servers that use utf-8 for headers, and the http protocol uses
>      latin-1 (or sometimes, I think, utf-8)

All the tunables listed are relevant. The HTTP protocol standard claims to use Latin-1 + RFC 2047 encoding for non-Latin-1 characters; in practice, the browser implementations apparently use nearly _any_ encoding for headers!!! For <form> responses, when there is actually user-specified data involved, they encode the MIME headers sent back using the encoding declared for the page containing the form. The "standard headers" seem to be ASCII, and somewhat immune to the choice of encoding, except perhaps for those few encodings that are not ASCII supersets. (I have no clue how such are handled, if they are. Anyone want to write an EBCDIC page containing a <form> for testing?)

This is useful because it reduces the amount of character escaping likely to be required: the designer of the page chooses a character set that can represent the page, which is likely in the language of the intended recipient, who is likely to fill out the form in that same language.

It would be more useful if the browsers included a(n ASCII) header that specified the encoding of subsequent headers: they do not. Therefore, the server that receives the headers must somehow "know" the proper encoding. For the situation where the CGI (or equivalent) script both generates the page containing the <form> and receives the form data, this is simple. For the situation where the same web application designer creates the page containing the <form> and the CGI receiving the form data, and explicitly or implicitly declares the same encoding for both, this is functional, but there is the danger of someone changing the static pages to conform to a new standard encoding without realizing the consequences for the associated CGI scripts. It is also rather hard to create "form filling" applications that send form data to a server without first fetching the form itself... such applications must also "know" the proper encoding, and they are much more likely to be written outside the original development environment, and much less likely to be involved in any planning to change encodings inside the application's <form>s and CGIs.
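A small illustration of why the receiving server has to "know" the encoding: the same bytes decode without any error under both UTF-8 and Latin-1, but only one of them gives back what the user typed. The field value below is made up for the example.

```python
# Header bytes as a browser might send them for a form field containing
# "café" on a page declared as UTF-8 (illustrative value).
raw_value = "café".encode("utf-8")          # b'caf\xc3\xa9'

# A server that knows the page encoding recovers the user's text...
correct = raw_value.decode("utf-8")

# ...while one that assumes the standard's Latin-1 silently gets mojibake;
# nothing raises, so the mistake is easy to miss.
wrong = raw_value.decode("latin-1")

print(correct, wrong)
```

Because Latin-1 maps every byte to some character, a wrong guess never fails loudly, which is exactly why an out-of-band agreement on the encoding is needed.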

To support reading byte-stream HTTP headers, therefore, it is critical that the email API accept an encoding from the application, which "knows" the encoding; presently cgi.py has to pre-decode incoming headers because email does not have such a parameter. On the other hand, maybe cgi.py shouldn't use email header parsing at all... since browsers don't use RFC 2047 encoding in practice, parsing headers without it is straightforward.
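To make "straightforward" concrete, here is a minimal sketch of header parsing that takes the encoding from the application and does no RFC 2047 decoding at all. The function name `parse_http_headers` is hypothetical (it is not part of email or cgi.py), and it assumes CRLF line endings and old-style folded continuation lines.

```python
from __future__ import annotations


def parse_http_headers(raw: bytes, encoding: str) -> list[tuple[str, str]]:
    """Hypothetical helper: parse a raw header block whose encoding the
    application already knows.  No RFC 2047 decoding is attempted."""
    text = raw.decode(encoding)          # the application supplies the charset
    headers: list[tuple[str, str]] = []
    for line in text.split("\r\n"):
        if not line:
            break                        # blank line ends the header block
        if line[0] in " \t" and headers:
            # Folded continuation line: append it to the previous value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    return headers


raw = b'Content-Disposition: form-data; name="city"\r\nX-Note: caf\xc3\xa9\r\n\r\n'
print(parse_http_headers(raw, "utf-8"))
```

The only encoding-specific step is the single `decode` call, which is the parameter this message argues the email API should expose.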

Further, HTTP data streams can be extremely large, and thus time-consuming to obtain over the wire. CGI applications cannot afford to keep large blocks of data in RAM during receipt, so if email wishes to support CGI, it needs features for placing large blocks of data on disk instead of in RAM during the parsing phase. Because of this issue, cgi.py presently has to pre-parse headers to separate them from the data streams, which it then handles on its own.
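One way to get "RAM for small parts, disk for large ones" during parsing is the standard library's `tempfile.SpooledTemporaryFile`, which buffers in memory until a size threshold and then rolls over to a real temporary file. The sketch below assumes the body length is known up front (as CGI provides via `CONTENT_LENGTH`); `spool_body` and its default thresholds are illustrative choices, not anything email or cgi.py provides.

```python
import io
import tempfile


def spool_body(stream, length: int, max_in_memory: int = 1 << 20,
               chunk_size: int = 64 * 1024):
    """Hypothetical helper: copy `length` bytes from `stream` into a file
    object that stays in RAM until it exceeds `max_in_memory` bytes and then
    transparently moves to disk.  Data is read in bounded chunks, so the
    whole body is never held in memory at once."""
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        spool.write(chunk)
        remaining -= len(chunk)
    spool.seek(0)                        # ready for the consumer to read
    return spool


# Usage: a tiny threshold forces the rollover-to-disk path.
body = spool_body(io.BytesIO(b"x" * 10), length=10, max_in_memory=4)
print(body.read())
```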

Hence, cgi.py does enough pre-parsing and private handling of HTTP data streams that the only real benefit it seems to gain from using email at all is the handling of the complex RFC 2047 decoding... which in practice isn't used in HTTP data streams!
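For reference, the RFC 2047 decoding in question is available from the standard `email.header` module via `decode_header` and `make_header`; the wrapper name `decode_rfc2047` below is just for illustration. Note that a value without encoded words passes through unchanged, which is the common case for HTTP.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Turn a header value that may contain RFC 2047 encoded words into str."""
    # decode_header splits the value into (bytes-or-str, charset) pieces;
    # make_header reassembles them into a Header whose str() is Unicode text.
    return str(make_header(decode_header(value)))


print(decode_rfc2047("=?utf-8?q?caf=C3=A9?="))
print(decode_rfc2047("plain ASCII value"))
```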

In any case, if email wants to promulgate itself as the "one true way" to process HTTP data streams, as well as SMTP and NNTP data streams, then it needs to address the issues above.

There is, by the way, room for improvement in the cgi.py handler for HTTP data streams; presently all large MIME objects are written to disk (but small ones are kept as string or byte streams), but it isn't necessarily the right disk, and the data must then be copied again, byte by byte, to its final file system location. I see that as abhorrent overhead. There is presently no provision for hooks that ask the CGI application what to do with the data while it is being received, nor for policies to assist with better heuristics, with the goal that a properly and completely received MIME object could then be renamed to its final location rather than copied.
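A sketch of the hook idea, under stated assumptions: the application passes a callback (here called `choose_destination`, a hypothetical name) that maps a part's headers to a final path; the part is streamed into a temporary file in that same directory and then moved into place with `os.replace`. Files in the same directory live on the same file system, so that final step is a rename, not a copy. Neither `receive_part` nor the callback exists in cgi.py or email.

```python
import os
import tempfile


def receive_part(headers, chunks, choose_destination):
    """Hypothetical hook-driven receiver for one MIME part.

    `choose_destination(headers)` is supplied by the application and returns
    the final path for this part.  Data is streamed into a temporary file in
    that same directory, then renamed into place."""
    final_path = os.path.abspath(choose_destination(headers))
    directory = os.path.dirname(final_path)
    # Same directory => same file system, so os.replace() below is a cheap
    # rename rather than a byte-by-byte copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Don't leave half-received parts lying around.
        os.unlink(tmp_path)
        raise
    return final_path


# Usage: the application decides where the upload goes, at receive time.
with tempfile.TemporaryDirectory() as uploads:
    path = receive_part(
        {"Content-Type": "application/pdf"},
        [b"%PDF-1.4 ", b"...rest of the part..."],
        lambda hdrs: os.path.join(uploads, "upload-0001.pdf"),
    )
    with open(path, "rb") as f:
        print(f.read())
```

The usage deliberately picks a server-side file name; a real application should not build paths from client-supplied filenames without sanitizing them.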

> I guess I'm proposing, then, that there be an API version definition,
> with two values as of Python3.3: email5 API, and email6 API.  We'll
> figure out how we name and interrogate these formally later.

Question: While it is pretty clear that enhanced behaviors are required to benefit new applications that use email, and while some new APIs may be incompatible with some existing APIs, might it be possible to design the new API and then build a compatibility layer on top of it that looks like the old API? That is, could there be policies for the new APIs that make them work like the old APIs, to ease the implementation of such a layer? I'm not sure I fully understand the use of _factory or factory parameters, but for APIs that have _factory and grow a factory, couldn't the presence of one parameter or the other determine which variant of the functionality is wanted?

(OK, this question comes after not looking at the email API during all the GSoC and your implementation efforts since the last big round of discussion, but your proposals here sound like this would be more possible with your current thinking than with your previous thinking.)
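For what it's worth, the email package in Python 3.3 and later ended up with roughly this shape: one parser whose behavior is selected by a `policy` object, where `email.policy.compat32` reproduces the old behavior and `email.policy.default` gives the new one. A minimal illustration, assuming Python 3.6 or later:

```python
import email
from email import policy

raw = b"Subject: =?utf-8?q?caf=C3=A9?=\r\n\r\nbody\r\n"

# Old-style behavior: the header value comes back as it appeared on the wire.
old = email.message_from_bytes(raw, policy=policy.compat32)
print(old["Subject"])

# New-style behavior from the same parser: RFC 2047 is decoded for you.
new = email.message_from_bytes(raw, policy=policy.default)
print(new["Subject"])
```

The compatibility "layer" here is just a choice of policy, which is close to what the question above asks for.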

> The Header registry in this vision is accessed through the Message class.
> I have various thoughts about how this will work, but I'm going to leave
> those for later, since this email is already long enough.  I also have
> some additional thoughts about backward compatibility, but it is going
> to require some experimentation to see if they are realistic.

Consider me an interested observer; I'll enjoy reading, thinking, and commenting about these ideas too, but sadly am unlikely to implement an email client this year :( But I have aspirations to do so, because none of the existing email clients exactly suit my preferences... (everyone should write an editor and an email client, no? I've done the former several times... what I want, though, is emacs-python, instead of emacs-lisp).


Glenn

_______________________________________________
Email-SIG mailing list
Email-SIG@python.org
Your options: 
http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
