Re: [Email-SIG] fixing the current email module

Glenn Linderman Thu, 08 Oct 2009 23:26:54 -0700

On approximately 10/8/2009 9:27 PM, came the following characters from the keyboard of Stephen J. Turnbull:

Glenn Linderman writes:

 > > Conversions will eventually be done.  "Best it were done quickly."
 >
 > Disagree.  Deferring the conversions defers failure issues to the point
 > where the code (hopefully) somewhat understands the type of data being
 > manipulated, and can then handle it appropriately.  Converting up front
 > causes errors in things that may never be touched or needed, so the
 > error detection and handling is wasteful.

That's theory; my position is based on Mailman practice.  Don't believe
me, ask Barry.  I also spend most of my OSS time on the
internationalization of XEmacs, and the experience is similar there.
Best to convert everything as early as possible, or admit that you
don't know how.

Emacs is different than email. Either you can read a file to edit it, or you can't. The Postel principle for email says to try to do the best you can, for as much as you can.

 > So for headers, which are supposed to be ASCII, or encoded via RFC rules
 > to ASCII (no 8-bit chars), the discovery of an 8-bit char should
 > produce a defect report, but the header should then simply be converted
 > to Unicode as if it were Latin-1 (since there is no other knowledge
 > available that could produce a better conversion).

No, that is already corruption.  Most clients will assume that string
is valid as a header, because it's valid as a string.

Sure it is corruption. That's why there is a defect report. But the conversion technique is appropriate, per the Postel principle.
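To make the proposal concrete, here is a minimal sketch of "record a defect, then decode as Latin-1". The function name `decode_header_bytes` and the wording of the defect string are made up for illustration; they are not part of the email package.

```python
def decode_header_bytes(raw):
    """Return (text, defects) for a raw header line.

    Hypothetical helper for illustration; not an email-package API.
    """
    defects = []
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # Report the problem rather than raising, then fall back to
        # Latin-1, which maps every byte value to a code point and so
        # cannot fail.
        defects.append("non-ASCII byte at offset %d" % exc.start)
        text = raw.decode("latin-1")
    return text, defects


text, defects = decode_header_bytes(b"Subject: caf\xe9")
```

The client still receives a usable string, and the non-empty `defects` list is the signal that the string may not be what the sender intended.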

 > And if the result of that is not expected by the client (your
 > definition), then the client should either notice the defect report
 > and reject it based on that, or attempt to parse it, and reject it
 > if it encounters unexpected syntax.  After all, this is, for that
 > client, "raw user input" (albeit from a remote source) so fully
 > error checking the input is appropriate.

No way.  That environment would suck to program in.  And it's
un-Pythonic: "Errors should never pass silently."

Then the Postel principle is un-Pythonic, and to be Pythonic any incorrect email should produce an error, and be unreadable. Again, I mentioned producing a defect report. That is not passing an error silently.

It is still raw user input, and should still be checked for proper syntax by the client, even if the email is well-formed and conversion produces no defect report. If you don't want to check proper syntax in your program inputs, I don't want to use your programs, they will be insecure.

 > Python way.  Since the email library is trying to avoid raising
 > exceptions in large blocks of its code, it is non-Pythonic


I disagree with that.  "Unless explicitly silenced."  The strategy
that Barry and I favor is to signal errors lazily.  So we *explicitly*
silence errors (at least of the Exception kind) when parsing.  If we
can't parse, we look for a part terminator, encapsulate the bad stuff
and move on to the rest of the input.  Later, at use time, *if* the
unparsable object is used, *then* the error will be raised, hopefully
with enough metainformation to figure out what to do about it.


So there seem to be two techniques:

1) Convert quickly, but don't raise errors... instead build metainformation structures that record the errors, and raise them later if the converted data is accessed. Because some kinds of not-quite-perfect data have alternate handling techniques, either all techniques must be performed and cached, or *some processing must be deferred until the client can decide*.

2) Store the data, and convert only if the data is accessed. When the client accesses the data, the exceptions raised allow the client to choose an appropriate processing technique for handling the not-quite-perfect data, based on the context of the client, the importance of that data item, etc. Only the result of that technique need be cached for future accesses.

With both techniques, the data is given to the email library, and the errors are not seen until later... potentially the exact same user experience. But with technique 1, much effort is expended to convert data, parse data, and create error metainformation ready to return IF the data is accessed. (Yeah, don't say it, premature optimization -- I call it design, in this case.) With technique 2, little effort is required to store the data and create a state variable to indicate whether it has been converted and parsed, or not, and then IF (and only IF) the data is accessed, the conversion and parsing must be done on the first access, and instead of creating and storing metainformation about the errors, they could just be raised.
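Technique 2 could be sketched roughly as below. `LazyHeader` and its `fallback` parameter are names invented for this illustration, and ASCII decoding stands in for the full conversion and parsing step.

```python
class LazyHeader:
    """Stores raw bytes; converts on first access (hypothetical class)."""

    _UNSET = object()

    def __init__(self, raw):
        self.raw = raw
        self._value = self._UNSET   # state variable: not yet converted

    def get(self, fallback=None):
        if self._value is self._UNSET:
            try:
                self._value = self.raw.decode("ascii")
            except UnicodeDecodeError:
                if fallback is None:
                    raise           # let the client choose a technique
                # Only the result of the chosen technique is cached.
                self._value = fallback(self.raw)
        return self._value


header = LazyHeader(b"caf\xe9")
try:
    header.get()
except UnicodeDecodeError:
    # This client decides Latin-1 is good enough for this field.
    value = header.get(fallback=lambda raw: raw.decode("latin-1"))
```

Nothing is converted, and no error metainformation is built, unless `get()` is called; a header that is never accessed costs only the storage of its bytes.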

I don't see what's un-Pythonic about that.

The un-Pythonic thing is returning defect reports instead of raising errors. There is no way for a simple assignment interface to return an error, because the API for simple assignment doesn't have an in-band signaling mechanism. No "condition code" left around to be checked. And programmers often omit checking condition codes anyway, due to laziness and hubris: "nothing will go wrong with THIS statement". So the Pythonic way, AFAIU, is that errors are returned out-of-band via raised exceptions.

Perhaps this is why it is so hard to design a Pythonic interface to the Postel principle email handling... an out-of-band signalling system interrupts the flow of control, and the Postel principle wants to provide best-as-you-can data... and the easiest way to do Postel is to supply the not-quite-perfect data so the normal control flow can handle things, yet an out-of-band signal can't easily return to the normal control flow, and wrapping tiny try blocks around nearly every email API call is as annoying to the understanding of the control flow as putting all those if statements in the normal control flow to check "condition codes" (error codes, warning codes, defect reports, whatever you want to call them).

Stated another way, it is hard to process potentially not-quite-perfect data without writing complex code. And because the email library wants to simplify the handling of email, it wants to limit the complexity of the client code. But when dealing with not-quite-perfect data, there is a choice of different ways to handle it, and the email library doesn't know the best choice for any particular client application... if it did, then it could make the choices, and the client could be less complex.

The simplest client could be handed only perfectly structured, 100% accurately decodable email messages... its logic would be (simply, and Pythonically):


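while True:
    try:
        getEmail()               # raises if the message is not perfect
    except Exception:
        logBadEmailReceived()    # note the failure, go on to the next one
    else:
        processEmail()           # reached only for well-formed email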

In order to allow defect reports to be useful, the client logic must be more complex; getEmail must be expanded to make decisions based on the content of the defect reports. More try statements must be used, at a finer granularity, or more if statements to check defect reports. The former is more Pythonic, the latter less, AFAIU.
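As an illustration of the "if statements to check defect reports" style, here is a small sketch against the standard library parser, which attaches defect objects to the message rather than raising. The sample message and the `handling` strings are invented for the example.

```python
import email
from email import errors

# A message that claims to be multipart but declares no boundary.
raw = "Content-Type: multipart/mixed\n\nno boundary was declared\n"
msg = email.message_from_string(raw)    # parses without raising


def has_defect(message, defect_type):
    """True if the parser recorded a defect of the given type."""
    return any(isinstance(d, defect_type) for d in message.defects)


# The client must remember to look; nothing forces it to.
if has_defect(msg, errors.NoBoundaryInMultipartDefect):
    handling = "treat body as opaque text"
else:
    handling = "walk the MIME parts"
```

The check sits in the normal control flow, which is exactly the "condition code" pattern: easy to write, and just as easy to forget.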

Perhaps a given client knows how it wants to handle all types of not-quite-perfect data -- should the email library allow rules to be set, so that when a situation arises, it can handle it according to the rules? This simplifies the client logic, at the cost of initialization setup, rules creation and caching, documenting the rules, adding the new APIs that don't seem to exist in today's email library. While this could perhaps simplify many clients, it cannot simplify the email library... it still has to have the code for all the variant perfect and not-quite-perfect data handling techniques, plus the complexity of rule definition and usage.
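A sketch of what such client-supplied rules might look like, purely as an assumption: the rule key `"non_ascii_header"`, the action names, and the function `decode_with_rules` are all invented here and do not correspond to any existing email-package API.

```python
def decode_with_rules(raw, rules):
    """Decode a header, consulting client rules on bad data (hypothetical)."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # With no rule set, behave as if no rules existed and raise.
        action = rules.get("non_ascii_header", "raise")
        if action == "latin1":
            return raw.decode("latin-1")
        if action == "replace":
            return raw.decode("ascii", errors="replace")
        raise


lenient = {"non_ascii_header": "latin1"}
strict = {}

subject = decode_with_rules(b"caf\xe9", lenient)
```

The client sets its rules once and its main loop stays simple, but the library function now carries every handling variant plus the rule lookup, which is the cost described above.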


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

