On approximately 10/7/2009 3:33 AM, came the following characters from the keyboard of Stephen J. Turnbull:
Glenn Linderman writes:

 > > If you mean that the email module will keep track of what form the
 > > object is currently represented by, that will eventually result in
 > > "UnicodeError: octet out of range: 161, ascii".
 >
 > The above sentence does not communicate your meaning to me... or any
 > meaning, actually. Can you explain?

Yes, that Unicode error is one that took years for Mailman to work
around.  If we are going to be converting different objects at
different times, I'm sure we'll get to see it again in the future.  Oh,
joy.

Ah, a historical remark! So that's why it was lost on me: I'm new to the Python world (but have been programming since 1975...)


 > If conversions are avoided, then octets are unlikely to be out of
 > range?

Haven't looked in your spam bucket recently, I guess.  Spammers
regularly put 8 bit characters into headers (and into bodies in
messages without a Content-Type header), for one thing.

I'm aware of that, but if conversions are not done, octets are unlikely to be _reported_ to be out of range....
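To make the distinction concrete, here is a minimal sketch in Python 3 syntax. The header value is made up for illustration, and the exception text Python prints today differs from the older message quoted above. It only shows that an 8-bit octet sits quietly in a bytes object and is reported as out of range at the moment something tries to convert it:

```python
# Illustrative header with one 8-bit octet: 161 (0xA1) is outside ASCII.
raw_header = b"Subject: \xa1Hola!"

# Kept as bytes, the header can be stored or passed along without complaint.
passed_through = raw_header

# The error only shows up when a conversion to text is attempted.
try:
    raw_header.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

if decode_error is not None:
    bad_octet = raw_header[decode_error.start]
    print(f"octet {bad_octet} at offset {decode_error.start} is not ASCII")
```

Whether deferring the conversion is actually safer is the point in dispute; the sketch only shows where the error surfaces, not which policy is right.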


 > And the email module must be aware of the form of the data in
 > order to manipulate it in any format other than wire format, but
 > fortunately, wire format declares the format of the data (not to say
 > there is not buggy wire format data -- but that is an issue best avoided
 > by avoiding as many conversions as possible).

"Best" I can't speak to; you obviously are willing to accept a much
higher error rate than I am.  "Robust" handling of buggy wire format
data means that the email module must do something sane with it before
giving it to the application.  Maybe it's reasonable to do that
lazily, and/or cache the result, but access to bogus data (that the
email module can determine is bogus or suspicious) must not be allowed
unless the client says "hit me with your best shot" explicitly.  Most
clients are simply not going to be prepared for the kind of crap I see
in /var/mail/turnbull every day.

Are you referring to most email clients, or most Python-email-library-using clients? It seems like most email clients are being hit with the same stuff you are seeing... every day... and are handling it somehow... although anti-spam filters do eliminate some of it before the end user's MUA sees it, depending on the ISP, etc.

Is it your point of view, then, that incorrectly formed email should be mostly treated as SPAM? Your paragraph above could be interpreted that way. Oleg's point is also valid though, so it seems that isn't your point of view.

Your "hit me with your best shot" comment indicates that you want a failure code or exception when the data is bad, and then a way to "retry accepting errors"?
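One way to read that "fail first, then retry accepting errors" idea is sketched below in Python 3. Everything named here is hypothetical: `SuspectDataError`, `header_text`, and the `accept_bogus` flag are invented for illustration and are not part of the email package or of anyone's actual proposal in this thread.

```python
class SuspectDataError(ValueError):
    """Hypothetical exception for wire data that fails a sanity check."""

    def __init__(self, message, raw):
        super().__init__(message)
        self.raw = raw  # keep the original octets for a later retry


def header_text(raw: bytes, *, accept_bogus: bool = False) -> str:
    """Return header text; refuse suspicious data unless explicitly asked.

    Assumption for this sketch: "suspicious" simply means "not pure ASCII".
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        if not accept_bogus:
            raise SuspectDataError(
                f"non-ASCII octet {raw[exc.start]} at offset {exc.start}", raw
            ) from exc
        # The caller opted in: hand back something usable, marking each
        # undecodable octet with U+FFFD.
        return raw.decode("ascii", errors="replace")


# A client that is not prepared for bad data gets an exception by default,
# and can retry with the explicit opt-in if it wants the data anyway.
try:
    text = header_text(b"Subject: \xa1Hola!")
except SuspectDataError as err:
    text = header_text(err.raw, accept_bogus=True)
```

The default-strict keyword argument is only one possible shape; a separate "raw" accessor or a policy object would express the same opt-in.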


 > I was pushing back from your declaration that an archiver would
 > always want string output

Please don't push back; we won't get anywhere.  Use cases are
*examples*, not complete specifications of all possible inputs and
outputs.  Use cases should be simple and clear cut.  If you want a
different use case, state it.  In fact in the real world, *all* of the
archivers I know of produce text formats on disk, either deleting
multimedia objects or saving them off and linking to them via URLs in
the text.  If you know of a different kind of archiver, add it as a
use case.

I misunderstood the purpose of your list. Sure, everything in your list is a good example of real world uses.

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

_______________________________________________
Email-SIG mailing list
Email-SIG@python.org
Your options: 
http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
