Glenn Linderman writes:

> Emacs is different than email. Either you can read a file to edit it,
> or you can't.

*sigh* Emacs is as powerful a programming environment as Python, and
applications regularly deal with network streams (HTTP, NNTP, and SMTP
most commonly, but also raw X protocol and any kind of socket supported
by the platform). So, yes, it's different from email, because it's
*far* more general. That's precisely why I appreciate Bill's concerns
about non-email usage.

> The Postel principle for email says to try to do the best you can,
> for as much as you can.

Actually, it doesn't. It says be lenient in what you accept, strict in
what you emit. You accept it ... but you don't have to do anything with
it except preserve it verbatim for whoever wants it.

> > > produce a defect report, but then simply converted to Unicode as if it
> > > were Latin-1 (since there is no other knowledge available that could
> > > produce a better conversion).
> >
> > No, that is already corruption. Most clients will assume that string
> > is valid as a header, because it's valid as a string.
>
> Sure it is corruption. That's why there is a defect report. But
> the conversion technique is appropriate, per the Postel principle.

Actually, I would say you are emitting leniently, in violation of the
Postel principle. You don't know what the client will do, they may eat
it in a single gulp without looking at it. Thus you should avoid
converting anything that you don't know what it is (unless specifically
asked to do your best).

> Again, I mentioned producing a defect report. That is not passing
> an error silently.

But if I access that Unicode object without looking at the defect
report, you *will* pass the error silently. OTOH, if I look at the
defect report, I won't access the Unicode object.

> It is still raw user input, and should still be checked for proper
> syntax by the client,

Nonsense. The email module had better know a lot more about syntax than
the client. If it doesn't, whack it with a 2x4 until it learns!

> produces no defect report.
> If you don't want to check proper syntax in
> your program inputs, I don't want to use your programs, they will be
> insecure.

So you're saying that every program that uses the email module should
reproduce 100% of the functionality of the email module's parser, or
it's insecure. And you imply that's an excuse for passing corrupt data
to any client that asks for it. I disagree.

> So there seem to be two techniques:

Whatever gave you that idea?

> 2) Store the data, and convert only if the data is accessed.
> With technique 2, little effort is required to store the data,
> create a state variable to indicate whether it has been converted

Why do that? It's always "False" in technique 2.

> and parsed, or not, and then IF (and only IF) the data is accessed,
> the conversion and parsing must be done on the first access, and
> instead of creating and storing metainformation about the errors,
> they could just be raised.

No, they cannot just be raised. If you just raise the error, then the
next time you try to access unparsed data, you'll hit the error again.
If you use the same handler you did before, you're in an infloop. So
you need a second handler to do things differently this time or a flag
... but it's unclear to me that that flag can be a boolean. So you may
as well store the defect list and information about where to restart.

> So the Pythonic way, AFAIU, is that errors are returned out-of-band
> via raised exceptions.

Sure. But what you're missing is that "Neither rain, nor snow, nor dark
of night may stop the Parser on her appointed rounds." It is not easy
to write parsers, but I'll tell you one thing: it's orders of magnitude
harder to write a parser that starts in the middle and works outward,
than one that starts at the beginning and works forward to the end.

So it's OK to write a lazy parser, but it must retain enough state so
that it can work forward until the end. Because you don't know that the
client will not request the last character of the message, you need to
be able to try to get it, no matter what happened to the first 10GB of
the message. And if an exception occurs, it must be handled by the
parser itself; if not, you put the poor thing in the position of
starting over at the beginning (that way lies the madness of
infloops), or trying to start a parse in the middle and work out.
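
To make the "store the defect list and information about where to
restart" point concrete, here is a minimal sketch of what I have in
mind. It is not the email module's API: the class name LazyLineParser,
its attributes, and the line-at-a-time "grammar" are all hypothetical,
chosen only to show a parser that works forward on demand, records
defects instead of raising them, and keeps undecodable bytes verbatim.

```python
class LazyLineParser:
    """Hypothetical lazy parser: works forward on demand, never restarts."""

    def __init__(self, raw: bytes):
        self._raw = raw
        self._pos = 0        # restart point: offset of the first unparsed byte
        self.parsed = []     # one entry per line: str, or bytes if undecodable
        self.defects = []    # (line_index, message), recorded instead of raised

    def _parse_next(self) -> bool:
        """Parse one more line; return False when the input is exhausted."""
        if self._pos >= len(self._raw):
            return False
        end = self._raw.find(b"\n", self._pos)
        if end == -1:
            end = len(self._raw)
        chunk = self._raw[self._pos:end]
        self._pos = end + 1  # advance first, so an error can never be re-hit
        try:
            self.parsed.append(chunk.decode("ascii"))
        except UnicodeDecodeError as exc:
            # The parser handles the error itself: note the defect and
            # preserve the bytes verbatim rather than guessing a charset.
            self.defects.append((len(self.parsed), str(exc)))
            self.parsed.append(chunk)
        return True

    def line(self, index: int):
        """Return line `index`, parsing forward only as far as needed."""
        while len(self.parsed) <= index and self._parse_next():
            pass
        return self.parsed[index]  # IndexError if the input is too short


raw = b"Subject: ok\nX-Bad: caf\xe9\nTo: someone@example.com"
parser = LazyLineParser(raw)
last = parser.line(2)  # walks past the bad line without raising
```

Asking for the last line works even though line 1 is broken, and the
client that cares can inspect parser.defects and get the original bytes
back from parser.line(1). Note that the restart state here is an offset
plus a defect list, not a boolean.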