I'm running out of time to work on this (yeah, I know it's the weekend, but my life is like that lately). I think we're converging, though, so I'd like to try and tie some of those ends together.

Glenn Linderman writes:
> On approximately 10/9/2009 8:10 AM, came the following characters from
> the keyboard of Stephen J. Turnbull:

> > Actually, I would say you are emitting leniently, in violation of the
> > Postel principle.
>
> You can say that, but I don't have to believe it. I'm talking about
> accepting; the message has arrived, it is here, the client is trying to
> look at it, and I'm talking about ways the client can look at
> not-quite-perfect data, knowing that it is not quite perfect, but still
> being able to see it. I'm not at all talking about emitting data.

It would be indeed, if the corrupt data is stored in the place where
correctly decoded data normally is stored, and is accessible in the
same way. But I gather that's not what you were talking about, my
mistake.

> You seem to be calling the email package helping the client to
> accept not-quite-perfect data, as a form of emitting data. It is
> not.

No, I was confused by the way you wrote. Saving the data *somewhere*
is absolutely necessary; not losing data is the #1 commandment of
low-level mail processing. Surely the email module is subject to that
commandment. *Nobody* is talking about losing any data yet, except
Barry indirectly when he says that some people think of giving up on
invertibility (often called "idempotency"), and even he is quite
adamant that he's not going to give up on that.

So when you wrote about saving and converting to text form, without
mentioning the specific APIs, I assumed you meant the "mainline" APIs
for parsing and accessing parts of a correctly formatted message.

> The email package cannot police the client... if it chooses to "eat it
> in a single gulp without looking at it" then it may get indigestion. I
> never suggested that "converting to Unicode as if it were Latin-1"
> should be done without informing the client, or being requested by the
> client to do that via a special API call...

Well, maybe I misread it, but it certainly looked like that to me.
I would not object to that special API call defaulting to ISO 8859/1.

> If you ignore defect reports, you are ignorant (blunt, but not intended
> to be offensive).

What I worried about is that if defect reports are present, *but
displayable data is also present*, programmers *will* simply display
it, for example in producing a prototype program. It will be
impossible to determine without very close analysis of that program
that an early version became a production version without adding
appropriate checks. In practice, this bug will be discovered when
some end user's installation breaks.

It seems that you agree with this, and because the special API call is
necessary, it will be easy to identify whether proper care is being
taken or not. Right?

> > > It is still raw user input, and should still be checked for proper
> > > syntax by the client,
> >
> > Nonsense. The email module had better know a lot more about syntax
> > than the client. If it doesn't, whack it with a 2x4 until it learns!
>
> I think we are talking at cross purposes here. I find it quite
> difficult to follow where you cross the boundary between talking about
> one sort of email package client, and then switch to another type, or
> switch to the responsibilities of the email package.

Excuse me? The "raw user input" you referred to above is material
that the client software receives from the email package. The email
package should give it to the client in the "normal" (convenient) way
only if it can certify that it conforms to the appropriate standard.
That standard should be specified in the API documentation. Any more
detailed structure, of course, is the responsibility of the client.

> An application which is using email as a transport, has specific goals,
> which require specific content. You were mentioning clients.

I've already said that when I speak of an MUA, I write "MUA".
In speaking of the calling program, which might even be a user running
the module via the Python interpreter, I write "client". It's a very
convenient way to describe the user of an API, in contrast to the
provider of the API (the implementation).

> If such a client doesn't validate the syntax of that content, it
> isn't much of an application.

If that MUA or email application uses RFC 822 addresses, it should be
able to rely on the email module to parse those addresses correctly,
or provide a defect report. One might even go so far as to suggest
that it be able to parse the (non-RFC, but very common) "+" notation
for separating the "mailbox" from "additional data" used for VERP and
challenge-response applications. That would have to be documented,
but if so documented client applications like the MUA should be able
to rely on it (and you can bet many will).

Application domain syntax of course is not the email module's problem
whether it arrives by email or Pony Express, and I'm really confused
why you're going so far afield.

> > No, they cannot just be raised. If you just raise the error, then the
> > next time you try to access unparsed data, you'll hit the error
> > again. If you use the same handler you did before, you're in an
> > infloop. So you need a second handler to do things differently this
> > time or a flag ... but it's unclear to me that that flag can be a
> > boolean. So you may as well store the defect list and information
> > about where to restart.
>
> From the point of view of the email package, the errors can just be
> raised. Then the client can make choices, and use other APIs or other
> parameters to the API to direct the email package to attempt a different
> technique to access the data.

The problem is that by this point some of the state of the parse may
be lost. We can't say "just raise", we need to say "interrupt the
parse, preserve state, and then raise". Python does absolutely
nothing to help with the problem of preserving the state.
We also need to determine just what state to preserve.

> Yes, I have learned that in my 34 years of programming. I agree.
>
> > So it's OK to write a lazy parser, but it must retain enough state so
> > that it can work forward until the end. [...]
>
> Are you speaking about parsing the message into MIME parts, or parsing a
> particular MIME part contained within the message, or both?

Both. I *believe* (but it needs to be checked) that in a correctly
formed multipart MIME object (message or part), any internal structure
is context-free within the MIME boundaries. If that is so, then
individual parts of the object can be stored in raw form and parsed
lazily.

Similarly, for any MIME or RFC 822 object, the object can be parsed
into header section and body section, and each can be stored and
parsed lazily, subject to the condition that the header section must
be sufficiently parsed to identify all headers that might affect
parsing the body part before the body part is parsed. That
"condition" is the context.
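
To make the header-first, body-later idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions: `LazyMessage`, `headers`, `body_parsed` and `parts` are hypothetical names, and the sketch leans on the parser classes in today's standard-library `email.parser` rather than on any API proposed in this thread. It is also not truly incremental, since the full parse re-reads the header section instead of resuming from saved state.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser


class LazyMessage:
    """Hypothetical wrapper, not part of the email package."""

    def __init__(self, raw: bytes):
        self.raw = raw          # raw bytes are always kept, so nothing is lost
        self._headers = None    # filled in by the header-only parse
        self._full = None       # filled in by the full parse

    @property
    def headers(self):
        """Parse the header section only; the body stays unparsed."""
        if self._headers is None:
            parser = BytesHeaderParser(policy=policy.default)
            self._headers = parser.parsebytes(self.raw)
        return self._headers

    @property
    def body_parsed(self) -> bool:
        """True once the full (headers plus body) parse has run."""
        return self._full is not None

    def parts(self):
        """Run the full parse on first use and return the MIME subparts."""
        if self._full is None:
            parser = BytesParser(policy=policy.default)
            self._full = parser.parsebytes(self.raw)
        # iter_parts() yields nothing for a non-multipart message.
        return list(self._full.iter_parts())
```

A client could read `headers` (for example to check Content-Type) without paying for the body parse, and only call `parts()` when it actually needs the subparts. A real lazy parser would also have to store the defect list and the restart position, which this sketch does not attempt.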
