Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers
On May 12, 2008, at 12:45 AM, Ionel Maries Cristian wrote:

> On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz [EMAIL PROTECTED] wrote:
>
>> On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:
>>
>>> For this to work, you're going to need this to take the wsgi.input object as a parameter. If you don't, then this will bypass middleware that replaces wsgi.input. That is, you will need a way for this spec to support middleware that's replacing wsgi.input, without the middleware knowing that this specification exists. In the worst case, it should detect the replaced input and give an error or some response that lets the application know it won't really be able to use the async feature.
>>
>> I hadn't considered middleware that replaces wsgi.input. Is there an example component you can point me to, just so I have something concrete to look at? Given that the semantics of wsgi.input are, in general, incompatible with non-blocking execution, I'm inclined to think that such middleware would either need to be rewritten to use x-wsgiorg.async.input, or just couldn't be used with asynchronous servers. But I'll think about it some more -- maybe there's a way to make this work.
>
> Making input filters work could be achieved using greenlets - but then again - if one would use greenlets he could use them to simulate a seemingly blocking API for the input so this is pretty much pointless. But I agree, detecting this is good and errors should be thrown in this case. In cogen I'm setting wsgi.input to None - so any use of it would end in an error - though it's not very elegant.

But if your server sets wsgi.input to None, then you really can't claim that it's WSGI-compliant.

It seems like the authors of asynchronous servers have two options for how to handle wsgi.input.

The first option is to provide a compliant wsgi.input (with file-like, blocking behavior). This means that middleware that uses/replaces wsgi.input will work properly, but the whole server can block whenever such use takes place. Therefore, apps and middleware will essentially be required to use x-wsgiorg.async.input.

The second option is to provide a non-compliant (i.e. non-blocking) wsgi.input, which works something like x-wsgiorg.async.input. But then any middleware that uses wsgi.input will be broken, since it won't work as expected.

In either case, wsgi.input ends up being unusable. Ugh.

Of course, there is an easy way out of this: Drop the idea of x-wsgiorg.async.input, and push the responsibility for making wsgi.input non-blocking on to server authors. In effect, this would mean that asynchronous servers must *always* pre-read the request body and provide it to the app as a StringIO (or whatever).

I would like to avoid this requirement, since the ability for servers to provide on-demand, non-blocking input to the application seems useful. But if it comes down to a choice between (1) the ability to receive data from the client on-demand and (2) having a wsgi.input that can actually be used, I think I'd choose (2).

Chris

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
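The "easy way out" described in the message above (pre-read the request body and hand the app an in-memory file) might look roughly like the following. This is only a sketch: `preread_input` and its `chunk_size` parameter are hypothetical names, Python 3's `io.BytesIO` stands in for the `StringIO` mentioned in the message, and a real asynchronous server would collect the bytes from its event loop rather than calling a blocking `read` as done here.

```python
import io


def preread_input(environ, chunk_size=8192):
    """Replace environ['wsgi.input'] with a fully buffered in-memory copy.

    Hypothetical helper, not part of WSGI or the proposed extensions.
    It shows only the end state the app would see: a wsgi.input whose
    reads can never block, because the body is already in memory.
    """
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0  # treat a malformed header as "no body"

    source = environ['wsgi.input']
    buffered = io.BytesIO()
    while remaining > 0:
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:  # client closed the connection early
            break
        buffered.write(chunk)
        remaining -= len(chunk)

    buffered.seek(0)  # rewind so the app reads from the start
    environ['wsgi.input'] = buffered
    return environ
```

The cost of this approach is the one named in the message: the whole body sits in memory before the application runs, so on-demand reading is given up in exchange for a usable `wsgi.input`.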
Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers
On May 12, 2008, at 12:18 PM, James Y Knight wrote:

> There are other issues. How do you do a DNS lookup? How do you get process completion notification? Heck, how do you run a process?

These are valid questions that I'm not attempting to address with this proposal. So maybe the title of my spec should be "Extensions for Asynchronous I/O", since that's the only issue it deals with. I see these other issues as something for other specifications to address.

> Once you have I/O readiness information, what do you do with that? I guess you'd need to write a whole new asynchronous server framework on top of AWSGI? I can't see being able to use it raw for any real applications.

No, you don't need a whole new framework. You need libraries (for making HTTP requests, talking to databases, etc.) that are written to use the extensions the spec provides. These only need to be written once and can then be used with *any* server that supports the extensions.

So the existence of a spec like this lets us move from a world where every server/framework (be it Twisted, nginx, cogen, whatever) needs to reimplement these utilities in terms of its own async I/O framework, to one where a single implementation can be written against the spec and then used by any server that implements it. In turn, this should make application developers more comfortable with targeting their apps at async servers, since they won't be tied to any particular server/framework's API.

And, yes, the fact that what I just wrote sounds like "write once, run anywhere" sets off alarm bells in my head, too :) But I think the interface I propose is so basic that any async server should be able to provide it with very little trouble.

> What if the event-loop of the server doesn't use integer fds, but Windows file handles or a Java channel object? Where are you allowed to get these integers from? Is it always a socket from socket.socket().fileno()? Or can it be a file from open().fileno() or os.open()? A pipe from os.pipe()? Note that these distinctions are important everywhere but UNIX.

Although I didn't state it in the spec, my thinking was that readable/writable should accept whatever would be accepted by select() on the platform you're running on. On Windows, they would be limited to sockets; elsewhere, any file descriptor would do.

In that light, maybe the title should really be "Extensions for Polling File Descriptors for I/O Readiness". But even limited to that scope, I still think it'd be extremely useful.

>> * To prevent an application that does blocking I/O from blocking the entire server, an asynchronous server could run each instance of the application in a separate thread. However, since asynchronous servers achieve high levels of concurrency by expressly *avoiding* multithreading, this technique will almost always be unacceptable.
>
> Well, my claim would be that it's usually acceptable. Certainly sometimes it's not, which is where the use of an asynchronous server framework comes in handy.

I don't get how it's acceptable. If you spawn a separate thread for each request, then your server is no longer asynchronous. At that point, why not just save yourself some trouble and use Apache?

> PS, a minor bug: I notice the spec says wsgiorg.async.input is supposed to have only a read function, but you actually call recv() on it in the examples.

Thanks. The examples in the spec text are correct, but I haven't updated the examples in my reference code yet.

Chris
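To make the "whatever select() accepts" rule above concrete, here is a minimal sketch of the readiness test expressed directly with `select.select`. The helper names `wait_readable` and `wait_writable` are hypothetical, and the argument layout used for the writable case is an assumption made by symmetry with the readable case. A real asynchronous server would register the descriptor with its event loop and suspend the application instead of blocking in `select` like this.

```python
import select
import socket


def wait_readable(fd, timeout=None):
    """Wait until fd is readable or has an exceptional condition.

    fd may be an integer descriptor or any object with a fileno()
    method, which is exactly what select.select() accepts.
    Returns True if fd became ready, False if the timeout elapsed.
    """
    rlist, _, xlist = select.select([fd], [], [fd], timeout)
    return bool(rlist or xlist)


def wait_writable(fd, timeout=None):
    """Same as wait_readable, but for write readiness (assumed layout)."""
    _, wlist, xlist = select.select([], [fd], [fd], timeout)
    return bool(wlist or xlist)
```

On Windows, `select.select` only works with sockets, which matches the platform restriction described in the message; elsewhere a pipe or other descriptor could be passed as well.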
Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers
On May 12, 2008, at 5:07 PM, James Y Knight wrote:

> Surely you need DNS lookup to make a socket connection? Do you mean to provide that in an external library via a threadpool?

No, I don't mean to, because I don't care enough to bother. But if you or someone else did, you'd be free to.

> You do need a framework. Using socket functions correctly (and portably) in non-blocking mode is not trivial.

I need a library, not a framework. And I may not even need to write it myself. (For example, for making HTTP requests, I can use pycurl.)

> 1) Using Apache is certainly a valid option performance-wise. Apache is pretty fast (obviously not the fastest server ever, but pretty good...). So if it has the features/packaging you need, by all means, use it. The advantage IMO of python servers is that they're lighter-weight deployment-wise and more easily configurable by code.

Fair enough. But I'm specifically interested in doing non-blocking I/O on an asynchronous server.

> 2) If your app uses a database, you probably might as well just run it in a thread, because you're most likely going to use a blocking database API anyhow.

Yes, the compatibility of database and other APIs with an asynchronous execution model is important. Some (like MySQL) don't support non-blocking connections, so you'd have to work around that with threads or some other technique. Others (like PostgreSQL) do provide an async API, which could be used with my proposed extensions. (Manlio Perillo has an example of how this works with his nginx mod_wsgi module at http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py.)

This is another issue you have to worry about to keep your app non-blocking, but I don't think it's an insurmountable one. And again, any library you develop to support these operations, written in terms of the proposed non-blocking I/O extensions, will be usable on any server that supports the extensions.

> 3) If your app does not make use of outgoing sockets, then
>
> 3a) If it also doesn't use wsgi.input, you could inform the WSGI server that it can just run the app not in a thread as it won't be blocking.
>
> 3b) If it does use wsgi.input, but doesn't need to read it incrementally, you could inform the server that it should pre-read the input and then run the app directly, not in a thread, as it won't be blocking.
>
> If none of the above apply, that is: you do not use a database, you do use incremental reading of wsgi.input, or an outgoing socket connection, /then/ an async WSGI extension might be useful. I claim that will cover a small subset of WSGI apps.

As I mentioned above, the database issue is a real one, but it can be dealt with. I would like to be able to allow incremental reading of wsgi.input, but I don't see how to do this without breaking middleware. (If you have suggestions, please let me know.) As for outgoing socket connections, I'm willing to accept the cost of a DNS lookup; if someone else isn't, then they're free to write some kind of local lookup server that their app talks to over a socket, and other applications running on other servers can enjoy the fruits of their labor.

I regret calling my proposal "Extensions for Asynchronous Servers", since clearly that encompasses a much broader range of functionality for you than it does for me. All I'm interested in is the ability to poll file descriptors (and the things that allows me to do), and in the next revision of my proposal I'll strive to make that clear. If you have an application that requires functionality beyond that, then my proposal won't be sufficient for your needs.

Chris
[Web-SIG] Proposed WSGI extensions for asynchronous servers
This is a revised version of my AWSGI proposal from last week. While many of the details remain the same, the big change is that I'm now proposing a set of extensions to standard WSGI, rather than a separate specification for asynchronous servers.

The updated proposal is included below. I've also posted it at

  http://wsgi.org/wsgi/Specifications/async

The bzr repository for my reference implementation (which is only partially updated to match the new spec) is now at

  http://pseudogreen.org/bzr/wsgiorg_async_ref/

I'd appreciate your comments.

Thanks,
Chris


Abstract
--------

This specification defines a set of extensions that allow WSGI applications to run effectively on asynchronous (aka event driven) servers.

Rationale
---------

The architecture of an asynchronous server requires all I/O operations, including both interprocess and network communication, to be non-blocking. For a WSGI-compliant server, this requirement extends to all applications run on the server. However, the WSGI specification does not provide sufficient facilities for an application to ensure that its I/O is non-blocking. Specifically, there are two issues:

* The methods provided by the input stream (``environ['wsgi.input']``) follow the semantics of the corresponding methods of the ``file`` class. In particular, each of these methods can invoke the underlying I/O function (in this case, ``recv`` on the socket connected to the client) more than once, without giving the application the opportunity to check whether each invocation will block.

* WSGI does not provide the application with a mechanism to test arbitrary file descriptors (such as those belonging to sockets or pipes opened by the application) for I/O readiness.

This specification defines a standard interface by which asynchronous servers can provide the required facilities to applications.

Specification
-------------

Servers that want to allow applications to perform non-blocking I/O must add four new variables to the WSGI environment: ``x-wsgiorg.async.input``, ``x-wsgiorg.async.readable``, ``x-wsgiorg.async.writable``, and ``x-wsgiorg.async.timeout``. The following sections describe these extensions.

Non-blocking Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``x-wsgiorg.async.input`` variable provides a non-blocking replacement for ``wsgi.input``. It is an object with one method, ``read(size)``, that behaves like the ``recv`` method of ``socket.socket``. This means that a call to ``read`` will invoke the underlying socket ``recv`` **no more than once** and return **at most** ``size`` bytes of data (possibly less). In addition, ``read`` may return an empty string (zero bytes) **only** if the client closes the connection or the application attempts to read more data than is specified by the ``CONTENT_LENGTH`` variable.

Before each call to ``read``, the application **must** test the input stream for readiness with ``x-wsgiorg.async.readable`` (see below). The result of calling ``read`` on a non-ready input stream is undefined.

As with ``wsgi.input``, the server is free to implement ``x-wsgiorg.async.input`` using any technique it chooses (performing reads on demand, pre-reading the request body, etc.). The only requirements are for ``read`` to obey the expected semantics and the input object to be accepted as the first argument to ``x-wsgiorg.async.readable``.

Testing File Descriptors for I/O Readiness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The variables ``x-wsgiorg.async.readable`` and ``x-wsgiorg.async.writable`` are callable objects that accept two positional arguments, one required and one optional. In the following description, these arguments are given the names ``fd`` and ``timeout``, but they are not required to have these names, and the application **must** invoke the callables using positional arguments.
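As an illustration of the ``read(size)`` semantics given in the Non-blocking Input Stream section above, a server-side input object could be sketched as follows. The class name ``AsyncInput`` and its constructor arguments are hypothetical, and the ``fileno`` method is optional under the proposal (the input object only has to be accepted by ``x-wsgiorg.async.readable``). In Python 3 the "empty string" is ``b''``. The sketch assumes the caller has already confirmed readiness, as the proposal requires before each ``read``.

```python
import socket


class AsyncInput:
    """Hypothetical server-side object for environ['x-wsgiorg.async.input']."""

    def __init__(self, sock, content_length):
        self._sock = sock
        self._remaining = content_length  # bytes of request body still unread

    def fileno(self):
        # Optional convenience so the object can be polled like a socket.
        return self._sock.fileno()

    def read(self, size):
        # Never read past CONTENT_LENGTH; beyond it, return the empty string.
        size = min(size, self._remaining)
        if size <= 0:
            return b''
        # Exactly one recv() per read(): may return fewer than `size` bytes,
        # and returns b'' only if the client has closed the connection.
        data = self._sock.recv(size)
        self._remaining -= len(data)
        return data
```

Because each ``read`` maps to a single ``recv``, an application that needs the whole body has to loop, testing readiness before every call.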
The first argument, ``fd``, is either an integer representing a file descriptor or an object with a ``fileno`` method that returns such an integer. (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even if it lacks a ``fileno`` method.) The second, optional argument, ``timeout``, is either ``None`` or a floating-point value in seconds. If omitted, it defaults to ``None``.

When called, ``readable`` and ``writable`` return the empty string (``''``), which **must** be yielded by the application iterable to the server (passing through any middleware). The server then suspends execution of the application until one of the following conditions is met:

* The specified file descriptor is ready for reading or writing.

* ``timeout`` seconds have elapsed without the file descriptor becoming ready for I/O.

* The server detects an error or exceptional condition (such as out-of-band data) on the file descriptor.

Put another way, if the application calls ``readable`` and yields the empty string, it will be suspended until ``select.select([fd],[],[fd],timeout)`` would return. If the application calls
Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers
On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:

> At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote:
>
>> Non-blocking Input Stream
>> ~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> The ``x-wsgiorg.async.input`` variable provides a non-blocking replacement for ``wsgi.input``. It is an object with one method, ``read(size)``, that behaves like the ``recv`` method of ``socket.socket``. This means that a call to ``read`` will invoke the underlying socket ``recv`` **no more than once** and return **at most** ``size`` bytes of data (possibly less). In addition, ``read`` may return an empty string (zero bytes) **only** if the client closes the connection or the application attempts to read more data than is specified by the ``CONTENT_LENGTH`` variable.
>>
>> Before each call to ``read``, the application **must** test the input stream for readiness with ``x-wsgiorg.async.readable`` (see below). The result of calling ``read`` on a non-ready input stream is undefined.
>
> For this to work, you're going to need this to take the wsgi.input object as a parameter. If you don't, then this will bypass middleware that replaces wsgi.input. That is, you will need a way for this spec to support middleware that's replacing wsgi.input, without the middleware knowing that this specification exists. In the worst case, it should detect the replaced input and give an error or some response that lets the application know it won't really be able to use the async feature.

I hadn't considered middleware that replaces wsgi.input. Is there an example component you can point me to, just so I have something concrete to look at?

Given that the semantics of wsgi.input are, in general, incompatible with non-blocking execution, I'm inclined to think that such middleware would either need to be rewritten to use x-wsgiorg.async.input, or just couldn't be used with asynchronous servers. But I'll think about it some more -- maybe there's a way to make this work.

>> If ``timeout`` seconds elapse without the file descriptor becoming ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true when the application resumes. Otherwise, it will be false. The value of ``x-wsgiorg.async.timeout`` when the application is first started or after it yields each response-body string is undefined.
>
> Er, I think you are confused here. There is no way for the server to know what environ dictionary the application is using, unless you explicitly pass it into your extension API.

My thinking is that the server *creates* the environ dictionary, so it can just keep a reference to it and update it as needed. Is middleware allowed to replace environ with another dict instance before passing it to the application? I wasn't aware that this was allowed, but if it is, then I see the problem.

The solution would probably be for the application to pass a mutable object (e.g. an empty list) to readable/writable that the server could set a timeout flag on.

Thanks,
Chris
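The mutable-flag idea floated at the end of the message above could be sketched roughly as below. Everything here is hypothetical and goes beyond the posted proposal: the third `flag` argument to `readable`, the `demo.socket` environ key, and the toy `run` driver, which simply blocks in `select.select` where a real server would use its event loop. The point is only that the server writes the timeout result into an object the application owns, so it no longer matters whether middleware has swapped the environ dict.

```python
import select
import socket


def make_readable(pending):
    """Build a hypothetical readable(fd, timeout, flag) callable."""
    def readable(fd, timeout=None, flag=None):
        pending.append((fd, timeout, flag))  # remember what to wait for
        return b''                           # the app must yield this value
    return readable


def run(app, environ):
    """Toy driver: services one wait request per empty chunk yielded."""
    pending = []
    environ['x-wsgiorg.async.readable'] = make_readable(pending)
    body = []
    for chunk in app(environ, lambda status, headers: None):
        if chunk == b'' and pending:
            fd, timeout, flag = pending.pop()
            rlist, _, xlist = select.select([fd], [], [fd], timeout)
            if flag is not None:
                # Store the result in the app-owned list: True means timed out.
                flag[:] = [not (rlist or xlist)]
        else:
            body.append(chunk)
    return b''.join(body)


def app(environ, start_response):
    """Example application that waits briefly for data on a socket."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    readable = environ['x-wsgiorg.async.readable']
    sock = environ['demo.socket']  # hypothetical key, for this sketch only
    timed_out = []
    yield readable(sock, 0.1, timed_out)
    if timed_out[0]:
        yield b'timed out'
    else:
        yield sock.recv(1024)
```

An attribute on a server-provided object would work the same way; the list is used here only because the message suggests it.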
Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers
> My thinking is that the server *creates* the environ dictionary, so it can just keep a reference to it and update it as needed. Is middleware allowed to replace environ with another dict instance before passing it to the application? I wasn't aware that this was allowed, but if it is, then I see the problem.
>
> The solution would probably be for the application to pass a mutable object (e.g. an empty list) to readable/writable that the server could set a timeout flag on.

How about an environ['x-wsgiorg.async'].timeout? I do something like that in cogen.

--
http://ionelmc.wordpress.com