Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-12 Thread Christopher Stawarz

On May 12, 2008, at 12:45 AM, Ionel Maries Cristian wrote:

On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz wrote:

On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:

For this to work, you're going to need this to take the wsgi.input  
object as a parameter.  If you don't, then this will bypass  
middleware that replaces wsgi.input.


That is, you will need a way for this spec to support middleware  
that's replacing wsgi.input, without the middleware knowing that  
this specification exists.  In the worst case, it should detect the  
replaced input and give an error or some response that lets the  
application know it won't really be able to use the async feature.
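
As a concrete illustration of the concern (not taken from the thread), here is a minimal sketch of middleware that replaces wsgi.input with a wrapper. The names LimitedInput and limit_body are hypothetical. An application that reads x-wsgiorg.async.input directly would never pass through the wrapper, which is the bypass described above.

```python
import io

class LimitedInput:
    """Wrap a file-like input and refuse to read past max_bytes."""

    def __init__(self, wrapped, max_bytes):
        self._wrapped = wrapped
        self._left = max_bytes

    def read(self, size=-1):
        # Clamp every read to the number of bytes still allowed.
        if size < 0 or size > self._left:
            size = self._left
        data = self._wrapped.read(size)
        self._left -= len(data)
        return data

def limit_body(app, max_bytes):
    """Middleware that swaps wsgi.input for a size-limited wrapper."""
    def middleware(environ, start_response):
        environ['wsgi.input'] = LimitedInput(environ['wsgi.input'], max_bytes)
        # The middleware knows nothing about x-wsgiorg.async.input,
        # so that key still points at the unwrapped stream.
        return app(environ, start_response)
    return middleware
```

An application behind limit_body sees the limit only if it reads wsgi.input; the async key is left untouched.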


I hadn't considered middleware that replaces wsgi.input.  Is there  
an example component you can point me to, just so I have something  
concrete to look at?


Given that the semantics of wsgi.input are, in general, incompatible  
with non-blocking execution, I'm inclined to think that such  
middleware would either need to be rewritten to use
x-wsgiorg.async.input, or just couldn't be used with asynchronous  
servers.  But I'll think about it some more -- maybe there's a way  
to make this work.



Making input filters work could be achieved using greenlets, but  
then again, anyone using greenlets could use them to simulate a  
seemingly blocking API for the input, so this is pretty much  
pointless.


But I agree, detecting this is good and errors should be thrown in  
this case.
In cogen I'm setting wsgi.input to None, so any use of it would end  
in an error, though it's not very elegant.


But if your server sets wsgi.input to None, then you really can't  
claim that it's WSGI-compliant.


It seems like the authors of asynchronous servers have two options for  
how to handle wsgi.input.  The first option is to provide a compliant  
wsgi.input (with file-like, blocking behavior).  This means that  
middleware that uses/replaces wsgi.input will work properly, but the  
whole server can block whenever such use takes place.  Therefore, apps  
and middleware will essentially be required to use
x-wsgiorg.async.input.


The second option is to provide a non-compliant (i.e. non-blocking)  
wsgi.input, which works something like x-wsgiorg.async.input.  But  
then any middleware that uses wsgi.input will be broken, since it  
won't work as expected.


In either case, wsgi.input ends up being unusable.  Ugh.

Of course, there is an easy way out of this:  Drop the idea of
x-wsgiorg.async.input, and push the responsibility for making wsgi.input  
non-blocking on to server authors.  In effect, this would mean that  
asynchronous servers must *always* pre-read the request body and  
provide it to the app as a StringIO (or whatever).
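
A minimal sketch of that pre-read option, for illustration only. It assumes the server's event loop has already collected the whole request body into raw_body without blocking; preread_input and echo_app are hypothetical names, and io.BytesIO stands in for the StringIO mentioned above (on current Python the request body is bytes).

```python
import io

def preread_input(environ, raw_body):
    """Return a copy of environ whose wsgi.input is an in-memory file."""
    environ = dict(environ)
    # Reads from BytesIO never touch the network, so they cannot block.
    environ['wsgi.input'] = io.BytesIO(raw_body)
    environ['CONTENT_LENGTH'] = str(len(raw_body))
    return environ

def echo_app(environ, start_response):
    """Ordinary WSGI app using the standard, file-like wsgi.input."""
    body = environ['wsgi.input'].read()
    start_response('200 OK', [('Content-Type', 'application/octet-stream'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

The cost is the one noted in the surrounding text: the application cannot start consuming data until the whole body has arrived.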


I would like to avoid this requirement, since the ability for servers  
to provide on-demand, non-blocking input to the application seems  
useful.  But if it comes down to a choice between (1) the ability to  
receive data from the client on-demand and (2) having a wsgi.input  
that can actually be used, I think I'd choose (2).



Chris
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-12 Thread Christopher Stawarz

On May 12, 2008, at 12:18 PM, James Y Knight wrote:

There are other issues. How do you do a DNS lookup? How do you get  
process completion notification? Heck, how do you run a process?


These are valid questions that I'm not attempting to address with this  
proposal.  So maybe the title of my spec should be "Extensions for  
Asynchronous I/O", since that's the only issue it deals with.  I see  
these other issues as something for other specifications to address.


Once you have I/O readiness information, what do you do with that? I  
guess you'd need to write a whole new asynchronous server framework  
on top of AWSGI? I can't see being able to use it raw for any real  
applications.


No, you don't need a whole new framework.  You need libraries (for  
making HTTP requests, talking to databases, etc.) that are written to  
use the extensions the spec provides.  These only need to be written  
once and can then be used with *any* server that supports the  
extensions.


So the existence of a spec like this lets us move from a world where  
every server/framework (be it Twisted, nginx, cogen, whatever) needs  
to reimplement these utilities in terms of its own async I/O  
framework, to one where a single implementation can be written against  
the spec and then used by any server that implements it.  In turn,  
this should make application developers more comfortable with  
targeting their apps at async servers, since they won't be tied to any  
particular server/framework's API.


And, yes, the fact that what I just wrote sounds like "write once, run  
anywhere" sets off alarm bells in my head, too :)  But I think the  
interface I propose is so basic that any async server should be able  
to provide it with very little trouble.


What if the event-loop of the server doesn't use integer fds, but  
windows file handles or a java channel object? Where are you allowed  
to get these integers from? Is it always a socket from  
socket.socket().fileno()? Or can it be a file from open().fileno()  
or os.open()? A pipe from os.pipe()? Note that these distinctions  
are important everywhere but UNIX.


Although I didn't state it in the spec, my thinking was that
readable/writable should accept whatever would be accepted by select() on the  
platform you're running on.  On Windows, they would be limited to  
sockets; elsewhere, any file descriptor would do.


In that light, maybe the title should really be "Extensions for  
Polling File Descriptors for I/O Readiness".  But even limited to that  
scope, I still think it'd be extremely useful.



* To prevent an application that does blocking I/O from blocking the
entire server, an asynchronous server could run each instance of the
application in a separate thread.  However, since asynchronous
servers achieve high levels of concurrency by expressly *avoiding*
multithreading, this technique will almost always be unacceptable.


Well, my claim would be that it's usually acceptable. Certainly  
sometimes it's not, which is where the use of an asynchronous server  
framework comes in handy.


I don't get how it's acceptable.  If you spawn a separate thread for  
each request, then your server is no longer asynchronous.  At that  
point, why not just save yourself some trouble and use Apache?


PS, a minor bug: I notice the spec says wsgiorg.async.input is  
supposed to have only a read function, but you actually call recv()  
on it in the examples.


Thanks.  The examples in the spec text are correct, but I haven't  
updated the examples in my reference code yet.



Chris


Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-12 Thread Christopher Stawarz

On May 12, 2008, at 5:07 PM, James Y Knight wrote:

Surely you need DNS lookup to make a socket connection? Do you mean  
to provide that in an external library via a threadpool?


No, I don't mean to, because I don't care enough to bother.  But if  
you or someone else did, you'd be free to.


You do need a framework. Using socket functions correctly (and  
portably) in non-blocking mode is not trivial.


I need a library, not a framework.  And I may not even need to write  
it myself.  (For example, for making HTTP requests, I can use pycurl.)


1) Using apache is certainly a valid option performance-wise. Apache  
is pretty fast (obviously not the fastest server ever, but pretty  
good...). So if it has the features/packaging you need, by all  
means, use it. The advantage IMO of python servers is that they're  
lighter-weight deployment-wise and more easily configurable by code.


Fair enough.  But I'm specifically interested in doing non-blocking
I/O on an asynchronous server.


2) If your app uses a database, you probably might as well just run  
it in a thread, because you're most likely going to use a blocking  
database API anyhow.


Yes, the compatibility of database and other APIs with an  
asynchronous execution model is important.  Some (like MySQL) don't  
support non-blocking connections, so you'd have to work around that  
with threads or some other technique.  Others (like PostgreSQL) do  
provide an async API, which could be used with my proposed  
extensions.  (Manlio Perillo has an example of how this works with his  
nginx mod_wsgi module at http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py.)


This is another issue you have to worry about to keep your app
non-blocking, but I don't think it's an insurmountable one.  And again,  
any library you develop to support these operations, written in terms  
of the proposed non-blocking I/O extensions, will be usable on any  
server that supports the extensions.



3) If your app does not make use of outgoing sockets, then
3a) If it also doesn't use wsgi.input, you could inform the WSGI  
server that it can just run the app not in a thread as it won't be  
blocking.
3b) If it does use wsgi.input, but doesn't need to read it  
incrementally, you could inform the server that it should pre-read  
the input and then run the app directly, not in a thread, as it  
won't be blocking.


If none of the above apply, that is: you do not use a database, you  
do use incremental reading of wsgi.input, or an outgoing socket  
connection, /then/ an async WSGI extension might be useful. I claim  
that will cover a small subset of WSGI apps.


As I mentioned above, the database issue is a real one, but it can be  
dealt with.  I would like to be able to allow incremental reading of  
wsgi.input, but I don't see how to do this without breaking  
middleware.  (If you have suggestions, please let me know.)  As for  
outgoing socket connections, I'm willing to accept the cost of a DNS  
lookup; if someone else isn't, then they're free to write some kind of  
local lookup server that their app talks to over a socket, and other  
applications running on other servers can enjoy the fruits of their  
labor.


I regret calling my proposal "Extensions for Asynchronous Servers",  
since clearly that encompasses a much broader range of functionality  
for you than it does for me.  All I'm interested in is the ability to  
poll file descriptors (and the things that allows me to do), and in  
the next revision of my proposal I'll strive to make that clear.  If  
you have an application that requires functionality beyond that, then  
my proposal won't be sufficient for your needs.



Chris


[Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-11 Thread Christopher Stawarz
This is a revised version of my AWSGI proposal from last week.  While  
many of the details remain the same, the big change is that I'm now  
proposing a set of extensions to standard WSGI, rather than a separate  
specification for asynchronous servers.


The updated proposal is included below.  I've also posted it at

  http://wsgi.org/wsgi/Specifications/async

The bzr repository for my reference implementation (which is only  
partially updated to match the new spec) is now at


  http://pseudogreen.org/bzr/wsgiorg_async_ref/

I'd appreciate your comments.


Thanks,
Chris



Abstract
--------


This specification defines a set of extensions that allow WSGI
applications to run effectively on asynchronous (aka event driven)
servers.

Rationale
---------

The architecture of an asynchronous server requires all I/O
operations, including both interprocess and network communication, to
be non-blocking.  For a WSGI-compliant server, this requirement
extends to all applications run on the server.  However, the WSGI
specification does not provide sufficient facilities for an
application to ensure that its I/O is non-blocking.  Specifically,
there are two issues:

* The methods provided by the input stream (``environ['wsgi.input']``)
  follow the semantics of the corresponding methods of the ``file``
  class.  In particular, each of these methods can invoke the
  underlying I/O function (in this case, ``recv`` on the socket
  connected to the client) more than once, without giving the
  application the opportunity to check whether each invocation will
  block.

* WSGI does not provide the application with a mechanism to test
  arbitrary file descriptors (such as those belonging to sockets or
  pipes opened by the application) for I/O readiness.

This specification defines a standard interface by which asynchronous
servers can provide the required facilities to applications.

Specification
-------------

Servers that want to allow applications to perform non-blocking I/O
must add four new variables to the WSGI environment:
``x-wsgiorg.async.input``, ``x-wsgiorg.async.readable``,
``x-wsgiorg.async.writable``, and ``x-wsgiorg.async.timeout``.  The
following sections describe these extensions.

Non-blocking Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``x-wsgiorg.async.input`` variable provides a non-blocking
replacement for ``wsgi.input``.  It is an object with one method,
``read(size)``, that behaves like the ``recv`` method of
``socket.socket``.  This means that a call to ``read`` will invoke the
underlying socket ``recv`` **no more than once** and return **at
most** ``size`` bytes of data (possibly less).  In addition, ``read``
may return an empty string (zero bytes) **only** if the client closes
the connection or the application attempts to read more data than is
specified by the ``CONTENT_LENGTH`` variable.

Before each call to ``read``, the application **must** test the input
stream for readiness with ``x-wsgiorg.async.readable`` (see below).
The result of calling ``read`` on a non-ready input stream is
undefined.

As with ``wsgi.input``, the server is free to implement
``x-wsgiorg.async.input`` using any technique it chooses (performing
reads on demand, pre-reading the request body, etc.).  The only
requirements are for ``read`` to obey the expected semantics and the
input object to be accepted as the first argument to
``x-wsgiorg.async.readable``.
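
The ``read`` semantics above can be sketched as follows. This is an illustrative assumption, not part of the proposal or its reference implementation: the class name ``AsyncInput`` is hypothetical, it wraps a connected socket directly, and it returns ``b''`` where the text says "empty string" because request bodies are bytes on current Python.

```python
import socket

class AsyncInput:
    """Hypothetical x-wsgiorg.async.input built on a connected socket."""

    def __init__(self, sock, content_length):
        self._sock = sock
        self._remaining = content_length

    def fileno(self):
        # Lets a server's readable() treat this object like any other fd.
        return self._sock.fileno()

    def read(self, size):
        # Never read past CONTENT_LENGTH; return b'' once the body is consumed.
        size = min(size, self._remaining)
        if size <= 0:
            return b''
        data = self._sock.recv(size)   # at most one recv per read
        self._remaining -= len(data)
        return data
```

An application would still be expected to wait on ``x-wsgiorg.async.readable`` before each ``read``; this sketch leaves that step out.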

Testing File Descriptors for I/O Readiness
~~

The variables ``x-wsgiorg.async.readable`` and
``x-wsgiorg.async.writable`` are callable objects that accept two
positional arguments, one required and one optional.  In the following
description, these arguments are given the names ``fd`` and
``timeout``, but they are not required to have these names, and the
application **must** invoke the callables using positional arguments.

The first argument, ``fd``, is either an integer representing a file
descriptor or an object with a ``fileno`` method that returns such an
integer.  (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
if it lacks a ``fileno`` method.)  The second, optional argument,
``timeout``, is either ``None`` or a floating-point value in seconds.
If omitted, it defaults to ``None``.

When called, ``readable`` and ``writable`` return the empty string
(``''``), which **must** be yielded by the application iterable to the
server (passing through any middleware).  The server then suspends
execution of the application until one of the following conditions is
met:

* The specified file descriptor is ready for reading or writing.

* ``timeout`` seconds have elapsed without the file descriptor
  becoming ready for I/O.

* The server detects an error or exceptional condition (such as
  out-of-band data) on the file descriptor.
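
The yield-and-suspend protocol and the three resume conditions above can be sketched with a toy driver. Everything here is an assumption made for illustration: ``make_readable``, ``drive`` and the ``example.socket`` key are hypothetical names, ``b''`` stands in for the empty string, and the driver blocks in ``select`` for a single application, whereas a real asynchronous server would register the descriptor with its event loop and run other applications in the meantime.

```python
import select
import socket

def make_readable(pending):
    """Return a hypothetical ``readable`` callable that records each wait."""
    def readable(fd, timeout=None):
        pending.append((fd, timeout))
        return b''   # the application must yield this value
    return readable

def drive(app_iter, pending, environ):
    """Toy driver: serves one application and blocks in select()."""
    body = []
    for chunk in app_iter:
        if chunk == b'' and pending:
            fd, timeout = pending.pop()
            # Resume on readiness, on an exceptional condition, or on timeout.
            ready, _, errors = select.select([fd], [], [fd], timeout)
            environ['x-wsgiorg.async.timeout'] = not (ready or errors)
        else:
            body.append(chunk)
    return b''.join(body)

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    sock = environ['example.socket']   # hypothetical key for this sketch
    yield environ['x-wsgiorg.async.readable'](sock, 0.05)
    if environ['x-wsgiorg.async.timeout']:
        yield b'timed out'
    else:
        yield sock.recv(1024)
```

The driver writes the timeout result into the same ``environ`` dict the application reads, which only works if no middleware has substituted a different dict in between.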

Put another way, if the application calls ``readable`` and yields the
empty string, it will be suspended until
``select.select([fd],[],[fd],timeout)`` would return.  If the
application calls 

Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-11 Thread Christopher Stawarz

On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:


At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote:

Non-blocking Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``x-wsgiorg.async.input`` variable provides a non-blocking
replacement for ``wsgi.input``.  It is an object with one method,
``read(size)``, that behaves like the ``recv`` method of
``socket.socket``.  This means that a call to ``read`` will invoke the
underlying socket ``recv`` **no more than once** and return **at
most** ``size`` bytes of data (possibly less).  In addition, ``read``
may return an empty string (zero bytes) **only** if the client closes
the connection or the application attempts to read more data than is
specified by the ``CONTENT_LENGTH`` variable.

Before each call to ``read``, the application **must** test the input
stream for readiness with ``x-wsgiorg.async.readable`` (see below).
The result of calling ``read`` on a non-ready input stream is
undefined.


For this to work, you're going to need this to take the wsgi.input  
object as a parameter.  If you don't, then this will bypass  
middleware that replaces wsgi.input.


That is, you will need a way for this spec to support middleware  
that's replacing wsgi.input, without the middleware knowing that  
this specification exists.  In the worst case, it should detect the  
replaced input and give an error or some response that lets the  
application know it won't really be able to use the async feature.


I hadn't considered middleware that replaces wsgi.input.  Is there an  
example component you can point me to, just so I have something  
concrete to look at?


Given that the semantics of wsgi.input are, in general, incompatible  
with non-blocking execution, I'm inclined to think that such  
middleware would either need to be rewritten to use
x-wsgiorg.async.input, or just couldn't be used with asynchronous  
servers.  But I'll think about it some more -- maybe there's a way to  
make this work.



If ``timeout`` seconds elapse without the file descriptor becoming
ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
when the application resumes.  Otherwise, it will be false.  The value
of ``x-wsgiorg.async.timeout`` when the application is first started
or after it yields each response-body string is undefined.


Er, I think you are confused here.  There is no way for the server  
to know what environ dictionary the application is using, unless you  
explicitly pass it into your extension API.


My thinking is that the server *creates* the environ dictionary, so it  
can just keep a reference to it and update it as needed.  Is  
middleware allowed to replace environ with another dict instance  
before passing it to the application?  I wasn't aware that this was  
allowed, but if it is, then I see the problem.


The solution would probably be for the application to pass a mutable  
object (e.g. an empty list) to readable/writable that the server could  
set a timeout flag on.
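
A rough sketch of that idea, under stated assumptions: the third ``flag`` argument is hypothetical and not part of the posted spec, and for brevity the wait happens inside the call with a blocking ``select`` rather than after the application yields. The point it illustrates is only that the result travels through an object the application owns, so it does not matter which environ dict the server holds.

```python
import select
import socket

def readable(fd, timeout=None, flag=None):
    """Hypothetical variant of readable that reports timeouts via ``flag``."""
    ready, _, errors = select.select([fd], [], [fd], timeout)
    if flag is not None and not (ready or errors):
        flag.append(True)   # a non-empty list means "timed out"
    return b''

def wait_for_data(sock, timeout):
    """Application-side helper: True if data arrived before the timeout."""
    timed_out = []          # fresh mutable object for this one wait
    readable(sock, timeout, timed_out)
    return not timed_out
```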



Thanks,
Chris


Re: [Web-SIG] Proposed WSGI extensions for asynchronous servers

2008-05-11 Thread Ionel Maries Cristian
 My thinking is that the server *creates* the environ dictionary, so it can
 just keep a reference to it and update it as needed.  Is middleware allowed
 to replace environ with another dict instance before passing it to the
 application?  I wasn't aware that this was allowed, but if it is, then I see
 the problem.

 The solution would probably be for the application to pass a mutable
 object (e.g. an empty list) to readable/writable that the server could set a
 timeout flag on.


How about an environ['x-wsgiorg.async'].timeout? I do something like that in
cogen.
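
One way to read that suggestion, sketched under assumptions: AsyncState is a hypothetical name, and this is not a description of how cogen actually implements it. Because the environ entry is an object rather than a plain value, a shallow copy of environ made by middleware still refers to the same object, so the server's update stays visible to the application.

```python
class AsyncState:
    """Hypothetical holder for per-request async status."""

    def __init__(self):
        self.timeout = False

# The server creates the environ and keeps a reference to the state object.
server_environ = {'x-wsgiorg.async': AsyncState()}

# Middleware may hand the application a different dict (a shallow copy).
app_environ = dict(server_environ)

# The server records a timeout; the application sees it through its own dict.
server_environ['x-wsgiorg.async'].timeout = True
```

This does not help if middleware builds a new environ without carrying the key over, or deep-copies it.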

-- 
http://ionelmc.wordpress.com