2009/2/12 Matt Craighead <[email protected]>:

> Suppose the client making an HTTP request to my WSGI app closes its socket,
> say, because the user hit Escape in their web browser. What happens to my
> Python interpreter executing the WSGI code in question? It keeps running,
> right?
First off, you don't mean 'Python interpreter', you mean 'request thread'. Python interpreter instances, once created within a process, survive for the life of the process. Separate interpreter instances are NOT created for each request. You may well understand this, but it does seem to be a misconception that some have, so I am clarifying the point.

As to whether the request thread keeps running, it depends on what it is doing and how you yield data from the WSGI application.

In the simplest case, where a WSGI application forms the complete response as a single string, or a list of strings, and returns it, the WSGI application will complete the request regardless. This is because the data is only written at the end of the request, so there is potentially no earlier point at which it could be detected that the client had closed the connection. If that request thread was performing an operation that took some time, whether computational or blocking on the result from some external process, then whatever it is doing is not interrupted.

In the more complicated case, where the WSGI application has returned a generator which yields data in blocks, there is an attempt to write data back to the client as the request progresses. Provided that blocks of data are only generated when asked for, if writing a prior block resulted in it being detected that the client connection had closed, then mod_wsgi will skip asking for more blocks of data and move straight on to closing the generator and finalising the request. Thus, using a generator and only generating data as it can be sent does provide an option for interrupting a long running process. This still doesn't help in situations where a specific request for a block of data results in the application making some blocking call.

Another case is where the write() function returned by start_response() is used.
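As an illustration of the generator case, a minimal sketch (a hypothetical application, not code from mod_wsgi itself) might look like this. Each block is produced only when the server asks for it, and the finally clause runs when the generator is closed, including early closure after a client disconnect is detected on a write:

```python
def application(environ, start_response):
    # Per the WSGI specification, response data is byte strings.
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def generate():
        try:
            for i in range(1000):
                # Each block is produced lazily, only when the server is
                # ready to write it back to the client.
                yield ('block %d\n' % i).encode('latin-1')
        finally:
            # Runs when the generator is closed, including when it is
            # closed early because the client connection was lost: a
            # place to release locks or rollback transactions.
            pass

    return generate()
```

If the client goes away part way through, mod_wsgi stops iterating and closes the generator, which triggers the finally clause, rather than generating the remaining blocks.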
When write() is used, writing data back to the client is driven by the WSGI application, rather than by a loop within mod_wsgi requesting data from a generator. Thus, when closure of the connection is detected, a Python exception will be seen by the application, and it is up to the application what to do about it. Most applications seem not to handle it, and a 500 error related to an unhandled exception results.

The only other point at which it can be detected that a client has closed the connection is when the application is reading wsgi.input. This again generates a Python exception, which the application would need to deal with as appropriate.

> This seems pretty unfortunate. Suppose that the implementation of my HTTP
> request needs to go out on the network to talk to some other server.
> (Which, in my case, some of them do.) connect(), send(), recv() can all
> take potentially unlimited amounts of time to complete. They may not
> consume any CPU time while they're blocking, but a thread is just sitting
> there doing nothing; and what, if anything, will cause that thread to die
> short of killing the Apache process or the WSGI daemon process (if any)?
> Leak enough threads and you could run out of memory, deadlock, or whatnot.

If the backend process you are communicating with never returns, then that is a separate issue to the client closing the connection. For detecting a backend process that never returns, you should be implementing non blocking operations in conjunction with a timeout, to ensure that any processing is completed in the time you expect it to be.

Whether loss of the client connection should cause the connection to the backend process to be closed really depends on what the application does. It may be the case that you still need the backend task to be completed regardless; closing the connection to the backend process, depending on how the service is implemented, may be bad, as it may cause the backend task to be interrupted and not complete.
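Wrapping a blocking backend call with a timeout might be sketched as follows. This is only an illustration: the fetch_from_backend() helper, host/port and the timeout value are hypothetical. A socket timeout turns an indefinitely blocking connect() or recv() into an exception the application can handle:

```python
import socket

def fetch_from_backend(host, port, request, timeout=5.0):
    """Send 'request' to a backend and return its reply, or None on timeout."""
    try:
        # create_connection() applies the timeout to connect() as well as
        # to subsequent send/recv calls on the returned socket.
        conn = socket.create_connection((host, port), timeout=timeout)
        try:
            conn.sendall(request)
            return conn.recv(65536)
        finally:
            conn.close()
    except socket.timeout:
        # The backend stalled: fail this request cleanly rather than
        # tying up the request thread indefinitely.
        return None
```

The point being that the request thread is guaranteed to get control back after the timeout, rather than sitting blocked forever on the backend.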
But then, if you really need that, you should be using a persistent message/task queuing system to ensure requests aren't lost. Overall though, what should be done can depend on the individual connections to backend systems and thus should be handled at the application level.

When using daemon mode of mod_wsgi, the only option you really have is to set inactivity-timeout as a fail safe for all threads in the process getting into a locked up state because of code which blocks and never returns. What would happen is that even though all threads in a process may be handling a request, if none of them actually read any request input or generate any request output in the specified time, then the daemon process would be forcibly shut down.

> The behavior I'd think I'd want would be that a closed client socket would
> result in a Python KeyboardInterrupt being raised asynchronously inside my
> WSGI Python interpreter, exactly like Ctrl-C in a normal Python app. Then
> my code would nicely release any DB locks/rollback any pending DB
> transactions as the stack unrolled, and blocking IOs (socket or otherwise)
> could be interrupted via a signal (Unix)/IO cancellation (Windows)/some
> other mechanism (???).

My understanding is that this wouldn't necessarily be a safe thing to do, as it would involve injecting an exception into a distinct thread. I remember seeing some warnings about this at one point, but things could have changed. Either way, I have looked at it before and wasn't convinced it was a good idea.

> mod_wsgi daemon mode seems like a partial solution at best:

In what respect are you saying that?

> - daemon mode is not supported on Windows, right?

And never will be. First because fork() is not supported on Windows, and second because I don't really regard Windows as a good deployment platform for Apache.

> - killing the daemon process (potentially?) kills other requests, not just
> the hung request

Yes, although in the case of inactivity-timeout, all threads would effectively need to have stalled before it kicked in and killed the process.

> And any solution that involves one process per request, well, then we might
> as well be back to using CGI rather than WSGI...

But CGI will not help you with this either. Well, that's not completely true: CGI will allow more and more processes to be created, but keep doing that and it will consume all the resources on your machine. You still need something that is going to kill off stuck processes.

No other web hosting mechanism for WSGI applications that I have seen really provides a solution either. Some others provide timeouts on individual requests and will kill processes, but none that I know of will inject some sort of signal indicating that the client connection has closed. As partly explained above, in Apache at least you can only know that a client connection has closed when you attempt to read data from it or write data to it. Apache is not event driven, so there is no select/poll on the client connection such that you could be notified immediately anyway.

All you can really do for any system is, at your application level, try to implement timeouts on potentially blocking operations to backend processes, and otherwise simply ensure you have allowed enough processes/threads to handle the expected load, with some additional capacity to cope with requests stalling for a while until timeouts kick in.

Graham

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en
-~----------~----~----~----~------~----~------~--~---
