My assumption is that all the C modules shipping with Python would have to support the "with timeout" feature.
At least on Windows Vista and beyond, this wouldn't be so hard: a call to CancelSynchronousIo, plus a per-thread timeout object, and changing *WaitFor* calls to wait on that timeout object (changing WaitForSingleObject to WaitForMultipleObjects as needed). For OSes before Vista this is speculation, but I would guess that CancelSynchronousIo could be implemented on top of ntdll.dll APIs. I'm not enough of a Unix expert to say what would be involved on various Unix systems. Anyway, obviously this is not going to happen in the foreseeable future.

> One has to gauge what are the things that one would
> need to protect against blocking.

Right now I'm having a lot of difficulty coming from the fact that query X is blocked waiting for query Y to release a mutex -- and that mutex, in turn, won't be released until query Y finishes doing a bunch of slow network queries to server Z (which could be out on the public Internet in some cases). The general solution is to never hold a mutex while waiting for a network query that could take a potentially unbounded amount of time -- release the mutex first and reacquire it after the network query finishes -- but this requires a fairly substantial rearchitecture of a lot of code, some of which is written in Python, some in C/C++, and a good chunk of which is a file system driver.

I certainly have little to no control over how fast server Z (third-party, often proprietary and closed-source, often not on the LAN) chooses to respond to my queries. I've seen cases where server Z is bogged down responding to some other random, unrelated query, holding a write lock on its DB, and even the simplest read-only query to server Z hangs for 10 minutes waiting for that huge unrelated DB query to complete. Like I said, it's a big, complex distributed software system.

On Wed, Feb 11, 2009 at 7:57 PM, Graham Dumpleton <[email protected]> wrote:
>
> 2009/2/12 Matt Craighead <[email protected]>:
> > Agree with your terminology corrections.
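To make the "release the mutex first, reacquire after" pattern concrete, here is a minimal sketch. All the names are hypothetical stand-ins: `cache` plays the role of the shared state the mutex protects, and `slow_network_query()` the potentially unbounded call to server Z.

```python
import threading

cache_lock = threading.Lock()
cache = {}

def slow_network_query(key):
    # Placeholder for a network round trip that may block for minutes.
    return key.upper()

def lookup(key):
    # Check the shared state under the lock, but never hold the lock
    # across the network round trip.
    with cache_lock:
        if key in cache:
            return cache[key]
    result = slow_network_query(key)   # lock NOT held here
    with cache_lock:
        # Another thread may have filled the entry while we were waiting,
        # so only install our result if the slot is still empty.
        cache.setdefault(key, result)
        return cache[key]
```

The cost of the pattern is exactly the race handled by `setdefault`: two threads may both issue the slow query for the same key, and one result is discarded.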
> >
> > As for Apache on Windows, I have no love for it -- but I have to offer my users *some* sort of Windows-based server solution. I have my own miniature WSGI web server for users who want it to "just work" without having to install anything separate, but some people want more advanced features like SSL.
> >
> > As for CGI, it's not a full solution, but you can always kill a process manually, whereas there's no way to kill a specific hung worker thread buried inside a process -- you have to kill the whole process. Putting on a sysadmin hat, I might like to be able to kill a single bad request without killing the whole server. Obviously even better would be to never have to kill a request at all, but I'm not sure I see that as 100% realistic in a distributed software system with a lot of diverse components, some of which I didn't write and/or which don't have network protocol specs and which I therefore have to access through closed-source third-party API libraries.
> >
> > Writing to the output in a loop is a valid solution for some classes of applications: a gigantic HTML table comes to mind. I don't see it working very well for my particular application, though I could probably make it work for a subset of my queries.
> >
> > Timeouts? Assuming I can catch every single potentially-blocking API call and make it time out (fairly challenging when we're talking about a WSGI module that accesses a file system driver that in turn may make network requests to service certain file system calls -- I'm planning to rearchitect this slightly, but I don't think this part will go away entirely)... I'd better make those timeouts fairly long to avoid false positives. I think this kind of approach is probably good enough to prevent myself from running out of threads and/or memory... *if* I can put a timeout in every single potentially-blocking code path.
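The closest thing I know of to a blanket timeout in stock Python is a SIGALRM-based context manager -- a rough sketch only, and it illustrates the limitations as much as the idea: Unix-only, usable only in the main thread, and it can't interrupt a call hung inside C code that blocks signals. The names `timeout` and `TimeoutException` are my own.

```python
import signal
from contextlib import contextmanager

class TimeoutException(Exception):
    pass

@contextmanager
def timeout(seconds):
    # SIGALRM fires after `seconds` and the handler raises into whatever
    # the block was doing -- Unix-only, main thread only.
    def handler(signum, frame):
        raise TimeoutException('block exceeded %d seconds' % seconds)
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                        # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Usage would look like `with timeout(30): do_stuff()`, with cleanup happening as the stack unwinds.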
> >
> > Last time I looked into this, I think I saw some web debates over whether it would be possible to have a language construct along the lines of:
> >
> > with timeout(seconds):
> >     do_stuff()
> >
> > ...where if the contents of the "with" code block took more than "seconds" seconds to execute, some sort of TimeoutException would be asynchronously thrown that would allow for a full and orderly cleanup. This would be great for my purposes. I doubt something like this will find its way into the language any time soon, seeing as I've already found bugs in Python where certain simple blocking socket API calls can't be Ctrl-C'd under Windows.
>
> Except that, like trying to inject a Python exception into a running thread, it will not work if the code is actually inside of C code and that is where it is hanging.
>
> > Overall, I'm getting the impression that the answer to my dilemma is: "Yes, this is hard. Deal with it." Would that be a correct assessment?
>
> More or less. One has to gauge what are the things that one would need to protect against blocking. If it is your own internal network or systems, probably safe. If going outside of your network, then protect yourself.
>
> As a failsafe, use inactivity-timeout on the daemon process group. This is in part what it was added for. So even if the process hangs for the period of the timeout, at least it will automatically recover rather than having to wait for a human to kill it.
>
> Graham
>
> > On Wed, Feb 11, 2009 at 5:47 PM, Graham Dumpleton
> > <[email protected]> wrote:
> >>
> >> 2009/2/12 Matt Craighead <[email protected]>:
> >> > Suppose the client making an HTTP request to my WSGI app closes its socket, say, because the user hit Escape in their web browser. What happens to my Python interpreter executing the WSGI code in question? It keeps running, right?
> >>
> >> First off, you don't mean 'Python interpreter', you mean 'request thread'.
> >>
> >> Python interpreter instances, once created within a process, survive for the life of the process. Separate interpreter instances are NOT created for each request.
> >>
> >> You may well understand this, but it does seem to be a misconception that some do have, so am clarifying this point.
> >>
> >> As to whether the 'request thread' keeps running, it depends on what it is doing and how you yield data from the WSGI application.
> >>
> >> In the simplest case, where a WSGI application forms the complete response as a single string, or list of strings, and returns it, the WSGI application will complete the request regardless. This is because the data is only being written at the end of the request, so there is potentially no earlier point at which it could be detected that the client had closed the connection.
> >>
> >> If that request thread was performing an operation that took some time, whether it be computational or whether it needs to block on the result from some external process, then whatever it is doing is not interrupted.
> >>
> >> In the more complicated case, where the WSGI application has returned a generator which yields data in blocks, there is an attempt to write data back to the client as the request progresses. Provided that blocks of data are only generated when asked for, if writing a prior block resulted in it being detected that the client connection had closed, then mod_wsgi will skip asking for more blocks of data and move straight on to closing the generator and finalising the request.
> >>
> >> Thus, use of generators, and only generating data as it can be sent, does provide an option to interrupt a long running process.
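For anyone following along, the generator style described above looks something like this toy WSGI application (the content it yields is made up):

```python
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def generate():
        # Each block is produced lazily; if writing an earlier block
        # reveals the client has gone away, mod_wsgi closes the generator
        # instead of requesting the next block.
        for i in range(1000):
            yield ('row %d\n' % i).encode('ascii')

    return generate()
```

The key property is that no work for block N+1 happens until the server has asked for it, which is what gives the server a chance to abandon the request between blocks.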
> >> This still doesn't help in situations where a specific request for a block of data resulted in the application making some blocking call.
> >>
> >> Another case is where the write() function from start_response() is used. In this case, writing data back to the client is driven by the WSGI application rather than by a loop within mod_wsgi requesting data from a generator. Thus, when closure of the connection is detected, a Python exception will be seen by the application, and it is up to the application what to do. Most applications seem not to handle it, and a 500 error related to the unhandled exception results.
> >>
> >> The only other option for detecting that a client has closed the connection is when the application is reading wsgi.input. This again generates a Python exception which the application would deal with as appropriate.
> >>
> >> > This seems pretty unfortunate. Suppose that the implementation of my HTTP request needs to go out on the network to talk to some other server. (Which, in my case, some of them do.) connect(), send(), recv() can all take potentially unlimited amounts of time to complete. They may not consume any CPU time while they're blocking, but a thread is just sitting there doing nothing; and what, if anything, will cause that thread to die short of killing the Apache process or the WSGI daemon process (if any)? Leak enough threads and you could run out of memory, deadlock, or whatnot.
> >>
> >> If the backend process you are communicating with never returns, that is a separate issue to the client closing the connection. For detecting a backend process as never returning, you should be implementing non-blocking operations in conjunction with a timeout, to ensure that any processing is completed in the time you expect it to be.
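One way to put the "non-blocking operations in conjunction with a timeout" advice into practice at the socket level -- a sketch only, with a made-up function name:

```python
import select

def recv_with_deadline(sock, nbytes, deadline):
    # Put the socket into non-blocking mode and wait at most `deadline`
    # seconds for the backend to become readable, instead of sitting in
    # recv() indefinitely.
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], deadline)
    if not readable:
        raise TimeoutError('backend did not respond within %s seconds'
                           % deadline)
    return sock.recv(nbytes)
```

The same shape (wait with a deadline, then fail loudly) applies to connect() and send() as well; the hard part, as noted above, is doing it in *every* potentially-blocking code path.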
> >>
> >> Whether loss of the client connection should cause the connection to the backend process to be closed really depends on what the application does. It may be the case that you still need the backend task to be completed regardless; closing the connection to the backend process, depending on how the service is implemented, may be bad, as it may cause the backend task to be interrupted and not complete. But then, if you really need that, you should be using a persistent message/task queuing system to ensure requests aren't lost. Overall though, what should be done can depend on the individual connections to backend systems, and thus should be handled at the application level.
> >>
> >> When using daemon mode of mod_wsgi, the only option you really have is to set inactivity-timeout as a fail-safe for all threads in the process getting into a locked-up state because of code which blocks and never returns. What would happen is that even though all threads in a process may be handling a request, if none of them actually read any request input or generate any request output in the specified time, then the daemon process would be forcibly shut down.
> >>
> >> > The behavior I'd think I'd want would be that a closed client socket would result in a Python KeyboardInterrupt being raised asynchronously inside my WSGI Python interpreter, exactly like Ctrl-C in a normal Python app. Then my code would nicely release any DB locks/roll back any pending DB transactions as the stack unrolled, and blocking IOs (socket or otherwise) could be interrupted via a signal (Unix)/IO cancellation (Windows)/some other mechanism (???).
> >>
> >> My understanding is that this wouldn't necessarily be a safe thing to do, as it would involve injecting an exception into a distinct thread.
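For what it's worth, CPython does expose a way to inject an exception into another thread, via ctypes and PyThreadState_SetAsyncExc. But it is best-effort and CPython-specific, and the exception is delivered only between Python bytecodes -- so a thread hung inside a C call (exactly the case under discussion) won't see it until that call returns. A hedged sketch, with a made-up helper name:

```python
import ctypes

def inject_exception(thread, exc_type):
    # CPython-specific: schedules exc_type to be raised in the target
    # thread the next time it executes Python bytecode. It cannot
    # interrupt a call that is hung inside C code.
    ident = ctypes.c_long(thread.ident)
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ident, ctypes.py_object(exc_type))
    if affected > 1:
        # Passing NULL clears the pending exception again.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ident, None)
        raise SystemError('injected into %d thread states' % affected)
    return affected == 1
```

Even when it works, the target thread can be interrupted at an arbitrary bytecode boundary, which is presumably the safety concern alluded to above.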
> >> I remember seeing some warnings about this at one point, but things could have changed. Either way, I have looked at it before and wasn't convinced it was a good idea.
> >>
> >> > mod_wsgi daemon mode seems like a partial solution at best:
> >>
> >> In what respect are you saying that?
> >>
> >> > - daemon mode is not supported on Windows, right?
> >>
> >> And never will be. First, because fork() is not supported on Windows, and second, because I don't really regard Windows as a good deployment platform for Apache.
> >>
> >> > - killing the daemon process (potentially?) kills other requests, not just the hung request
> >>
> >> Yes, although in the case of inactivity-timeout, all threads would effectively need to have stalled before it kicked in and killed the process.
> >>
> >> > And any solution that involves one process per request, well, then we might as well be back to using CGI rather than WSGI...
> >>
> >> But CGI will not help you with this either. Well, that's not completely true: CGI will allow more and more processes to be created, but keep doing that and it will consume all resources on your machine. You still need something that is going to kill off stuck processes.
> >>
> >> No other web hosting mechanism for WSGI applications I have seen really provides a solution either. Some others provide timeouts on individual requests and will kill processes, but none that I know of will interject some sort of signal indicating that the client connection has closed. As partly explained above, in Apache at least, you can only know a client connection has closed when you attempt to read data from it or write data to it. Apache is not event driven, so there is no select/poll on a client connection such that you could be notified immediately anyway.
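For the record, the inactivity-timeout fail-safe discussed above is configured on the mod_wsgi daemon process group. A hypothetical example (the process-group name, paths, and sizing are invented; note the semantics: the whole daemon process is restarted, not a single request):

```apache
# Restart the daemon process if no request input is read and no response
# output is generated by any thread for 300 seconds.
WSGIDaemonProcess myapp processes=2 threads=15 inactivity-timeout=300
WSGIProcessGroup myapp
WSGIScriptAlias / /var/www/myapp/app.wsgi
```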
> >>
> >> All you can really do, for any system, is, at your application level, try to implement timeouts on potentially blocking operations to backend processes, and otherwise simply ensure you have allowed enough processes/threads to handle the expected load, with some additional capacity to cope with requests stalling for a while until timeouts kick in.
> >>
> >> Graham

--
Matt Craighead
Founder/CEO, Conifer Systems LLC
http://www.conifersystems.com
512-772-1834

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en
-~----------~----~----~----~------~----~------~--~---
