Passing request_data wasn't a good idea for certain reasons. I probably just
rediscovered why I didn't do it that way in the first place. :-)
Anyway, provided you use the latest from 'develop' branch of GitHub repo, play
around with the following code.
def event_handler(name, **kwargs):
if name == 'request_started':
request_data = mod_wsgi.request_data()
request_data['python_thread_id'] = threading.get_ident()
elif name == 'process_stopping':
if kwargs['shutdown_reason'] == 'request_timeout':
current_time = time.time()
stacks = dict(sys._current_frames().items())
active = dict(mod_wsgi.active_requests.items())
for request_id, request_data in active.items():
python_thread_id = request_data['python_thread_id']
application_start = request_data['application_start']
request_environ = request_data['request_environ']
request_uri = request_environ['REQUEST_URI']
running_time = current_time - application_start
if python_thread_id in stacks:
def application(environ, start_response):
sleep_duration = environ.get('HTTP_X_SLEEP_DURATION', 0)
sleep_duration = float(sleep_duration or 0)
status = '200 OK'
output = b'Hello World!'
response_headers = [('Content-type', 'text/plain'),
I am running that with:
mod_wsgi-express start-server tests/request-timeout.wsgi --log-to-terminal
--request-timeout 1 --threads 1 --log-level info
and running curl as:
curl -H X-Sleep-Duration:60 http://localhost:8000/ <http://localhost:8000/>
That gets me the following. One stack trace I generate myself just for that sub
interpreter out of event mechanism. The other is process wide generated from C
[Thu Feb 08 22:52:41.635819 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Daemon process request time limit exceeded, stopping process 'localhost:8000'.
[Thu Feb 08 22:52:41.636053 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Shutdown requested 'localhost:8000'.
[Thu Feb 08 22:52:41.636126 2018] [wsgi:error] [pid 21486] SHUTDOWN
[Thu Feb 08 22:52:41.636149 2018] [wsgi:error] [pid 21486]
[Thu Feb 08 22:52:41.636156 2018] [wsgi:error] [pid 21486] THREAD_ID
[Thu Feb 08 22:52:41.636162 2018] [wsgi:error] [pid 21486] REQUEST_ID
[Thu Feb 08 22:52:41.636166 2018] [wsgi:error] [pid 21486] REQUEST_URI /
[Thu Feb 08 22:52:41.636179 2018] [wsgi:error] [pid 21486] RUNNING_TIME
[Thu Feb 08 22:52:41.636187 2018] [wsgi:error] [pid 21486] STACK-TRACE
[Thu Feb 08 22:52:41.819058 2018] [wsgi:error] [pid 21486] File
"/Volumes/graham/Projects/mod_wsgi/tests/request-timeout.wsgi", line 56, in
[Thu Feb 08 22:52:41.819071 2018] [wsgi:error] [pid 21486]
[Thu Feb 08 22:52:41.819080 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Dumping stack trace for active Python threads.
[Thu Feb 08 22:52:41.819083 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Thread 123145557110784 executing file
"/Volumes/graham/Projects/mod_wsgi/tests/request-timeout.wsgi", line 56, in
[Thu Feb 08 22:52:46.636172 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Aborting process 'localhost:8000'.
[Thu Feb 08 22:52:46.636375 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Exiting process 'localhost:8000'.
> On 8 Feb 2018, at 2:44 pm, Graham Dumpleton <graham.dumple...@gmail.com>
> FWIW, the other current events are.
> * response_started
> * request_exception
> * process_stopping
> Passing the per request data as "request_data" argument to event handler for
> request/response events was easy enough to do.
> Getting access to the thread object in order to work out thread.name when
> dumping stack traces is definitely not easy. This is because you only have
> access to the integer thread ID and Python frame object. You would have to
> use the thread ID to look up threading._active and hope it is the same value
> used for that. The problem is though that you don't know what Python
> interpreter the thread was active in and which you need to switch to to do
> that lookup. When you have multiple sub interpreters, the thread could also
> have been used in more than one sub interpreter and associated with multiple
> high level Python thread objects. This is the sort of mess that just gives
> one headaches. I therefore don't think that is practical.
> So I don't think it can be done from where that current code gets triggered.
> The point where the "process_stopping" event is generated could instead be
> used. This is different because that is generated for each sub interpreter
> just before destroying it, so you are already executing code in the right
> Right now at least, using the per request events, you could track current
> requests yourself in a global dictionary, with link to thread as part of the
> request data. The "process_stopping" could access that cache to dump things
> out. What you will not know is why the process was being stopped, but that is
> possibly something that can pass as an argument to "process_stopping" so
> could decide what to do based on reason for process shutdown.
> What mod_wsgi could do to make it easier is provide a way of getting back a
> dictionary of references, by request ID, to all request_data objects for
> active requests. What you stash in them from the event handlers would be up
> to you. I need to think about the best way to expose that giving access to
> those request data objects.
>> On 8 Feb 2018, at 1:41 pm, Graham Dumpleton <graham.dumple...@gmail.com
>> <mailto:graham.dumple...@gmail.com>> wrote:
>>> On 8 Feb 2018, at 1:08 pm, Jesus Cea <j...@jcea.es <mailto:j...@jcea.es>>
>>> On 08/02/18 02:39, Graham Dumpleton wrote:
>>>> What are you setting the thread name too?
>>> Beside the mod_wsgi threads running "application()", my code creates
>>> tons of long term threads like "cache_cleanup",
>>> "periodic_cache_flush_to_disk", "map generation workers", "audio
>>> transcoding", etc.
>>> Uhm, first thing in "application()" could be a "thread.name=URI", and
>>> the "finally" statement could be "thread.name='idle'", or in the
>>> "close()" code of the iterator returned.
>>> This looks like a pattern for a near trivial middleware.
>> Using a WSGI middleware for that is a bad idea because of the complexity of
>> implementing a WSGI middleware that properly bounds the full execution of
>> the code involved in all parts of handling the request. The better way is to
>> use the event system in mod_wsgi to be notified of the start and end of the
>> request. This is more efficient and you don't need to wrap the WSGI
>> application entry point.
>> import mod_wsgi
>> import threading
>> def event_handler(name, **kwargs):
>> cache = mod_wsgi.request_data()
>> thread = threading.current_thread()
>> if name == 'request_started':
>> cache['original_thread_name'] = thread.name
>> environ = kwargs['request_environ']
>> thread.name = environ['REQUEST_URI']
>> elif name == 'request_finished':
>> thread.name = cache['original_thread_name']
>> I don't think overriding the thread name with a request URI is a good idea
>> here though. I think it would be better to have mod_wsgi set it based on the
>> existing request ID that Apache generates as that then matches what you can
>> log for the request in the access log.
>> Overall what is a probably a better approach is for me to extend the event
>> mechanism in a couple of ways.
>> The first is to add a new event type of 'request_active'. In mod_wsgi I
>> could have a default reporting interval to pick up on long running requests
>> and generate a 'request_active' event every 15 seconds (configurable), so
>> long as the request is running.
>> A second event could be of type 'request_timeout'. This could be triggered
>> for each request, specifically for the case where there are active requests
>> when the process is being shutdown due to request-timeout expiring for the
>> From the event handler for either of these you could log any information you
>> want to. The only hard bit for me is that currently the
>> mod_wsgi.request_data() call which provides access to a per request
>> dictionary where you can stash data, is based on using thread locals, so
>> calling that in these two events wouldn't work as it isn't being called from
>> the request thread, but a separate thread. I had refrained from passing it
>> as an explicit argument to the event handler for reasons I couldn't
>> remember. Otherwise need to know when calling mod_wsgi.request_data() that
>> doing it for these special cases and calculate the cache of request data
>> another way based on knowing what request am going working with when
>> triggering the event.
>>>> If you are logging thread ID in access log, then setting thread ID to
>>>> request ID and attaching it to the traceback sounds reasonable.
>>> I am looking for being able to easily identify my threads in a
>>> "request-timeout" traceback dump when I have like 130 threads running.
>>> They are nicely labeled in my code, but the mod_wsgi traceback dump
>>> doesn't show the "name" field, but opaque and uninformative "thread.ident".
>>> I have a "futures._WorkItem" overloaded to accept an extra "thread_name"
>>> parameter in the "futures.executor.submit()" so I can annotate threads
>>> doing background work.
>>> Now that I am using "request-timeout" traceback dumps, I would love to
>>> have all that information available. Just dump "thread.name" if
>>> available, instead of "thread.ident" :-).
You received this message because you are subscribed to the Google Groups
To unsubscribe from this group and stop receiving emails from it, send an email
To post to this group, send email to email@example.com.
Visit this group at https://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.