Passing request_data wasn't a good idea for certain reasons. I probably just
rediscovered why I didn't do it that way in the first place. :-)
Anyway, provided you use the latest from 'develop' branch of GitHub repo, play
around with the following code.
import os
import sys
import time
import threading
import traceback
import mod_wsgi
def event_handler(name, **kwargs):
if name == 'request_started':
request_data = mod_wsgi.request_data()
request_data.update(kwargs)
request_data['python_thread_id'] = threading.get_ident()
elif name == 'process_stopping':
if kwargs['shutdown_reason'] == 'request_timeout':
print('SHUTDOWN')
current_time = time.time()
stacks = dict(sys._current_frames().items())
active = dict(mod_wsgi.active_requests.items())
for request_id, request_data in active.items():
python_thread_id = request_data['python_thread_id']
application_start = request_data['application_start']
request_environ = request_data['request_environ']
request_uri = request_environ['REQUEST_URI']
running_time = current_time - application_start
print()
print('THREAD_ID', python_thread_id)
print('REQUEST_ID', request_id)
print('REQUEST_URI', request_uri)
print('RUNNING_TIME', running_time)
if python_thread_id in stacks:
print('STACK-TRACE')
traceback.print_stack(stacks[python_thread_id])
mod_wsgi.subscribe_events(event_handler)
def application(environ, start_response):
sleep_duration = environ.get('HTTP_X_SLEEP_DURATION', 0)
sleep_duration = float(sleep_duration or 0)
status = '200 OK'
output = b'Hello World!'
response_headers = [('Content-type', 'text/plain'),
('Content-Length', str(len(output)))]
start_response(status, response_headers)
if sleep_duration:
time.sleep(sleep_duration)
yield output
I am running that with:
mod_wsgi-express start-server tests/request-timeout.wsgi --log-to-terminal
--request-timeout 1 --threads 1 --log-level info
and running curl as:
curl -H X-Sleep-Duration:60 http://localhost:8000/ <http://localhost:8000/>
That gets me the following. One stack trace I generate myself just for that sub
interpreter out of event mechanism. The other is process wide generated from C
code level.
[Thu Feb 08 22:52:41.635819 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Daemon process request time limit exceeded, stopping process 'localhost:8000'.
[Thu Feb 08 22:52:41.636053 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Shutdown requested 'localhost:8000'.
[Thu Feb 08 22:52:41.636126 2018] [wsgi:error] [pid 21486] SHUTDOWN
[Thu Feb 08 22:52:41.636149 2018] [wsgi:error] [pid 21486]
[Thu Feb 08 22:52:41.636156 2018] [wsgi:error] [pid 21486] THREAD_ID
123145557110784
[Thu Feb 08 22:52:41.636162 2018] [wsgi:error] [pid 21486] REQUEST_ID
YSp5DBLFVJs
[Thu Feb 08 22:52:41.636166 2018] [wsgi:error] [pid 21486] REQUEST_URI /
[Thu Feb 08 22:52:41.636179 2018] [wsgi:error] [pid 21486] RUNNING_TIME
1.85565185546875
[Thu Feb 08 22:52:41.636187 2018] [wsgi:error] [pid 21486] STACK-TRACE
[Thu Feb 08 22:52:41.819058 2018] [wsgi:error] [pid 21486] File
"/Volumes/graham/Projects/mod_wsgi/tests/request-timeout.wsgi", line 56, in
application
[Thu Feb 08 22:52:41.819071 2018] [wsgi:error] [pid 21486]
time.sleep(sleep_duration)
[Thu Feb 08 22:52:41.819080 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Dumping stack trace for active Python threads.
[Thu Feb 08 22:52:41.819083 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Thread 123145557110784 executing file
"/Volumes/graham/Projects/mod_wsgi/tests/request-timeout.wsgi", line 56, in
application
[Thu Feb 08 22:52:46.636172 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Aborting process 'localhost:8000'.
[Thu Feb 08 22:52:46.636375 2018] [wsgi:info] [pid 21486] mod_wsgi (pid=21486):
Exiting process 'localhost:8000'.
Graham
> On 8 Feb 2018, at 2:44 pm, Graham Dumpleton <[email protected]>
> wrote:
>
> FWIW, the other current events are.
>
> * response_started
> * request_exception
> * process_stopping
>
> Passing the per request data as "request_data" argument to event handler for
> request/response events was easy enough to do.
>
> Getting access to the thread object in order to work out thread.name when
> dumping stack traces is definitely not easy. This is because you only have
> access to the integer thread ID and Python frame object. You would have to
> use the thread ID to look up threading._active and hope it is the same value
> used for that. The problem is though that you don't know what Python
> interpreter the thread was active in and which you need to switch to to do
> that lookup. When you have multiple sub interpreters, the thread could also
> have been used in more than one sub interpreter and associated with multiple
> high level Python thread objects. This is the sort of mess that just gives
> one headaches. I therefore don't think that is practical.
>
> So I don't think it can be done from where that current code gets triggered.
> The point where the "process_stopping" event is generated could instead be
> used. This is different because that is generated for each sub interpreter
> just before destroying it, so you are already executing code in the right
> context.
>
> Right now at least, using the per request events, you could track current
> requests yourself in a global dictionary, with link to thread as part of the
> request data. The "process_stopping" could access that cache to dump things
> out. What you will not know is why the process was being stopped, but that is
> possibly something that can pass as an argument to "process_stopping" so
> could decide what to do based on reason for process shutdown.
>
> What mod_wsgi could do to make it easier is provide a way of getting back a
> dictionary of references, by request ID, to all request_data objects for
> active requests. What you stash in them from the event handlers would be up
> to you. I need to think about the best way to expose that giving access to
> those request data objects.
>
> Graham
>
>> On 8 Feb 2018, at 1:41 pm, Graham Dumpleton <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>
>>
>>> On 8 Feb 2018, at 1:08 pm, Jesus Cea <[email protected] <mailto:[email protected]>>
>>> wrote:
>>>
>>> On 08/02/18 02:39, Graham Dumpleton wrote:
>>>> What are you setting the thread name too?
>>>
>>> Beside the mod_wsgi threads running "application()", my code creates
>>> tons of long term threads like "cache_cleanup",
>>> "periodic_cache_flush_to_disk", "map generation workers", "audio
>>> transcoding", etc.
>>>
>>>> https://github.com/GrahamDumpleton/mod_wsgi/issues/160
>>>> <https://github.com/GrahamDumpleton/mod_wsgi/issues/160>
>>>
>>> Uhm, first thing in "application()" could be a "thread.name=URI", and
>>> the "finally" statement could be "thread.name='idle'", or in the
>>> "close()" code of the iterator returned.
>>>
>>> This looks like a pattern for a near trivial middleware.
>>
>> Using a WSGI middleware for that is a bad idea because of the complexity of
>> implementing a WSGI middleware that properly bounds the full execution of
>> the code involved in all parts of handling the request. The better way is to
>> use the event system in mod_wsgi to be notified of the start and end of the
>> request. This is more efficient and you don't need to wrap the WSGI
>> application entry point.
>>
>> import mod_wsgi
>> import threading
>>
>> def event_handler(name, **kwargs):
>> cache = mod_wsgi.request_data()
>> thread = threading.current_thread()
>>
>> if name == 'request_started':
>> cache['original_thread_name'] = thread.name
>> environ = kwargs['request_environ']
>> thread.name = environ['REQUEST_URI']
>>
>> elif name == 'request_finished':
>> thread.name = cache['original_thread_name']
>>
>> mod_wsgi.subscribe_events(event_handler)
>>
>> I don't think overriding the thread name with a request URI is a good idea
>> here though. I think it would be better to have mod_wsgi set it based on the
>> existing request ID that Apache generates as that then matches what you can
>> log for the request in the access log.
>>
>> Overall what is a probably a better approach is for me to extend the event
>> mechanism in a couple of ways.
>>
>> The first is to add a new event type of 'request_active'. In mod_wsgi I
>> could have a default reporting interval to pick up on long running requests
>> and generate a 'request_active' event every 15 seconds (configurable), so
>> long as the request is running.
>>
>> A second event could be of type 'request_timeout'. This could be triggered
>> for each request, specifically for the case where there are active requests
>> when the process is being shutdown due to request-timeout expiring for the
>> process.
>>
>> From the event handler for either of these you could log any information you
>> want to. The only hard bit for me is that currently the
>> mod_wsgi.request_data() call which provides access to a per request
>> dictionary where you can stash data, is based on using thread locals, so
>> calling that in these two events wouldn't work as it isn't being called from
>> the request thread, but a separate thread. I had refrained from passing it
>> as an explicit argument to the event handler for reasons I couldn't
>> remember. Otherwise need to know when calling mod_wsgi.request_data() that
>> doing it for these special cases and calculate the cache of request data
>> another way based on knowing what request am going working with when
>> triggering the event.
>>
>>>> If you are logging thread ID in access log, then setting thread ID to
>>>> request ID and attaching it to the traceback sounds reasonable.
>>>
>>> I am looking for being able to easily identify my threads in a
>>> "request-timeout" traceback dump when I have like 130 threads running.
>>> They are nicely labeled in my code, but the mod_wsgi traceback dump
>>> doesn't show the "name" field, but opaque and uninformative "thread.ident".
>>>
>>> I have a "futures._WorkItem" overloaded to accept an extra "thread_name"
>>> parameter in the "futures.executor.submit()" so I can annotate threads
>>> doing background work.
>>>
>>> Now that I am using "request-timeout" traceback dumps, I would love to
>>> have all that information available. Just dump "thread.name" if
>>> available, instead of "thread.ident" :-).
>>
>>
>> Graham
>
--
You received this message because you are subscribed to the Google Groups
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.