Re: [modwsgi] mod_wsgi process restarting (not) logging and "faulthandler"

Graham Dumpleton Wed, 07 Feb 2018 19:45:06 -0800

FWIW, the other current events are.

* response_started
* request_exception
* process_stopping


Passing the per request data as "request_data" argument to event handler for 
request/response events was easy enough to do.

Getting access to the thread object in order to work out thread.name when 
dumping stack traces is definitely not easy. This is because you only have 
access to the integer thread ID and Python frame object. You would have to use 
the thread ID to look up threading._active and hope it is the same value used 
for that. The problem is though that you don't know what Python interpreter the 
thread was active in and which you need to switch to to do that lookup. When 
you have multiple sub interpreters, the thread could also have been used in 
more than one sub interpreter and associated with multiple high level Python 
thread objects. This is the sort of mess that just gives one headaches. I 
therefore don't think that is practical.

So I don't think it can be done from where that current code gets triggered. 
The point where the "process_stopping" event is generated could instead be 
used. This is different because that is generated for each sub interpreter just 
before destroying it, so you are already executing code in the right context.

Right now at least, using the per request events, you could track current 
requests yourself in a global dictionary, with link to thread as part of the 
request data. The "process_stopping" could access that cache to dump things 
out. What you will not know is why the process was being stopped, but that is 
possibly something that can pass as an argument to "process_stopping" so could 
decide what to do based on reason for process shutdown.

What mod_wsgi could do to make it easier is provide a way of getting back a 
dictionary of references, by request ID, to all request_data objects for active 
requests. What you stash in them from the event handlers would be up to you. I 
need to think about the best way to expose that giving access to those request 
data objects.

Graham

> On 8 Feb 2018, at 1:41 pm, Graham Dumpleton <[email protected]> 
> wrote:
> 
> 
> 
>> On 8 Feb 2018, at 1:08 pm, Jesus Cea <[email protected] <mailto:[email protected]>> 
>> wrote:
>> 
>> On 08/02/18 02:39, Graham Dumpleton wrote:
>>> What are you setting the thread name too?
>> 
>> Beside the mod_wsgi threads running "application()", my code creates
>> tons of long term threads like "cache_cleanup",
>> "periodic_cache_flush_to_disk", "map generation workers", "audio
>> transcoding", etc.
>> 
>>> https://github.com/GrahamDumpleton/mod_wsgi/issues/160 
>>> <https://github.com/GrahamDumpleton/mod_wsgi/issues/160>
>> 
>> Uhm, first thing in "application()" could be a "thread.name=URI", and
>> the "finally" statement could be "thread.name='idle'", or in the
>> "close()" code of the iterator returned.
>> 
>> This looks like a pattern for a near trivial middleware.
> 
> Using a WSGI middleware for that is a bad idea because of the complexity of 
> implementing a WSGI middleware that properly bounds the full execution of the 
> code involved in all parts of handling the request. The better way is to use 
> the event system in mod_wsgi to be notified of the start and end of the 
> request. This is more efficient and you don't need to wrap the WSGI 
> application entry point.
> 
> import mod_wsgi
> import threading
> 
> def event_handler(name, **kwargs):
>     cache = mod_wsgi.request_data()
>     thread = threading.current_thread()
> 
>     if name == 'request_started':
>         cache['original_thread_name'] = thread.name
>         environ = kwargs['request_environ']
>         thread.name = environ['REQUEST_URI']
> 
>     elif name == 'request_finished':
>         thread.name = cache['original_thread_name']
> 
> mod_wsgi.subscribe_events(event_handler)
> 
> I don't think overriding the thread name with a request URI is a good idea 
> here though. I think it would be better to have mod_wsgi set it based on the 
> existing request ID that Apache generates as that then matches what you can 
> log for the request in the access log.
> 
> Overall what is a probably a better approach is for me to extend the event 
> mechanism in a couple of ways.
> 
> The first is to add a new event type of 'request_active'. In mod_wsgi I could 
> have a default reporting interval to pick up on long running requests and 
> generate a 'request_active' event every 15 seconds (configurable), so long as 
> the request is running.
> 
> A second event could be of type 'request_timeout'. This could be triggered 
> for each request, specifically for the case where there are active requests 
> when the process is being shutdown due to request-timeout expiring for the 
> process.
> 
> From the event handler for either of these you could log any information you 
> want to. The only hard bit for me is that currently the 
> mod_wsgi.request_data() call which provides access to a per request 
> dictionary where you can stash data, is based on using thread locals, so 
> calling that in these two events wouldn't work as it isn't being called from 
> the request thread, but a separate thread. I had refrained from passing it as 
> an explicit argument to the event handler for reasons I couldn't remember. 
> Otherwise need to know when calling mod_wsgi.request_data() that doing it for 
> these special cases and calculate the cache of request data another way based 
> on knowing what request am going working with when triggering the event.
> 
>>> If you are logging thread ID in access log, then setting thread ID to
>>> request ID and attaching it to the traceback sounds reasonable.
>> 
>> I am looking for being able to easily identify my threads in a
>> "request-timeout" traceback dump when I have like 130 threads running.
>> They are nicely labeled in my code, but the mod_wsgi traceback dump
>> doesn't show the "name" field, but opaque and uninformative "thread.ident".
>> 
>> I have a "futures._WorkItem" overloaded to accept an extra "thread_name"
>> parameter in the "futures.executor.submit()" so I can annotate threads
>> doing background work.
>> 
>> Now that I am using "request-timeout" traceback dumps, I would love to
>> have all that information available. Just dump "thread.name" if
>> available, instead of "thread.ident" :-).
> 
> 
> Graham

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.

Re: [modwsgi] mod_wsgi process restarting (not) logging and "faulthandler"

Reply via email to