2009/11/12 Chris Butler <[email protected]>:
>
> Hi,
>
> Just when we thought this problem had gone away, it came back again
> today.
>
> Since last time, our developers have deployed some WSGI middleware
> which logs requests on ingress and egress. Analysing this data turned
> up some interesting results. Most requests have matching
> ingress/egress lines, but we occasionally see requests for which
> there is no egress. In those cases, we always see exactly 25
> 'ingress' lines without egress for a particular process ID, which is
> the number of threads per WSGI daemon process. Interestingly, this only seems to
> happen with the first request that's served by a newly-created daemon
> (i.e. I see no entries for that pid earlier in the log). It also seems
> that when the problem occurs, the first two requests for that process
> were received quite close together (possible race condition?). An
> example from our log:
>
> 2009-11-10 11:36:39,321 956, Process: 23190, Thread: 1098619216,
> 2009-11-10 11:36:39,396 326, Process: 23190, Thread: 1090226512,
> 2009-11-10 11:36:40,877 814, Process: 23190, Thread: 1129126224,
> 2009-11-10 11:36:41,817 236, Process: 23190, Thread: 1137518928,
> 2009-11-10 11:36:46,421 54, Process: 23190, Thread: 1145911632,
> 2009-11-10 11:36:47,854 738, Process: 23190, Thread: 1154304336,
> 2009-11-10 11:36:48,376 250, Process: 23190, Thread: 1162697040,
> 2009-11-10 11:36:49,906 17, Process: 23190, Thread: 1171089744,
> 2009-11-10 11:36:50,105 379, Process: 23190, Thread: 1179482448,
> 2009-11-10 11:36:50,109 325, Process: 23190, Thread: 1196267856,
> 2009-11-10 11:36:50,110 355, Process: 23190, Thread: 1187875152,
> 2009-11-10 11:36:50,124 328, Process: 23190, Thread: 1204660560,
> 2009-11-10 11:36:50,323 374, Process: 23190, Thread: 1213053264,
> 2009-11-10 11:36:50,475 756, Process: 23190, Thread: 1221445968,
> 2009-11-10 11:36:51,715 441, Process: 23190, Thread: 1229838672,
> 2009-11-10 11:36:51,959 310, Process: 23190, Thread: 1238231376,
> 2009-11-10 11:36:52,111 371, Process: 23190, Thread: 1246624080,
> 2009-11-10 11:36:52,967 511, Process: 23190, Thread: 1255016784,
> 2009-11-10 11:36:53,349 24, Process: 23190, Thread: 1263409488,
> 2009-11-10 11:36:53,575 194, Process: 23190, Thread: 1271802192,
> 2009-11-10 11:36:53,992 781, Process: 23190, Thread: 1280194896,
> 2009-11-10 11:36:54,062 488, Process: 23190, Thread: 1288587600,
> 2009-11-10 11:36:54,083 970, Process: 23190, Thread: 1296980304,
> 2009-11-10 11:36:54,852 515, Process: 23190, Thread: 1305373008,
> 2009-11-10 11:36:55,157 844, Process: 23190, Thread: 1313765712,
>
> 2009-11-10 12:11:47,224 250, Process: 25665, Thread: 1125169488,
> 2009-11-10 12:11:47,225 752, Process: 25665, Thread: 1133562192,
> 2009-11-10 12:13:16,837 89, Process: 25665, Thread: 1141954896,
> 2009-11-10 12:17:27,249 463, Process: 25665, Thread: 1150347600,
> 2009-11-10 12:19:46,322 551, Process: 25665, Thread: 1158740304,
> 2009-11-10 12:20:46,237 129, Process: 25665, Thread: 1167133008,
> 2009-11-10 12:23:46,292 474, Process: 25665, Thread: 1175525712,
> 2009-11-10 12:24:09,405 334, Process: 25665, Thread: 1183918416,
> 2009-11-10 12:25:28,205 229, Process: 25665, Thread: 1192311120,
> 2009-11-10 12:28:13,756 798, Process: 25665, Thread: 1200703824,
> 2009-11-10 12:28:46,329 797, Process: 25665, Thread: 1209096528,
> 2009-11-10 12:28:58,265 114, Process: 25665, Thread: 1217489232,
> 2009-11-10 12:29:44,944 673, Process: 25665, Thread: 1225881936,
> 2009-11-10 12:29:46,324 844, Process: 25665, Thread: 1234274640,
> 2009-11-10 12:31:10,453 113, Process: 25665, Thread: 1242667344,
> 2009-11-10 12:32:46,200 669, Process: 25665, Thread: 1251060048,
> 2009-11-10 12:33:34,501 856, Process: 25665, Thread: 1259452752,
> 2009-11-10 12:33:46,217 120, Process: 25665, Thread: 1267845456,
> 2009-11-10 12:36:46,268 712, Process: 25665, Thread: 1276238160,
> 2009-11-10 12:37:46,333 76, Process: 25665, Thread: 1284630864,
> 2009-11-10 12:38:46,176 986, Process: 25665, Thread: 1293023568,
> 2009-11-10 12:39:03,460 774, Process: 25665, Thread: 1301416272,
> 2009-11-10 12:40:46,344 814, Process: 25665, Thread: 1309808976,
> 2009-11-10 12:41:46,160 505, Process: 25665, Thread: 1318201680,
> 2009-11-10 12:42:46,336 252, Process: 25665, Thread: 1326594384,
>
> If it helps, I also captured a gdb backtrace of all threads in one of
> those daemon processes, which I've put here: http://pastebin.com/f65ab45c7
>
> We've reached a bit of a dead-end with our debugging. Does this make
> sense to anyone else? Any ideas what else we can do to track this
> down?

When Python code imports a module, i.e.,

  import module

it causes a process-wide import system lock to be acquired.

Although in practice the lock is acquired at C code level by the
import infrastructure, it is publicly exposed to Python code in the
'imp' module as imp.acquire_lock(). I only mention the latter so that
you can search the documentation about it.
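
For illustration, the same lock can be taken explicitly from Python
code using only the documented 'imp' functions:

  import imp

  # The import statement acquires this same process-wide lock
  # internally; while we hold it here, an import in any other
  # thread will block.
  imp.acquire_lock()
  try:
      print imp.lock_held()   # True while we hold the lock
  finally:
      imp.release_lock()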

Anyway, while that lock is held by a thread, no other thread can
import a module; it will be locked out until the first thread
completes the import it is performing.

What is happening in your code is that thread 25 has performed such an
import, but one of the modules it is importing is doing non-trivial
work as a side effect of the import. Whatever that code is doing, it
is trying to acquire some other thread mutex which is already locked
by something else. As a consequence that thread blocks, and because it
blocks it never completes the module import, so all the other threads
eventually block when they try to perform an import of their own.

So, it seems you have a multithreaded locking issue whereby whatever
that latter mutex is, it wasn't unlocked properly the previous time it
was used, causing an application-level deadlock.
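
If it helps to picture it, here is a contrived sketch of that failure
mode. The module name, timings and the application mutex are all made
up; the mutex stands in for whatever lock your code never released:

  import imp
  import threading
  import time

  # Stands in for whatever application mutex was left locked.
  app_lock = threading.Lock()
  app_lock.acquire()            # never released: the original bug

  def importing_thread():
      # Plays the role of your thread 25: the import machinery holds
      # the import lock while the module's import side effect blocks
      # on the application mutex, so the import lock is never released.
      imp.acquire_lock()
      app_lock.acquire()        # deadlocks here
      imp.release_lock()

  def request_thread():
      # Plays the role of the other request threads: an innocent
      # import queues up behind the held import lock.
      import httplib            # blocks inside the import machinery

  threading.Thread(target=importing_thread).start()
  time.sleep(0.5)               # let the import lock be taken first
  threading.Thread(target=request_thread).start()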

Note that because the Python GIL is still being managed properly, any
background threads should still run so long as they don't perform an
import. Thus, all I can suggest is running some sort of background
thread which somehow detects when all request threads have deadlocked
in the application, and then iterates over each thread and dumps a
stack trace. It would just need to make sure it pre-imported all the
modules it needs.
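
If sys._current_frames() (added in Python 2.5) does what its name
suggests, a minimal sketch of such a watchdog might look like the
following. The deadlock_suspected callable is hypothetical; one way of
providing it is sketched further down:

  import sys
  import threading
  import time
  import traceback

  def dump_all_stacks():
      # sys._current_frames() maps thread id -> that thread's
      # topmost frame, which the traceback module can print.
      for thread_id, frame in sys._current_frames().items():
          print >> sys.stderr, "Thread %d:" % thread_id
          traceback.print_stack(frame, file=sys.stderr)

  def monitor(deadlock_suspected, interval=60):
      # Everything the watchdog needs is imported up front, so it
      # never has to take the import lock itself.
      while True:
          time.sleep(interval)
          if deadlock_suspected():
              dump_all_stacks()

  # Started once at daemon process startup, e.g.:
  #   watchdog = threading.Thread(target=monitor, args=(tracker.all_stuck,))
  #   watchdog.setDaemon(True)
  #   watchdog.start()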

Whether you can reliably do this sort of stack trace of Python code
from Python code level, or only from C code, I can't remember.

As to knowing whether you are deadlocked, probably the only way is to
use a WSGI middleware which tracks when requests enter and exit; if
all threads have entered but none have exited after some set timeout,
you then try to dump stack traces for the Python threads from Python
code.
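
A sketch of what that tracking middleware might look like (the class
name, thread count and timeout are made up for illustration, and the
'with' statement assumes Python 2.6; the watchdog above would poll
all_stuck()):

  import threading
  import time

  class RequestTracker(object):
      # Hypothetical middleware: counts requests that have entered
      # but not yet exited. If the count sits at the number of
      # request threads for longer than 'timeout' seconds, every
      # thread is presumed stuck.
      def __init__(self, application, num_threads=25, timeout=300):
          self.application = application
          self.num_threads = num_threads
          self.timeout = timeout
          self.lock = threading.Lock()
          self.in_flight = 0
          self.saturated_since = None

      def all_stuck(self):
          with self.lock:
              return (self.saturated_since is not None and
                      time.time() - self.saturated_since > self.timeout)

      def __call__(self, environ, start_response):
          with self.lock:
              self.in_flight += 1
              if self.in_flight >= self.num_threads:
                  self.saturated_since = time.time()
          return self._respond(environ, start_response)

      def _respond(self, environ, start_response):
          # Generator, so the exit bookkeeping runs when the response
          # is fully consumed or closed by the server.
          try:
              for data in self.application(environ, start_response):
                  yield data
          finally:
              with self.lock:
                  self.in_flight -= 1
                  self.saturated_since = None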

So, does anyone know how to dump a traceback for Python threads in an
application from Python code?

Graham
