Should add that if it does come down to a deadlock in the garbage
collector, the next step is to use a variant of:

https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces

to dump out stack traces showing where all the threads are, and try to
work out which one is blocked.
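The core mechanism is only a few lines (a minimal sketch of the idea
only; the recipe on that page is more elaborate about when and how to
trigger the dump, and dump_stacks() is an illustrative name, not part
of any API):

import sys
import threading
import traceback

def dump_stacks():
    # Map thread ids to names so the output is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    # sys._current_frames() returns the current stack frame of every
    # running thread, keyed by thread id.
    for thread_id, frame in sys._current_frames().items():
        print('Thread %s (%s):' % (thread_id, names.get(thread_id, '???')))
        print(''.join(traceback.format_stack(frame)))

If the garbage collector is deadlocked in a __del__() method, the
blocked thread should show up with the offending __del__() call in its
trace.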
> On 19 Jun 2020, at 9:48 am, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
>
> I haven't read the discussion, but this was posted as part of:
>
> https://groups.google.com/forum/#!topic/modwsgi/gRGy0JILpsI%5B1-25%5D
>
>> On 19 Jun 2020, at 9:46 am, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
>>
>> So I found the following code, which reminds me that the issue was
>> that code being run when the garbage collector invoked __del__()
>> methods was deadlocking. This caused the garbage collector to stop
>> running. The code below uses a trick with objects in a reference
>> cycle to log each time the garbage collector runs, so if you see
>> that the logging stops, you know it could be because of a deadlock
>> in a __del__() method.
>>
>> import time
>> import threading
>>
>> class Object(object):
>>     pass
>>
>> class Monitor(object):
>>
>>     initialized = False
>>     lock = threading.Lock()
>>
>>     count = 0
>>
>>     @classmethod
>>     def initialize(cls):
>>         with Monitor.lock:
>>             if not cls.initialized:
>>                 cls.initialized = True
>>                 cls.rollover()
>>
>>     @staticmethod
>>     def rollover():
>>         print('RUNNING GARBAGE COLLECTOR', time.time())
>>         # Plant a reference cycle holding a Monitor instance. The
>>         # cycle is unreachable, so only the cyclic garbage collector
>>         # can reclaim it, at which point Monitor.__del__() fires.
>>         o1 = Object()
>>         o2 = Object()
>>         o1.o = o2
>>         o2.o = o1
>>         o1.t = Monitor()
>>         del o1
>>         del o2
>>
>>     def __del__(self):
>>         # Log this collector run and plant a fresh cycle so the
>>         # next run is logged as well.
>>         Monitor.count += 1
>>         Monitor.rollover()
>>
>> Monitor.initialize()
>>
>>> On 19 Jun 2020, at 9:42 am, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
>>>
>>> One possible cause for this can be object reference count cycles
>>> which the garbage collector cannot break.
>>>
>>> So first off, try creating a background thread that periodically
>>> logs the number of objects. I think it is gc.get_count(). The
>>> thresholds for when the collector should kick in are given by
>>> gc.get_threshold().
>>>
>>> If need be, you can then start dumping out counts of objects of
>>> particular types that exist by looking at gc.get_objects().
>>>
>>> Anyway, this may give some clues. I had to use this many years ago
>>> to debug a memory growth issue in Django due to custom __del__()
>>> methods on objects causing problems. My memory of what I did is
>>> very vague though, and I don't think I have any of the code I used
>>> lying around, but I will have a quick search.
>>>
>>> Graham
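A minimal sketch of the kind of background monitoring thread described
above (the interval, the use of print() rather than a proper logger,
and the top-10 cutoff are all illustrative assumptions):

import collections
import gc
import threading
import time

def _monitor():
    while True:
        time.sleep(15.0)  # Illustrative interval.
        # Number of tracked objects in each GC generation, and the
        # thresholds that trigger a collection of each generation.
        print('gc.get_count() =', gc.get_count())
        print('gc.get_threshold() =', gc.get_threshold())
        # Counts of tracked objects broken down by type, most common
        # first, to spot which types are accumulating.
        counts = collections.Counter(
            type(o).__name__ for o in gc.get_objects())
        for name, count in counts.most_common(10):
            print('%8d %s' % (count, name))

threading.Thread(target=_monitor, daemon=True).start()

In a mod_wsgi setup this could be started from the WSGI script file,
so that each daemon process logs its own counts.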
>>>> On 19 Jun 2020, at 6:27 am, Jason Garber <ja...@gahooa.com> wrote:
>>>>
>>>> Hey Grant, All,
>>>>
>>>> We've been running a live event with about 1,000 people and getting
>>>> hit with up to hundreds of requests per second. I'm running 20
>>>> processes and 20 threads per process.
>>>>
>>>> Every once in a while the memory across all processes spikes up to
>>>> 200+ MB and the load average skyrockets. I've seen it hit as high
>>>> as 250 (vs. 0.7 normal).
>>>>
>>>> service httpd graceful
>>>>
>>>> fixes the issue (for a while).
>>>>
>>>> Normal example:
>>>>
>>>> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)'; uptime
>>>> 123.969 MB wsgi-DaaS-TMT-0
>>>> 123.984 MB wsgi-DaaS-TMT-0
>>>> 119.52 MB wsgi-DaaS-TMT-0
>>>> 126.121 MB wsgi-DaaS-TMT-0
>>>> 121.086 MB wsgi-DaaS-TMT-0
>>>> 121.016 MB wsgi-DaaS-TMT-0
>>>> 145.945 MB wsgi-DaaS-TMT-0
>>>> 118.406 MB wsgi-DaaS-TMT-0
>>>> 126.672 MB wsgi-DaaS-TMT-0
>>>> 112.234 MB wsgi-DaaS-TMT-0
>>>> 111.328 MB wsgi-DaaS-TMT-0
>>>> 135.461 MB wsgi-DaaS-TMT-0
>>>> 117.73 MB wsgi-DaaS-TMT-0
>>>> 136.438 MB wsgi-DaaS-TMT-0
>>>> 113.359 MB wsgi-DaaS-TMT-0
>>>> 118.289 MB wsgi-DaaS-TMT-0
>>>> 123.535 MB wsgi-DaaS-TMT-0
>>>> 126.746 MB wsgi-DaaS-TMT-0
>>>> 122.766 MB wsgi-DaaS-TMT-0
>>>> 115.934 MB wsgi-DaaS-TMT-0
>>>> MemFree: 4993068 kB
>>>> MemAvailable: 25089688 kB
>>>> 13:01:36 up 7 days, 9:27, 4 users, load average: 0.55, 0.82, 2.46
>>>>
>>>> Server almost unresponsive:
>>>>
>>>> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)'; uptime
>>>> 275.457 MB wsgi-DaaS-TMT-0
>>>> 277.633 MB wsgi-DaaS-TMT-0
>>>> 274.633 MB wsgi-DaaS-TMT-0
>>>> 285.215 MB wsgi-DaaS-TMT-0
>>>> 278.156 MB wsgi-DaaS-TMT-0
>>>> 272.445 MB wsgi-DaaS-TMT-0
>>>> 277.543 MB wsgi-DaaS-TMT-0
>>>> 274.371 MB wsgi-DaaS-TMT-0
>>>> 277.699 MB wsgi-DaaS-TMT-0
>>>> 273.18 MB wsgi-DaaS-TMT-0
>>>> 273.363 MB wsgi-DaaS-TMT-0
>>>> 278.094 MB wsgi-DaaS-TMT-0
>>>> 276.719 MB wsgi-DaaS-TMT-0
>>>> 277.074 MB wsgi-DaaS-TMT-0
>>>> 274.324 MB wsgi-DaaS-TMT-0
>>>> 275.32 MB wsgi-DaaS-TMT-0
>>>> 273.684 MB wsgi-DaaS-TMT-0
>>>> 271.797 MB wsgi-DaaS-TMT-0
>>>> 283.133 MB wsgi-DaaS-TMT-0
>>>> 255.16 MB wsgi-DaaS-TMT-0
>>>> 28.8008 MB /usr/bin/convert
>>>> MemFree: 262352 kB
>>>> MemAvailable: 18945328 kB
>>>> 13:18:50 up 7 days, 9:44, 4 users, load average: 253.79, 100.74, 40.20
>>>>
>>>> After httpd graceful, after a couple of minutes:
>>>>
>>>> [deploy@daas7 DaaS-TMT-0]$ ~/stats.sh
>>>> 100.383 MB wsgi-DaaS-TMT-0
>>>> 110.719 MB wsgi-DaaS-TMT-0
>>>> 101.176 MB wsgi-DaaS-TMT-0
>>>> 128.449 MB wsgi-DaaS-TMT-0
>>>> 112.527 MB wsgi-DaaS-TMT-0
>>>> 109.465 MB wsgi-DaaS-TMT-0
>>>> 103.875 MB wsgi-DaaS-TMT-0
>>>> 98.8438 MB wsgi-DaaS-TMT-0
>>>> 108.414 MB wsgi-DaaS-TMT-0
>>>> 108.133 MB wsgi-DaaS-TMT-0
>>>> 107.07 MB wsgi-DaaS-TMT-0
>>>> 118.824 MB wsgi-DaaS-TMT-0
>>>> 101.527 MB wsgi-DaaS-TMT-0
>>>> 127.004 MB wsgi-DaaS-TMT-0
>>>> 100.871 MB wsgi-DaaS-TMT-0
>>>> 125.188 MB wsgi-DaaS-TMT-0
>>>> 100.566 MB wsgi-DaaS-TMT-0
>>>> 108.91 MB wsgi-DaaS-TMT-0
>>>> 101.215 MB wsgi-DaaS-TMT-0
>>>> 109.711 MB wsgi-DaaS-TMT-0
>>>> MemFree: 7607044 kB
>>>> MemAvailable: 25815540 kB
>>>> 13:25:51 up 7 days, 9:51, 4 users, load average: 1.25, 38.56, 36.12
>>>>
>>>> My main question: does anyone have suggestions for seeing inside
>>>> the daemon processes, down to the Python object level, to see what
>>>> is going on?
>>>>
>>>> Thanks,
>>>> Jason