So I found the following code, which reminds me that the issue was that code
run when the garbage collector invoked __del__() methods was deadlocking,
which caused the garbage collector to stop running. The code below uses a
trick with a reference cycle to log each time the garbage collector runs; if
you see the logging stop, one possible cause is a deadlock in a __del__()
method.
import time
import threading

class Monitor(object):

    initialized = False
    lock = threading.Lock()
    count = 0

    @classmethod
    def initialize(cls):
        with Monitor.lock:
            if not cls.initialized:
                cls.initialized = True
                cls.rollover()

    @staticmethod
    def rollover():
        # Log that the garbage collector has run, then recreate a reference
        # cycle holding a Monitor instance so that the next collection
        # triggers __del__() and logs again.
        print('RUNNING GARBAGE COLLECTOR', time.time())

        class Object(object):
            pass

        o1 = Object()
        o2 = Object()
        o1.o = o2
        o2.o = o1
        o1.t = Monitor()

        del o1
        del o2

    def __del__(self):
        # The Monitor instance is only destroyed when the garbage collector
        # breaks the o1/o2 cycle, so this fires once per collection.
        Monitor.count += 1
        Monitor.rollover()

Monitor.initialize()
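
For the gc-based logging suggested in the quoted message below, a minimal
sketch of such a background thread might look like the following. This is my
own sketch, not code from the thread; the 60-second interval and the use of
collections.Counter to summarise gc.get_objects() are assumptions.

import gc
import threading
import time
from collections import Counter

def log_gc_stats(interval=60.0):
    while True:
        # Number of objects tracked in each GC generation, and the
        # thresholds at which collection of each generation kicks in.
        print('gc counts', gc.get_count(), 'thresholds', gc.get_threshold())

        # Objects the collector found but could not free, for example
        # cycles involving __del__() methods on older Python versions.
        print('uncollectable', len(gc.garbage))

        # The ten most common object types among all tracked objects.
        counts = Counter(type(o).__name__ for o in gc.get_objects())
        print('top types', counts.most_common(10))

        time.sleep(interval)

monitor_thread = threading.Thread(target=log_gc_stats)
monitor_thread.daemon = True
monitor_thread.start()
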
> On 19 Jun 2020, at 9:42 am, Graham Dumpleton <[email protected]>
> wrote:
>
> One possible cause for this can be object reference count cycles which the
> garbage collector cannot break.
>
> So first off, try creating a background thread that periodically logs the
> number of objects.
>
> I think it is gc.get_count(). The thresholds at which it should kick in are
> given by gc.get_threshold().
>
> If need be, you can then start dumping out counts of objects of particular
> types that exist by looking at gc.get_objects().
>
> Anyway, this may give some clues. I had to use this many years ago to debug
> a memory growth issue in Django due to custom __del__() methods on objects
> causing problems. My memory of what I did is very vague though, and I don't
> think I have any code I used lying around, but I will have a quick search.
>
> Graham
>
>> On 19 Jun 2020, at 6:27 am, Jason Garber <[email protected]> wrote:
>>
>> Hey Grant, All,
>>
>> We've been running a live event with about 1,000 people and getting hit with
>> up to hundreds of requests per second. I'm running 20 processes and 20
>> threads per process.
>>
>> Every once in a while the memory across all processes spikes to 200+ MB
>> and the load average skyrockets. I've seen it hit as high as 250 (vs. 0.7
>> normally).
>>
>> service httpd graceful
>> fixes the issue (for a while)
>>
>> Normal Example:
>> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk
>> '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)';
>> uptime
>> 123.969 MB wsgi-DaaS-TMT-0
>> 123.984 MB wsgi-DaaS-TMT-0
>> 119.52 MB wsgi-DaaS-TMT-0
>> 126.121 MB wsgi-DaaS-TMT-0
>> 121.086 MB wsgi-DaaS-TMT-0
>> 121.016 MB wsgi-DaaS-TMT-0
>> 145.945 MB wsgi-DaaS-TMT-0
>> 118.406 MB wsgi-DaaS-TMT-0
>> 126.672 MB wsgi-DaaS-TMT-0
>> 112.234 MB wsgi-DaaS-TMT-0
>> 111.328 MB wsgi-DaaS-TMT-0
>> 135.461 MB wsgi-DaaS-TMT-0
>> 117.73 MB wsgi-DaaS-TMT-0
>> 136.438 MB wsgi-DaaS-TMT-0
>> 113.359 MB wsgi-DaaS-TMT-0
>> 118.289 MB wsgi-DaaS-TMT-0
>> 123.535 MB wsgi-DaaS-TMT-0
>> 126.746 MB wsgi-DaaS-TMT-0
>> 122.766 MB wsgi-DaaS-TMT-0
>> 115.934 MB wsgi-DaaS-TMT-0
>> MemFree: 4993068 kB
>> MemAvailable: 25089688 kB
>> 13:01:36 up 7 days, 9:27, 4 users, load average: 0.55, 0.82, 2.46
>>
>> Server almost unresponsive:
>> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk
>> '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E '(MemFree|Avail)';
>> uptime
>> 275.457 MB wsgi-DaaS-TMT-0
>> 277.633 MB wsgi-DaaS-TMT-0
>> 274.633 MB wsgi-DaaS-TMT-0
>> 285.215 MB wsgi-DaaS-TMT-0
>> 278.156 MB wsgi-DaaS-TMT-0
>> 272.445 MB wsgi-DaaS-TMT-0
>> 277.543 MB wsgi-DaaS-TMT-0
>> 274.371 MB wsgi-DaaS-TMT-0
>> 277.699 MB wsgi-DaaS-TMT-0
>> 273.18 MB wsgi-DaaS-TMT-0
>> 273.363 MB wsgi-DaaS-TMT-0
>> 278.094 MB wsgi-DaaS-TMT-0
>> 276.719 MB wsgi-DaaS-TMT-0
>> 277.074 MB wsgi-DaaS-TMT-0
>> 274.324 MB wsgi-DaaS-TMT-0
>> 275.32 MB wsgi-DaaS-TMT-0
>> 273.684 MB wsgi-DaaS-TMT-0
>> 271.797 MB wsgi-DaaS-TMT-0
>> 283.133 MB wsgi-DaaS-TMT-0
>> 255.16 MB wsgi-DaaS-TMT-0
>> 28.8008 MB /usr/bin/convert
>> MemFree: 262352 kB
>> MemAvailable: 18945328 kB
>> 13:18:50 up 7 days, 9:44, 4 users, load average: 253.79, 100.74, 40.20
>>
>> After httpd graceful after a couple of minutes:
>>
>> [deploy@daas7 DaaS-TMT-0]$ ~/stats.sh
>> 100.383 MB wsgi-DaaS-TMT-0
>> 110.719 MB wsgi-DaaS-TMT-0
>> 101.176 MB wsgi-DaaS-TMT-0
>> 128.449 MB wsgi-DaaS-TMT-0
>> 112.527 MB wsgi-DaaS-TMT-0
>> 109.465 MB wsgi-DaaS-TMT-0
>> 103.875 MB wsgi-DaaS-TMT-0
>> 98.8438 MB wsgi-DaaS-TMT-0
>> 108.414 MB wsgi-DaaS-TMT-0
>> 108.133 MB wsgi-DaaS-TMT-0
>> 107.07 MB wsgi-DaaS-TMT-0
>> 118.824 MB wsgi-DaaS-TMT-0
>> 101.527 MB wsgi-DaaS-TMT-0
>> 127.004 MB wsgi-DaaS-TMT-0
>> 100.871 MB wsgi-DaaS-TMT-0
>> 125.188 MB wsgi-DaaS-TMT-0
>> 100.566 MB wsgi-DaaS-TMT-0
>> 108.91 MB wsgi-DaaS-TMT-0
>> 101.215 MB wsgi-DaaS-TMT-0
>> 109.711 MB wsgi-DaaS-TMT-0
>> MemFree: 7607044 kB
>> MemAvailable: 25815540 kB
>> 13:25:51 up 7 days, 9:51, 4 users, load average: 1.25, 38.56, 36.12
>>
>> My main question is: does anyone have suggestions for seeing inside the
>> daemon processes, down to the Python object level, to see what is going on?
>>
>> Thanks,
>> Jason
>>
>>