Hey Graham, thank you.  I have not seen these issues outside of heavy load
and have run systems for months without even cycling the daemon processes,
so I intend to do the steps you suggested both outside of and under heavy
load.

That being said, do you have any general suggestions for generating
realistic high load against our application so that I can analyze it under
load, as per your excellent comments from earlier?

Thanks!
Jason

On Thu, Jun 18, 2020, 8:17 PM Graham Dumpleton <[email protected]>
wrote:

> Should add that if it does come down to a deadlock in the garbage
> collector, the next step is to use a variant of:
>
>
> https://modwsgi.readthedocs.io/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>
> to dump out stack traces showing where all the threads are and try to work
> out which one is blocked.
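>
> A minimal sketch of that idea (not the exact code from the linked page,
> just an illustration) is to have a daemon thread walk sys._current_frames()
> and log every thread's stack; the interval and the use of print() below
> are placeholders:
>
> import sys
> import time
> import threading
> import traceback
>
> def dump_stacks():
>     # Map thread ids to names so the output is easier to read.
>     names = {t.ident: t.name for t in threading.enumerate()}
>     for thread_id, frame in sys._current_frames().items():
>         print('THREAD %s (%s)' % (thread_id, names.get(thread_id, '?')))
>         print(''.join(traceback.format_stack(frame)))
>
> def monitor(interval=60.0):
>     # Periodically dump all thread stacks. Run as a daemon thread so it
>     # never blocks process shutdown.
>     while True:
>         time.sleep(interval)
>         dump_stacks()
>
> threading.Thread(target=monitor, daemon=True).start()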
>
> On 19 Jun 2020, at 9:48 am, Graham Dumpleton <[email protected]>
> wrote:
>
> I haven't read the discussion, but this was posted as part of:
>
> https://groups.google.com/forum/#!topic/modwsgi/gRGy0JILpsI%5B1-25%5D
>
> On 19 Jun 2020, at 9:46 am, Graham Dumpleton <[email protected]>
> wrote:
>
> So I found the following code, which reminds me that the issue was that
> code being run when the garbage collector invoked __del__() methods was
> deadlocking. This caused the garbage collector to stop running. The code
> below uses a trick with reference cycles to log each time the garbage
> collector runs, so if you see that it has stopped running you know it
> could be because of a deadlock in a __del__() method.
>
> import time
> import threading
>
> class Monitor(object):
>
>     initialized = False
>     lock = threading.Lock()
>
>     count = 0
>
>     @classmethod
>     def initialize(cls):
>         # Seed the first reference cycle once only, even if this is
>         # called from multiple threads.
>         with Monitor.lock:
>             if not cls.initialized:
>                 cls.initialized = True
>                 cls.rollover()
>
>     @staticmethod
>     def rollover():
>         print('RUNNING GARBAGE COLLECTOR', time.time())
>
>         class Object(object):
>             pass
>
>         # Build a reference cycle that only the cyclic garbage collector
>         # can reclaim, and hang a Monitor instance off it. When the
>         # collector frees the cycle, Monitor.__del__() runs, logging the
>         # collection and seeding the next cycle.
>         o1 = Object()
>         o2 = Object()
>
>         o1.o = o2
>         o2.o = o1
>
>         o1.t = Monitor()
>
>         del o1
>         del o2
>
>     def __del__(self):
>         Monitor.count += 1
>         Monitor.rollover()
>
> Monitor.initialize()
>
>
>
>
> On 19 Jun 2020, at 9:42 am, Graham Dumpleton <[email protected]>
> wrote:
>
> One possible cause for this is object reference cycles which the garbage
> collector cannot break.
>
> So first off, try creating a background thread that periodically logs the
> number of tracked objects.
>
> I think it is gc.get_count(). The thresholds for when collection should
> kick in are given by gc.get_threshold().
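>
> A rough sketch of such a background thread (the 60-second interval and
> plain print() logging are just placeholders) might be:
>
> import gc
> import time
> import threading
>
> def log_gc_state(interval=60.0):
>     # gc.get_count() reports the current collection counts for the three
>     # generations; gc.get_threshold() reports the thresholds at which a
>     # collection of each generation is triggered.
>     while True:
>         print('GC COUNTS', gc.get_count(), 'THRESHOLDS', gc.get_threshold())
>         time.sleep(interval)
>
> threading.Thread(target=log_gc_state, daemon=True).start()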
>
> If need be, you can then start dumping out counts of objects of particular
> types that exist by looking at gc.get_objects().
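>
> As a sketch of that (the top-20 cut-off is arbitrary), counts by type can
> be tallied with collections.Counter over gc.get_objects():
>
> import gc
> from collections import Counter
>
> def object_type_counts(limit=20):
>     # Tally every object currently tracked by the collector by type name
>     # and return the most common ones.
>     counts = Counter(type(o).__name__ for o in gc.get_objects())
>     return counts.most_common(limit)
>
> for name, count in object_type_counts():
>     print(count, name)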
>
> Anyway, this may give some clues. I had to use this many years ago to
> debug a memory growth issue in Django due to custom __del__() methods on
> objects causing problems. My memory of what I did is very vague though,
> and I don't think I have any of the code I used lying around, but I will
> have a quick search.
>
> Graham
>
> On 19 Jun 2020, at 6:27 am, Jason Garber <[email protected]> wrote:
>
> Hey Graham, All,
>
> We've been running a live event with about 1,000 people and getting hit
> with up to hundreds of requests per second.  I'm running 20 processes and
> 20 threads per process.
>
> Every once in a while the memory of every process spikes to 200+ MB and
> the load average skyrockets.  I've seen it hit as high as 250 (vs. 0.7
> normally).
>
> Running "service httpd graceful" fixes the issue (for a while).
>
> *Normal Example:*
> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk
> '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E
> '(MemFree|Avail)'; uptime
> 123.969 MB wsgi-DaaS-TMT-0
> 123.984 MB wsgi-DaaS-TMT-0
> 119.52 MB wsgi-DaaS-TMT-0
> 126.121 MB wsgi-DaaS-TMT-0
> 121.086 MB wsgi-DaaS-TMT-0
> 121.016 MB wsgi-DaaS-TMT-0
> 145.945 MB wsgi-DaaS-TMT-0
> 118.406 MB wsgi-DaaS-TMT-0
> 126.672 MB wsgi-DaaS-TMT-0
> 112.234 MB wsgi-DaaS-TMT-0
> 111.328 MB wsgi-DaaS-TMT-0
> 135.461 MB wsgi-DaaS-TMT-0
> 117.73 MB wsgi-DaaS-TMT-0
> 136.438 MB wsgi-DaaS-TMT-0
> 113.359 MB wsgi-DaaS-TMT-0
> 118.289 MB wsgi-DaaS-TMT-0
> 123.535 MB wsgi-DaaS-TMT-0
> 126.746 MB wsgi-DaaS-TMT-0
> 122.766 MB wsgi-DaaS-TMT-0
> 115.934 MB wsgi-DaaS-TMT-0
> MemFree:         4993068 kB
> MemAvailable:   25089688 kB
>  13:01:36 up 7 days,  9:27,  4 users,  load average: 0.55, 0.82, 2.46
>
> *Server almost unresponsive:*
> [deploy@daas7 DaaS-TMT-0]$ ps aux | grep 'apache' | grep 'TMT-0' | awk
> '{print $6/1024 " MB " $11}'; cat /proc/meminfo | grep -E
> '(MemFree|Avail)'; uptime
> 275.457 MB wsgi-DaaS-TMT-0
> 277.633 MB wsgi-DaaS-TMT-0
> 274.633 MB wsgi-DaaS-TMT-0
> 285.215 MB wsgi-DaaS-TMT-0
> 278.156 MB wsgi-DaaS-TMT-0
> 272.445 MB wsgi-DaaS-TMT-0
> 277.543 MB wsgi-DaaS-TMT-0
> 274.371 MB wsgi-DaaS-TMT-0
> 277.699 MB wsgi-DaaS-TMT-0
> 273.18 MB wsgi-DaaS-TMT-0
> 273.363 MB wsgi-DaaS-TMT-0
> 278.094 MB wsgi-DaaS-TMT-0
> 276.719 MB wsgi-DaaS-TMT-0
> 277.074 MB wsgi-DaaS-TMT-0
> 274.324 MB wsgi-DaaS-TMT-0
> 275.32 MB wsgi-DaaS-TMT-0
> 273.684 MB wsgi-DaaS-TMT-0
> 271.797 MB wsgi-DaaS-TMT-0
> 283.133 MB wsgi-DaaS-TMT-0
> 255.16 MB wsgi-DaaS-TMT-0
> 28.8008 MB /usr/bin/convert
> MemFree:          262352 kB
> MemAvailable:   18945328 kB
>  13:18:50 up 7 days,  9:44,  4 users,  load average: 253.79, 100.74, 40.20
>
> *A couple of minutes after httpd graceful:*
>
> [deploy@daas7 DaaS-TMT-0]$ ~/stats.sh
> 100.383 MB wsgi-DaaS-TMT-0
> 110.719 MB wsgi-DaaS-TMT-0
> 101.176 MB wsgi-DaaS-TMT-0
> 128.449 MB wsgi-DaaS-TMT-0
> 112.527 MB wsgi-DaaS-TMT-0
> 109.465 MB wsgi-DaaS-TMT-0
> 103.875 MB wsgi-DaaS-TMT-0
> 98.8438 MB wsgi-DaaS-TMT-0
> 108.414 MB wsgi-DaaS-TMT-0
> 108.133 MB wsgi-DaaS-TMT-0
> 107.07 MB wsgi-DaaS-TMT-0
> 118.824 MB wsgi-DaaS-TMT-0
> 101.527 MB wsgi-DaaS-TMT-0
> 127.004 MB wsgi-DaaS-TMT-0
> 100.871 MB wsgi-DaaS-TMT-0
> 125.188 MB wsgi-DaaS-TMT-0
> 100.566 MB wsgi-DaaS-TMT-0
> 108.91 MB wsgi-DaaS-TMT-0
> 101.215 MB wsgi-DaaS-TMT-0
> 109.711 MB wsgi-DaaS-TMT-0
> MemFree:         7607044 kB
> MemAvailable:   25815540 kB
>  13:25:51 up 7 days,  9:51,  4 users,  load average: 1.25, 38.56, 36.12
>
> My main question is: does anyone have suggestions for seeing inside the
> daemon processes, down to the Python object level, to work out what is
> going on?
>
> Thanks,
> Jason
>
