I placed the monitors inside the WSGI script, but I am not seeing any 
stack traces. However, when we begin to see timeouts, I can see that Apache 
starts respawning child processes (based on the "Starting stack trace 
monitor" messages in the logs). Looking at dmesg, I can see the kernel hit 
an out-of-memory condition and kill an Apache process, but I haven't timed 
those events to see if they align with the stack trace messages. What makes 
this all confusing is that during the flood of workers and script timeouts, 
I can still run individual queries and get responses back. 
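
One idea for lining those up: have the WSGI script also log each daemon 
process's PID and resident memory, so the dmesg OOM-kill timestamps can be 
matched against the monitor restart messages. A rough sketch of what I have 
in mind (Linux-specific; the helper names and the 30 second interval are 
placeholders):

    import os
    import sys
    import threading
    import time

    def _rss_kb():
        # Linux-specific: pull VmRSS (resident set size, in kB) out of
        # /proc/self/status for the current process.
        with open('/proc/self/status') as status:
            for line in status:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
        return -1

    def _log_memory():
        # Write PID and resident memory to the Apache error log (stderr)
        # every 30 seconds.
        while True:
            sys.stderr.write('pid=%d rss_kb=%d\n' % (os.getpid(), _rss_kb()))
            sys.stderr.flush()
            time.sleep(30)

    _memory_thread = threading.Thread(target=_log_memory)
    _memory_thread.daemon = True
    _memory_thread.start()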

On Wednesday, March 16, 2016 at 9:41:23 AM UTC-7, [email protected] 
wrote:
>
> Correct, DataDog. I can add that code into our WSGI scripts in dev and see 
> how it works. Will report back.
>
> On Tuesday, March 15, 2016 at 7:38:59 PM UTC-7, Graham Dumpleton wrote:
>>
>> Okay, it is DataDog. I thought it was, but the first charts I found on 
>> their web site didn't show the legend.
>>
>> On 16 Mar 2016, at 1:36 PM, Graham Dumpleton <[email protected]> 
>> wrote:
>>
>> What is the monitoring system you are using? The UI looks familiar, but 
>> I can't remember what system it is from.
>>
>> How hard would it be for you to add a bit of Python code to the WSGI 
>> script file for your application which starts a background thread that 
>> reports some extra metrics on a periodic basis?
>>
>> Also, the fact that it appears to be backlogged looks a bit like stuck 
>> requests in the Python web application causing a knock-on effect in the 
>> Apache child worker processes, as shown by your monitoring. The added 
>> metric I am thinking of would confirm that.
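>>
>> Something along these lines would do it (only a sketch; the middleware 
>> and the 15 second interval are made up for illustration, not an existing 
>> mod_wsgi API):
>>
>>     import sys
>>     import threading
>>     import time
>>
>>     _in_flight = 0
>>     _lock = threading.Lock()
>>
>>     class CountingMiddleware(object):
>>         # Wraps the WSGI application and counts requests currently
>>         # being handled by this daemon process.
>>         def __init__(self, application):
>>             self.application = application
>>
>>         def __call__(self, environ, start_response):
>>             global _in_flight
>>             with _lock:
>>                 _in_flight += 1
>>             try:
>>                 for chunk in self.application(environ, start_response):
>>                     yield chunk
>>             finally:
>>                 with _lock:
>>                     _in_flight -= 1
>>
>>     def _report():
>>         # Periodically write the in-flight request count to the
>>         # Apache error log via stderr.
>>         while True:
>>             sys.stderr.write('in-flight requests: %d\n' % _in_flight)
>>             sys.stderr.flush()
>>             time.sleep(15)
>>
>>     _reporter = threading.Thread(target=_report)
>>     _reporter.daemon = True
>>     _reporter.start()
>>
>>     # Then wrap your existing entry point:
>>     # application = CountingMiddleware(application)
>>
>> If the in-flight count climbs and stays pinned while busy workers 
>> increase, that points at requests getting stuck inside the application.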
>>
>> A more brute-force way of tracking down whether requests are getting 
>> stuck is to add the following to your WSGI script file:
>>
>>
>> http://modwsgi.readthedocs.org/en/develop/user-guides/debugging-techniques.html#extracting-python-stack-traces
>>
>> That way, when backlogging occurs and the busy worker count increases, 
>> you can force logging of what the Python threads in the web application 
>> are doing at that point. If threads are stuck, it will tell you where.
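>>
>> Condensed, the technique from that guide amounts to a daemon thread that 
>> periodically dumps the stack of every Python thread to the error log (a 
>> sketch only; see the guide for the full version, and the 60 second 
>> interval here is arbitrary):
>>
>>     import sys
>>     import threading
>>     import time
>>     import traceback
>>
>>     def _dump_stacks():
>>         while True:
>>             time.sleep(60)
>>             # sys._current_frames() maps thread id -> current frame
>>             # for every thread in this process.
>>             for thread_id, frame in sys._current_frames().items():
>>                 stack = ''.join(traceback.format_stack(frame))
>>                 sys.stderr.write('Thread %d:\n%s\n' % (thread_id, stack))
>>             sys.stderr.flush()
>>
>>     _dumper = threading.Thread(target=_dump_stacks)
>>     _dumper.daemon = True
>>     _dumper.start()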
>>
>> Graham
>>
>> On 16 Mar 2016, at 1:21 PM, [email protected] wrote:
>>
>> Clarifying the first line: in our testing, our client is issuing 3 
>> requests per second. There could be more, but it should not exceed 6.
>>
>> The request handlers are waiting on a web request that is dispatched to 
>> another server, which then queries the database. The CPU load is so low 
>> it barely crosses 3% even at peak; we are typically below 1%.
>>
>> The request payloads are small, merely simple queries, though they can 
>> vary in size from roughly 3 KB to 100 KB.
>>
>> Attached is a screenshot of our logging that captures busy/idle/queries 
>> on a timeline. Where the yellow line drops to zero and the workers start 
>> to increase is where we begin to see timeouts. The eventual dip after the 
>> peak is me bouncing the Apache daemon to get it back under some control.
>>
>> On Tuesday, March 15, 2016 at 6:35:13 PM UTC-7, Graham Dumpleton wrote:
>>>
>>>
>>> On 16 Mar 2016, at 12:10 PM, [email protected] wrote:
>>>
>>> I am hoping to gain some clarity here on our WSGI configuration since a 
>>> lot of the tuning seems to be heavily reliant on the application itself. 
>>>
>>> Our setup
>>>
>>>    - Single load balancer (round robin)
>>>    - Two virtual servers with 16GB of RAM
>>>    - Python app ~100MB in memory per process
>>>    - Response times are longer as we broker calls, so they can be up to 
>>>    1-2 seconds
>>>    - Running mod_wsgi 4.4.2 on Ubuntu 14.04 LTS with Apache 2
>>>    - mod_wsgi daemon mode (30 processes with 25 threads each)
>>>    - KeepAlive is off
>>>    - WSGIRestrictEmbedded is On
>>>    - Using the event MPM
>>>
>>> For Apache, we have the following:
>>>
>>>    - StartServers 30
>>>    - MinSpareThreads 40
>>>    - MaxSpareThreads 150
>>>    - ThreadsPerChild 25
>>>    - MaxRequestWorkers 600
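>>>
>>> In Apache configuration terms, that amounts to roughly the following 
>>> (the daemon process group name and WSGI script path are placeholders 
>>> for our real ones):
>>>
>>>     WSGIRestrictEmbedded On
>>>     KeepAlive Off
>>>
>>>     WSGIDaemonProcess app processes=30 threads=25
>>>     WSGIProcessGroup app
>>>     WSGIScriptAlias / /path/to/app.wsgi
>>>
>>>     <IfModule mpm_event_module>
>>>         StartServers            30
>>>         MinSpareThreads         40
>>>         MaxSpareThreads        150
>>>         ThreadsPerChild         25
>>>         MaxRequestWorkers      600
>>>     </IfModule>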
>>>
>>> I have tried a number of different scenarios, but all of them generally 
>>> lead to the same problem. We are processing about 3 requests a second 
>>> with a steady number of worker threads and plenty of idle capacity. 
>>> After a few minutes of sustained traffic, we eventually start timing 
>>> out, which then drives the worker count up until it reaches 
>>> MaxRequestWorkers. Despite this, I am still able to issue requests and 
>>> get responses, but it ultimately leads to Apache becoming unresponsive. 
>>>
>>>
>>> Just to confirm. You say that you never go above 3 requests per second, 
>>> but that at worst case those requests can take 2 seconds. Correct?
>>>
>>> Are the request handlers predominantly waiting on backend database 
>>> calls, or are they doing more CPU intensive work? What is the CPU load on 
>>> the mod_wsgi daemon processes?
>>>
>>> Also, what is the size of payloads, for requests and responses?
>>>
>>> Graham
>>>
>>>
>> <Screen Shot 2016-03-15 at 7.19.45 PM.png>
