Been trying to catch up on other stuff the last few days, which is why this
response is delayed.
Over the years I have seen a number of people doing exactly what you are
doing: performing image manipulation on an uploaded image and then returning
the result. For one reason or another, the outcome has almost always been that
you are better off using a backend queueing system such as Celery to handle
the image manipulation. In other words, move the processing of images out of
your web application processes.
There are a few reasons why this is the case.
The first is that images and image manipulation can use a lot of transient
memory. Especially when using multithreading in your web application with
Python, this can result in high peak memory usage for the process: a whole
bunch of requests can come in at the same time, so processing of them
overlaps, and memory consumption blows out to the maximum required to support
that many being processed at once. When done, although the memory is released
back for use by other parts of the application, the damage has already been
done and the process keeps the overall high memory reservation. The end result
is that most of the time you will have lots of unused memory held by the
process, with it only being used when you get that many concurrent requests
again.
The second problem is that image manipulation can be CPU intensive. In a
multithreaded application, depending on how well the image manipulation
library works and how it handles the global interpreter lock, in the worst
case parts of the image processing will be forced to serialise, resulting in
requests being blocked and taking longer than they would if the processes were
single threaded. In other words, image manipulation done in different threads
interferes, and all the requests suffer.
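To make that concrete, here is a toy Python sketch (not from your
application; a pure-Python loop stands in for the image work, and a C
extension that releases the GIL would fare better) showing how CPU-bound work
in threads serialises:

# Toy demonstration that CPU-bound work does not run in parallel
# across Python threads because of the global interpreter lock.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_image_work(n=5_000_000):
    # Pure-Python loop: holds the GIL for the whole computation.
    total = 0
    for i in range(n):
        total += i
    return total

for workers in (1, 4):
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: fake_image_work(), range(workers)))
    print(workers, "thread(s):", round(time.time() - start, 2), "seconds")

# The 4-thread run takes roughly 4x the 1-thread run, rather than about
# the same elapsed time, because the threads serialise on the GIL.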
The third is that if you are using embedded mode of mod_wsgi, you can see
problems with the per request memory pool usage of the Apache worker processes
(in which the Python code is running) blowing out due to large response sizes.
In the old days of Apache, up to 8MB could be held in the per request thread
memory pool and only memory above that limit would actually be released. Thus
if you have a lot of threads per worker process, that means 8MB of memory
stays reserved for each worker thread. In more recent Apache versions the
sample configuration that comes with Apache drops this to 2MB, but if the
distro has removed that setting from the original Apache sample configuration,
or you remove it, then I believe it defaults back to 8MB.
Using a backend Celery task system avoids the first two issues, as the work is
done in a separate process, and that process can even be recycled after every
task, so you avoid the problem of unused memory hanging around reserved. The
Celery worker processes are also single threaded, eliminating Python global
interpreter lock issues.
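As a rough sketch of that split (the module, task name, broker URL and crop
parameters here are all illustrative, not taken from your application, and
Pillow is used as the stand-in image library):

# tasks.py -- minimal Celery sketch; names and URLs are placeholders.
from celery import Celery
from PIL import Image

app = Celery("images", broker="redis://localhost:6379/0")

@app.task
def crop_image(src_path, dst_path, box):
    # Runs in a Celery worker process, not in the Apache/mod_wsgi
    # process, so the transient memory and CPU cost of the crop
    # never touches the web tier.
    with Image.open(src_path) as img:
        img.crop(box).save(dst_path)
    return dst_path

The web application would then call crop_image.delay(src, dst, (left, upper,
right, lower)) and hand the result back once it is ready. Running the worker
with --max-tasks-per-child=1 gives you the recycle-after-every-task behaviour
mentioned above.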
The third problem above can be lessened by ensuring the Apache configuration
directive for setting the per request memory pool size is actually set, and
lowering the value if necessary. How you configure the Apache MPM settings can
also affect this.
In general though, the recommended first option is to avoid using mod_wsgi
embedded mode at all and use daemon mode instead. This avoids various problems
caused by the choice of Apache MPM and its settings.
So if you can't change to Celery in the short term, at least switch to daemon
mode.
In doing this, ensure that embedded mode is disabled completely by setting:
WSGIRestrictEmbedded On
Also reduce the per request thread pool size. Where the Apache worker
processes are only acting as a proxy to the mod_wsgi daemon processes, the
value I set in the mod_wsgi-express configuration is:
ThreadStackSize 262144
That is, 0.25MB per thread instead of 2MB or 8MB.
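Putting that together, a minimal daemon mode configuration might look like the
following (the process name, process/thread counts and script path are
placeholders you would tune for your site):

WSGIRestrictEmbedded On
ThreadStackSize 262144

WSGIDaemonProcess myapp processes=2 threads=5
WSGIProcessGroup myapp
WSGIApplicationGroup %{GLOBAL}
WSGIScriptAlias / /path/to/project/wsgi.py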
Another dangerous setting you were using, and one that would have caused lots
of problems when using embedded mode, was:
MaxKeepAliveRequests 100
This would cause Apache to restart your application processes too frequently,
resulting in higher CPU usage due to the high startup cost. In mod_wsgi-express
I don't set this at all.
The next problem is:
KeepAliveTimeout 45
In mod_wsgi-express I set this to 2 seconds. By having such a high value you
risk problems, especially when using the worker MPM, although the event MPM
can have its own issues. With a lower value, you may not need as many Apache
worker processes and threads.
The question now is why you were restarting after 100 requests. Was this an
attempt to keep memory usage down?
One consequence of this is that you would possibly see a lot of interrupted
requests, which is what those warning messages about killing off processes are
about. Apache will only wait so long for processes to shut down. Depending on
how the shutdown is managed, this can be as little as 5 seconds, but since you
have long running requests, those can prevent shutdown, so Apache kills the
processes anyway, and that is why requests can be interrupted. You really want
to avoid periodic restarts of Apache child worker processes using that option.
If you do have a growing memory problem because of issues with your
application code, there are various ways you can trigger restarts of the
mod_wsgi daemon processes, and these self-initiated restarts allow for a
graceful restart timeout. For the WSGIDaemonProcess directive you can set the
options:
maximum-requests=100 graceful-timeout=120
So when 100 requests have arrived, a restart of the process will be signalled,
but since the graceful timeout is set to 120 seconds, it will only be forcibly
restarted after 120 seconds. In the interim, if the number of active requests
being handled by the process drops to 0, the restart will be triggered at that
point. This limits the interruption of active requests. You will still have
issues if requests get blocked indefinitely, as the process then never reaches
the point of having no active requests; but if that is occurring, whatever the
reason you are restarting so frequently, you have bigger problems.
For the latter, if you are getting stuck requests, you want to look at the
request-timeout option to WSGIDaemonProcess.
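For example, combining those options on the one directive (the process name
and counts are placeholders again, and 60 seconds is just a sample timeout):

WSGIDaemonProcess myapp processes=2 threads=5 \
    maximum-requests=100 graceful-timeout=120 request-timeout=60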
Anyway, for further guidance on setting up mod_wsgi daemon mode, I would
suggest watching:
https://www.youtube.com/watch?v=H6Q3l11fjU0
The defaults for mod_wsgi daemon mode are not the best options for historical
reasons. The video talks about that and how mod_wsgi-express sets different
defaults.
To start with, that is probably all I can suggest. Giving recommendations on
tuning the Apache MPM settings and mod_wsgi daemon mode is harder to do at
this point.
Summarising things: use Celery as an out of process means to handle the image
manipulation. If you can't do that for now, try to switch to mod_wsgi daemon
mode, as that will allow memory and CPU usage to be better controlled.
Graham
> On 7 Dec 2020, at 6:39 pm, Zohaib Ahmed Hassan <[email protected]>
> wrote:
>
> I also get this issue sometimes:
> [Mon Dec 07 07:04:22.142767 2020] [core:warn] [pid 1836:tid 139752646228928]
> AH00045: child process 2807 still did not exit, sending a SIGTERM
> [Mon Dec 07 07:04:24.144831 2020] [core:warn] [pid 1836:tid 139752646228928]
> AH00045: child process 1847 still did not exit, sending a SIGTERM
> [Mon Dec 07 07:04:24.144875 2020] [core:warn] [pid 1836:tid 139752646228928]
> AH00045: child process 2807 still did not exit, sending a SIGTERM
> [Mon Dec 07 07:04:26.146928 2020] [core:warn] [pid 1836:tid 139752646228928]
> AH00045: child process 1847 still did not exit, sending a SIGTERM
> [Mon Dec 07 07:04:26.146967 2020] [core:warn] [pid 1836:tid 139752646228928]
> AH00045: child process 2807 still did not exit, sending a SIGTERM
> [Mon Dec 07 07:04:28.149026 2020] [core:error] [pid 1836:tid 139752646228928]
> AH00046: child process 1847 still did not exit, sending a SIGKILL
> [Mon Dec 07 07:04:28.149092 2020] [core:error] [pid 1836:tid 139752646228928]
> AH00046: child process 2807 still did not exit, sending a SIGKILL
>
> On Sunday, December 6, 2020 at 7:51:28 AM UTC+5 Zohaib Ahmed Hassan wrote:
> I don't know exactly what the throughput per second is. It's random; in the
> peak hours it is 5 req/sec, but the chart below can help you understand the
> request throughput as well.
> One more thing: I have tested 300 concurrent requests against it using a
> script and it works well, but after the requests completed the memory usage
> stays at its peak as before. If it was at 20 percent, and with the concurrent
> requests it went to 45 percent, it stays at 45.
>
> Zohaib Ahmed Hassan | Senior DevOps Engineer
>
> Direct: +923045060007
> [email protected]
> www.xiqinc.com
>
>
>
> On Sat, Dec 5, 2020 at 4:58 PM Graham Dumpleton <[email protected]> wrote:
> What about request throughput? That is, the requests/sec it currently
> handles, and how many concurrent requests at a time.
>
> Graham
>
>> On 5 Dec 2020, at 8:35 pm, Zohaib Ahmed Hassan <[email protected]> wrote:
>>
>> Thanks for the response. Here are the details:
>> 1. The mod_wsgi version is 4.5.7.
>> 2. It is used in embedded mode.
>> 3. Basically this app gets images in a request, crops those images, and
>> returns them; the average time taken is around 3 to 5 seconds.
>>
>> On Sat, Dec 5, 2020 at 10:22 AM Graham Dumpleton <[email protected]> wrote:
>> Also, in addition to what I already asked, what version of mod_wsgi is being
>> used?
>>
>> Graham
>>
>>> On 5 Dec 2020, at 4:18 pm, Graham Dumpleton <[email protected]> wrote:
>>>
>>> What is the mod_wsgi part of the Apache configuration?
>>>
>>> Need to know if you are using embedded mode or daemon mode and how it is
>>> set up.
>>>
>>> Also, what is the request throughput to the Django application and what is
>>> average and worst case response times?
>>>
>>> Graham
>>>
>>>> On 5 Dec 2020, at 3:19 pm, Zohaib Ahmed Hassan <[email protected]> wrote:
>>>>
>>>> We have an EC2 instance (4 vCPU and 16GB of RAM) which is running an
>>>> Apache server with the event MPM behind an AWS ELB (application load
>>>> balancer). This server serves just images requested by our other
>>>> applications; although for most of the applications we are using
>>>> CloudFront for caching, one app is sending requests directly to the
>>>> server. Now Apache memory usage reaches 70% every day and does not come
>>>> down, so we have to restart the server every time. Earlier, with the old
>>>> Apache 2.2 version and the worker MPM, without the load balancer, we did
>>>> not have this issue. I have tried different configurations for the event
>>>> MPM and Apache but it is not working. Here is apache2.conf:
>>>>
>>>>
>>>> Timeout 120 # also tried the timeout at 300
>>>> KeepAlive On
>>>> MaxKeepAliveRequests 100
>>>> KeepAliveTimeout 45 # varied this setting from 1 second to 300
>>>>
>>>>
>>>> Here are the load balancer settings:
>>>>
>>>> - HTTP and HTTPS listeners
>>>>
>>>> - Idle timeout is 30
>>>>
>>>> The event MPM configuration:
>>>>
>>>> <IfModule mpm_event_module>
>>>> StartServers 2
>>>> MinSpareThreads 50
>>>> MaxSpareThreads 75
>>>> ThreadLimit 64
>>>> #ServerLimit 400
>>>> ThreadsPerChild 25
>>>> MaxRequestWorkers 400
>>>> MaxConnectionsPerChild 10000
>>>> </IfModule>
>>>>
>>>> 1. When I change MaxRequestWorkers to 150 with MaxConnectionsPerChild 0,
>>>> once RAM usage reaches 47 percent the system health checks fail and a new
>>>> instance is launched by the auto scaling group. It seems like the worker
>>>> limit is reached, which already happened when this instance was running
>>>> with 8GB of RAM.
>>>> 2. Our other servers, which are just running a simple Django site and
>>>> Django REST Framework APIs, are working fine with the default values for
>>>> the MPM and Apache configured on installation.
>>>> 3. I have also tried the configuration with KeepAliveTimeout equal to 2,
>>>> 3 and 5 seconds as well, but it did not work.
>>>> 4. I have also followed this link [1]; it worked somewhat better, but
>>>> memory usage is not coming down.
>>>>
>>>> Here is the recent error log:
>>>>
>>>> [Fri Dec 04 07:45:21.963290 2020] [mpm_event:error] [pid 5232:tid
>>>> 139782245895104] AH03490: scoreboard is full, not at
>>>> MaxRequestWorkers.Increase ServerLimit.
>>>> [... the same AH03490 message repeated once a second through
>>>> 07:45:35.977818 ...]
>>>>
>>>> Here is the top command result:
>>>>
>>>> 3296 www-data 20 0 3300484 469824 58268 S 0.0 2.9 0:46.46 apache2
>>>> 2544 www-data 20 0 3359744 453868 58292 S 0.0 2.8 1:24.53 apache2
>>>> 1708 www-data 20 0 3357172 453524 58208 S 0.0 2.8 1:02.85 apache2
>>>> 569 www-data 20 0 3290880 444320 57644 S 0.0 2.8 0:37.53 apache2
>>>> 3655 www-data 20 0 3346908 440596 58116 S 0.0 2.7 1:03.54 apache2
>>>> 2369 www-data 20 0 3290136 428708 58236 S 0.0 2.7 0:35.74 apache2
>>>> 3589 www-data 20 0 3291032 382260 58296 S 0.0 2.4 0:50.07 apache2
>>>> 4298 www-data 20 0 3151764 372304 59160 S 0.0 2.3 0:18.95 apache2
>>>> 4523 www-data 20 0 3140640 310656 58032 S 0.0 1.9 0:07.58 apache2
>>>> 4623 www-data 20 0 3139988 242640 57332 S 3.0 1.5 0:03.51 apache2
>>>>
>>>> What is wrong in the configuration that is causing the high memory usage?
>>>>
>>>>
>>>> [1]: https://aws.amazon.com/premiumsupport/knowledge-center/apache-backend-elb/
>>>>