https://bz.apache.org/bugzilla/show_bug.cgi?id=63169
--- Comment #7 from Joel Self <joels...@gmail.com> ---

What's happening here is that the mpm is spawning more processes than `active_daemons_limit` (`max_workers` divided by workers per process), but during a graceful restart it only sends `active_daemons_limit` kill characters (per bucket) over the pipe of death. You can see this at line 3114 of server/mpm/event/event.c (httpd 2.4.48):

```
for (i = 0; i < num_buckets; i++) {
    ap_mpm_podx_killpg(all_buckets[i].pod, active_daemons_limit,
                       AP_MPM_PODX_GRACEFUL);
}
```

The mpm spawns more child processes than `active_daemons_limit` because the number of children created per maintenance pass grows exponentially when the server is suddenly hit with a lot of traffic. In event.c, `perform_idle_server_maintenance` examines each active process, counts its active and idle threads, and records which slots are free for a new child process (slots with no process in them). Then `perform_idle_server_maintenance` checks whether the idle thread count is less than the minimum spare threads (line 2824):

```
else if (idle_thread_count < min_spare_threads / num_buckets) {
```

If the idle thread count is too low, it needs to create new child processes to increase the idle thread count. At first it creates only a single child, but before `perform_idle_server_maintenance` returns it doubles the spawn rate, so the next time around `perform_idle_server_maintenance` will spawn two children (line 2881):

```
else if (retained->idle_spawn_rate[child_bucket]
         < MAX_SPAWN_RATE / num_buckets) {
    retained->idle_spawn_rate[child_bucket] *= 2;
}
```

If the next call to `perform_idle_server_maintenance` doesn't need to spawn new children, the rate is reset to 1. However, if the next call does need to spawn children, it will spawn 2 and then increase the spawn rate to 4. This exponential growth continues until `active_thread_count` is greater than or equal to `max_workers`.
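The doubling described above can be illustrated with a small standalone sketch. This is not the httpd code: `next_spawn_rate` and `children_after_busy_passes` are hypothetical helpers, and the cap of 32 is an assumed value for `MAX_SPAWN_RATE`, which the comment names but does not quote.

```c
#include <assert.h>

/* Assumed cap on the per-pass spawn rate; the value 32 is an assumption. */
#define SKETCH_MAX_SPAWN_RATE 32

/*
 * Hypothetical helper: given the current spawn rate and whether this
 * maintenance pass had to spawn children, return the rate for the next pass.
 * The rate doubles while below the per-bucket cap and resets to 1 after a
 * pass that did not need to spawn.
 */
static int next_spawn_rate(int rate, int needed_spawn, int num_buckets)
{
    assert(num_buckets > 0);
    if (!needed_spawn) {
        return 1;
    }
    if (rate < SKETCH_MAX_SPAWN_RATE / num_buckets) {
        return rate * 2;
    }
    return rate;
}

/*
 * Hypothetical helper: total children spawned over `passes` consecutive
 * maintenance passes that all needed to spawn, starting from a rate of 1.
 */
static int children_after_busy_passes(int passes, int num_buckets)
{
    int rate = 1;
    int total = 0;
    for (int p = 0; p < passes; p++) {
        total += rate;
        rate = next_spawn_rate(rate, 1, num_buckets);
    }
    return total;
}
```

With one bucket, five consecutive busy passes spawn 1 + 2 + 4 + 8 + 16 = 31 children in this sketch, which shows how quickly the last batch can become large relative to the remaining headroom.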
However, because of the exponential growth, the last batch of child processes spawned may contain far more processes than are needed to reach `max_workers`, and thus overshoot `active_daemons_limit`. Here's a debug log I created while investigating this issue:

```
idle_thread_count [0] < min_spare_threads [200], active_thread_count: 675, max_workers: 800
Spawning 16 children, active_daemons: 27, total_daemons: 27, idle_thread_count: 0, min_spare_threads: 200, active_thread_count: 675, max_workers: 800
```

We're at 0 idle threads. The last time we spawned children we made 8, so this time we'll make 16, but our active thread count is already 675. At 25 workers per process, spawning 16 processes will get us to `675 + 16 * 25 = 1075` workers, which is well beyond our `max_workers` of 800. Another way to look at it is that our `active_daemons_limit` is 32, and adding 16 new processes to the current 27 gets us to 43, which is 11 more than our max of 32.

Once we have 43 processes, a graceful restart will only send 32 kill characters over the pipe of death (we only have 1 bucket), so 11 processes will never receive the kill character and will live on, processing requests with the old configuration.

We fixed this problem by limiting the number of newly spawned children so that the new children plus `active_daemons` do not exceed `active_daemons_limit`. Added at line 2856 of event.c:

```
if (free_length + active_daemons > active_daemons_limit) {
    free_length = active_daemons_limit - active_daemons;
}
```

I've attached to this bug the patch that we used to fix this.

Thanks,
Joel Self
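For reference, the arithmetic from the debug log and the effect of the clamp can be checked with a minimal standalone sketch. The helper names below are hypothetical and exist only for illustration; the single-bucket assumption matches the setup described above, and the final guard against a negative result is an extra that is not part of the patch.

```c
#include <assert.h>

/* Hypothetical helper: thread count after spawning `children` new processes. */
static int threads_after_spawn(int active_threads, int children,
                               int threads_per_child)
{
    return active_threads + children * threads_per_child;
}

/*
 * Hypothetical helper: with a single bucket, a graceful restart sends
 * `active_daemons_limit` kill characters, so any processes beyond that
 * count never receive one.
 */
static int unsignalled_children(int total_daemons, int active_daemons_limit)
{
    int extra = total_daemons - active_daemons_limit;
    return extra > 0 ? extra : 0;
}

/*
 * Hypothetical helper mirroring the clamp from the patch: cap the number of
 * children to spawn so that the total stays within active_daemons_limit.
 */
static int clamp_spawn_count(int free_length, int active_daemons,
                             int active_daemons_limit)
{
    assert(active_daemons_limit >= 0);
    if (free_length + active_daemons > active_daemons_limit) {
        free_length = active_daemons_limit - active_daemons;
    }
    /* Extra guard for this sketch only (not in the patch). */
    return free_length < 0 ? 0 : free_length;
}
```

Plugging in the logged values, `threads_after_spawn(675, 16, 25)` gives 1075, `unsignalled_children(27 + 16, 32)` gives 11, and `clamp_spawn_count(16, 27, 32)` gives 5, so with the clamp the batch of 16 would be cut to 5 and the process count would stop at 32.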