https://bz.apache.org/bugzilla/show_bug.cgi?id=62318
--- Comment #1 from WJCarpenter <bill-apa...@carpenter.org> --- I recently hit this problem, and I think I understand why it happens. I did a simple proof-of-concept change to validate my theory. Although I was also working with 2.4.33, I don't see any recent changes to suggest it's a new bug. BTW, I saw this on a Windows 10 Enterprise machine.

The health checks are done via callbacks from the watchdog timer module. The callback is scheduled to run every AP_WD_TM_SLICE interval. On my server, that's 100 milliseconds (I didn't investigate whether it's 100 ms for everyone). When triggered, the callback iterates over all of the configured health checks to see which need attention, and for any that are "due", the health check is performed.

In mod_proxy_hcheck.c, in function hc_watchdog_callback(), near line 928 (in 2.4.33), there is this "if" condition:

    if (!PROXY_WORKER_IS(worker, PROXY_WORKER_STOPPED) &&
        (worker->s->method != NONE) &&
        (now > worker->s->updated + worker->s->interval)) {

The last part is intended to mean "if the HC interval has passed since the last time we checked, then do a health check". If the worker is "due" according to that "if" condition, the actual health check is kicked off in function hc_check(), near line 811 (in 2.4.33). It makes the TCP or HTTP health check call, evaluates the results, and updates the "updated" timestamp.

====================

That's what the code does. Now, here is a description of the problem as I understand it:

1. It's kind of icky to be polling every 100 ms compared to what would be a normal health check interval (default is 60 s). But I don't know if there is a wiser way to do it in httpd.

2. The "if" condition and surrounding logic don't keep track of whether there is already a health check in progress for a given target. I suppose there could be a reason to allow multiple requests (hung connections?), but in any case it at least contributes to this problem.

3.
The "updated" timestamp is written only after the health check is completed, but it is written with a timestamp from when the health check began. In other words, if a health check call takes 1000 ms, the "updated" timestamp will eventually be set to a time 1000 ms in the past.

The points just mentioned conspire to keep the health check "due" on every 100 ms watchdog callback if the individual health checks take longer than 100 ms to complete. Suppose, for example, each health check for a given target takes 1000 ms. Then there will be 9 or 10 in-progress health checks for that target. As each finishes, it will set the "updated" timestamp to a time 1000 ms in the past. The situation will never resolve until the elapsed time for the health checks drops below 100 ms, and that's made less likely by the additional health checks contributing to the load.

====================

To test the theory, I locally modified the hc_check() function to set the "updated" timestamp to the time the health check finishes instead of the time it started. That produced the desired behavior. I have not examined that solution for unexpected consequences.