On Wed, 14 Nov 2001, Stipe Tolj wrote:

> I have reported such hanging keepalive children on the Cygwin 1.x
> platform.
>
> These come up after some days of load and usually go up to 50-60
> "blocked" keepalive children. Recently (after 16 days httpd uptime) the
> whole scoreboard "flushed" and the hanging keepalive processes
> disappeared (without restarting apache).

any chance you've got a NAT which times out connections after 15 days?

> I thought this was a Cygwin specific problem, but as the PRs report
> a similar effect I think this is related to Apache itself.

thing is, i never see it on my systems :)

maybe there's a pattern to the request previous to blocking -- what you
can do is log the PID for each request in your access_log.  then when you
discover hung children you can backtrack in the logs to see what the
previous request was...  er actually i guess you can get this from the
scoreboard.
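
a rough sketch of the backtracking step, in case it helps.  it assumes the
child PID is the last field on every access_log line (for instance by
appending mod_log_config's %P to your LogFormat); that layout and the
function name are made up for illustration, so adjust the regex to
whatever your log really looks like.

```python
import re

# assumption: every access_log line ends with the child PID as its last
# whitespace-separated field (e.g. "%P" appended to the LogFormat)
PID_AT_END = re.compile(r"\s(\d+)\s*$")


def last_request_per_pid(lines, hung_pids):
    """Return {pid: last log line seen for that pid} for the given PIDs."""
    wanted = set(hung_pids)
    last = {}
    for line in lines:
        m = PID_AT_END.search(line)
        if m is None:
            continue  # no trailing numeric field on this line; skip it
        pid = int(m.group(1))
        if pid in wanted:
            # later lines overwrite earlier ones, so the final value is
            # the most recent request that child logged
            last[pid] = line.rstrip("\n")
    return last
```

feed it an open log file (or any iterable of lines) plus the PIDs of the
hung children; a PID that never shows up in the log is simply left out of
the result.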

thing is, there is a race condition in the OPTIMIZE_TIMEOUT code, but it's
the opposite problem from what folks are describing -- the race condition
can mean a SIGALRM is delivered if the child receives data right when the
timeout happens.  (there's no way around this without resorting to
cpu-specific knowledge to implement memory barriers... and i don't
particularly think it's important.)
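
to make the shape of that race concrete, here is a toy model.  it is not
apache's code: it just assumes a parent that polls a per-child progress
counter and signals the child once the counter has sat still for longer
than the timeout.  all the names (Child, Parent, vtime, last_rtime) are
hypothetical, and the model ignores memory ordering entirely; it only
shows one unlucky interleaving between "decide" and "signal".

```python
# toy model, NOT apache's implementation: names and structure are assumptions
class Child:
    def __init__(self, timeout_len):
        self.timeout_len = timeout_len  # seconds allowed without progress
        self.vtime = 0                  # bumped whenever the child makes progress
        self.got_sigalrm = False

    def receive_data(self):
        self.vtime += 1                 # progress: the timeout should restart


class Parent:
    def __init__(self, child, now):
        self.child = child
        self.last_vtime = child.vtime   # last progress counter the parent saw
        self.last_rtime = now           # wall-clock time of that observation

    def check(self, now):
        """Step 1: decide whether the child looks timed out."""
        if self.child.vtime != self.last_vtime:
            self.last_vtime = self.child.vtime
            self.last_rtime = now
            return False
        return self.last_rtime + self.child.timeout_len < now

    def signal(self):
        """Step 2: deliver the signal; nothing re-checks progress here."""
        self.child.got_sigalrm = True


def run(data_arrives_between_check_and_signal):
    child = Child(timeout_len=15)
    parent = Parent(child, now=0)
    expired = parent.check(now=16)      # no progress for 16s, limit is 15s
    if data_arrives_between_check_and_signal:
        child.receive_data()            # the unlucky interleaving
    if expired:
        parent.signal()
    return child.vtime, child.got_sigalrm
```

run(False) is the ordinary timeout: no progress, signal delivered.
run(True) is the racy case: the child did make progress (vtime is 1) but
still ends up with the signal.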

if you can reproduce the problem easily you might also want to disable
OPTIMIZE_TIMEOUT and see what happens... maybe there's another race i
haven't spotted.

-dean

