On Wed, 14 Nov 2001, Stipe Tolj wrote:

> I have reported such hanging keepalive children on the Cygwin 1.x
> platform.
>
> These come up after some days of load and usually go up to 50-60
> "blocked" keepalive children. Recently (after 16 days httpd uptime) the
> whole scoreboard "flushed" and the hanging keepalive processes
> disappeared (without restarting apache).

any chance you've got a NAT which times out connections after 15 days?

> I thought this was a Cygwin specific problem, but as the PRs report
> similar effect I think this is related to Apache itself.

thing is, i never see it on my systems :)

maybe there's a pattern to the request previous to blocking -- what you
can do is log the PID for each request in your access_log. then when you
discover hung children you can backtrack in the logs to see what the
previous request was... er actually i guess you can get this from the
scoreboard.

thing is, there is a race condition in the OPTIMIZE_TIMEOUT code, but
it's the opposite problem from what folks are describing -- the race
condition can mean a SIGALRM is delivered if the child receives data
right when the timeout happens. (there's no way around this without
resorting to cpu-specific knowledge to implement memory barriers... and
i don't particularly think it's important.)

if you can reproduce the problem easily you might also want to disable
OPTIMIZE_TIMEOUT and see what happens... maybe there's another race i
haven't spotted.

-dean
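
a rough sketch of the log backtracking idea above, in case it helps. it assumes the child PID is logged as the first field of each access_log line (the `%P` format code in LogFormat); the format name `pidlog`, the field order, and the function names are made up for illustration and not taken from any real setup.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed log format (hypothetical name and field order):
#   LogFormat "%P %h %l %u %t \"%r\" %>s %b" pidlog
# %P is the PID of the child that served the request.
LINE_RE = re.compile(r'^(?P<pid>\d+) .*?"(?P<request>[^"]*)"')


def last_request_per_pid(lines: Iterable[str]) -> Dict[int, str]:
    """Return the most recent logged request line for each child PID."""
    last: Dict[int, str] = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            # Later lines overwrite earlier ones, so the final value is
            # the last request that child completed before going quiet.
            last[int(match.group("pid"))] = match.group("request")
    return last


def previous_requests(
    lines: Iterable[str], hung_pids: Iterable[int]
) -> Dict[int, Optional[str]]:
    """Look up the last completed request for each hung child PID."""
    last = last_request_per_pid(lines)
    return {pid: last.get(pid) for pid in hung_pids}


if __name__ == "__main__":
    sample = [
        '1234 10.0.0.1 - - [14/Nov/2001:10:00:00 +0000] "GET /a HTTP/1.0" 200 512',
        '5678 10.0.0.2 - - [14/Nov/2001:10:00:01 +0000] "GET /x HTTP/1.1" 200 100',
        '1234 10.0.0.1 - - [14/Nov/2001:10:00:02 +0000] "GET /b HTTP/1.0" 200 256',
    ]
    # PIDs of hung children would come from the scoreboard (mod_status).
    for pid, request in previous_requests(sample, [1234, 999]).items():
        print(pid, request)
```

a request is only logged once it completes, so for a child stuck in keepalive the last line carrying its PID is the request it served just before blocking.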