On Fri, Apr 12, 2013 at 7:36 AM, Arkadiusz Miśkiewicz <ar...@maven.pl>wrote:
> On Friday 12 of April 2013, Eric Covener wrote: > > > diff -ur httpd-2.4.4.org/server/mpm_unix.chttpd-2.4.4/server/mpm_unix.c > > > --- httpd-2.4.4.org/server/mpm_unix.c 2012-07-03 21:38:58.000000000 > > > +0200 +++ httpd-2.4.4/server/mpm_unix.c 2013-04-12 > > > 09:14:58.282929959 +0200 @@ -604,7 +604,17 @@ > > > > > > len = strlen(data); > > > > > > } > > > > > > - apr_socket_send(sock, data, &len); > > > + rv = apr_socket_send(sock, data, &len); > > > + if (rv == APR_SUCCESS) { > > > + char *buffer[10]; // dummy buffer to see if child is alive > > > + > > > + len = sizeof(buffer); > > > + rv = apr_socket_recv(sock, buffer, &len); > > > + if (rv != APR_SUCCESS) { > > > + ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf, > > > APLOGNO(00057) + "receiving response to dummy > > > data to listener on %pI", lp->bind_addr); + } > > > + } > > > > > > apr_socket_close(sock); > > > apr_pool_destroy(p); > > End of this is: > > apr_socket_close(sock); > apr_pool_destroy(p); > > return rv; > > > > Interesting analysis, I'm confused on how the patch helps though if it > > doesn't actually do anything different when there is no reply (which > > is more what I expected from the bullets) > > It returns error in rv The implication of the recv was quite confusing to me initially when I reviewed this again. Rüdiger had reviewed an strace and determined that connect() was taking longer and longer over time (presumably as the queue in the kernel built up). So with a timeout of 3 seconds you might have connects that took increasing amounts of time all under 3 seconds that could add up to extremely long delays. Add the recv call and you cap the total delay at essentially 3 seconds since getting connect+recv to work within the timeout forces there to be a non-zombie, non-busy child on the other end. I don't see that Jim's proposal for the 2.4 branch matches Arkadiusz's exact scenario/PR or suggested fix though I do understand that it could quicken graceful restart for some scenarios. I'll remove my note from 2.4.x/STATUS. > which is propagated up to ap_mpm_pod_killpg > which does > for (i = 0; i < num && rv == APR_SUCCESS; i++) { > rv = dummy_connection(pod); > } > > and on error will break the loop and stop doing dummy connections as there > are > no children anyway. graceful restart then can proceed without a delay. > > -- > Arkadiusz Miśkiewicz, arekm / maven.pl > -- Born in Roswell... married an alien... http://emptyhammock.com/