Re: 2.4.4 graceful restart taking long time [PATCH]

Jeff Trawick Wed, 15 Jan 2014 11:33:03 -0800

On Fri, Apr 12, 2013 at 7:36 AM, Arkadiusz Miśkiewicz <ar...@maven.pl> wrote:

> On Friday 12 of April 2013, Eric Covener wrote:
> > > diff -ur httpd-2.4.4.org/server/mpm_unix.c httpd-2.4.4/server/mpm_unix.c
> > > --- httpd-2.4.4.org/server/mpm_unix.c   2012-07-03 21:38:58.000000000 +0200
> > > +++ httpd-2.4.4/server/mpm_unix.c       2013-04-12 09:14:58.282929959 +0200
> > > @@ -604,7 +604,17 @@
> > >          len = strlen(data);
> > >      }
> > >
> > > -    apr_socket_send(sock, data, &len);
> > > +    rv = apr_socket_send(sock, data, &len);
> > > +    if (rv == APR_SUCCESS) {
> > > +        char *buffer[10]; // dummy buffer to see if child is alive
> > > +
> > > +        len = sizeof(buffer);
> > > +        rv = apr_socket_recv(sock, buffer, &len);
> > > +        if (rv != APR_SUCCESS) {
> > > +            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf, APLOGNO(00057)
> > > +                         "receiving response to dummy data to listener on %pI", lp->bind_addr);
> > > +        }
> > > +    }
> > >
> > >      apr_socket_close(sock);
> > >      apr_pool_destroy(p);
>
> End of this is:
>
>     apr_socket_close(sock);
>     apr_pool_destroy(p);
>
>     return rv;
> >
> > Interesting analysis, I'm confused on how the patch helps though if it
> > doesn't actually do anything different when there is no reply (which
> > is more what I expected from the bullets)
>
> It returns error in rv

The implication of the recv was quite confusing to me initially when I
reviewed this again.  Rüdiger had reviewed an strace and determined that
connect() was taking longer and longer over time (presumably as the queue
in the kernel built up).  So with a timeout of 3 seconds you might have
many connects, each taking longer than the last but all under 3 seconds,
that add up to an extremely long delay.  Add the recv call and you cap the
total delay at essentially 3 seconds, since getting connect+recv to work
within the timeout requires a non-zombie, non-busy child on the
other end.
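
To make the arithmetic concrete, here is a rough sketch in plain C.  It is
not httpd code: the function names and the per-connect timings are
hypothetical, and it assumes (as the paragraph above does) that a failed
connect+recv attempt costs about one timeout and ends the loop.

```c
#include <stddef.h>

/* Hypothetical model, not httpd code.
 * Without the recv check every connect() succeeds, each one under the
 * timeout, so the individual delays simply add up. */
long delay_without_recv_ms(const long *connect_ms, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += connect_ms[i];
    }
    return total;
}

/* Hypothetical model, not httpd code.
 * With the recv check and no live child to answer, the first attempt
 * waits out roughly one timeout, fails, and no further attempts are made. */
long delay_with_recv_ms(size_t n, long timeout_ms)
{
    return n > 0 ? timeout_ms : 0;
}
```

With five connects taking 500, 1000, 1500, 2000 and 2500 ms, the first
function gives 7500 ms, while the second gives 3000 ms for a 3000 ms timeout.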

I don't see that Jim's proposal for the 2.4 branch matches Arkadiusz's
exact scenario/PR or suggested fix though I do understand that it could
quicken graceful restart for some scenarios.  I'll remove my note from
2.4.x/STATUS.

> which is propagated up to ap_mpm_pod_killpg
> which does
>     for (i = 0; i < num && rv == APR_SUCCESS; i++) {
>         rv = dummy_connection(pod);
>     }
>
> and on error will break the loop and stop doing dummy connections, as there
> are no children anyway. Graceful restart can then proceed without a delay.
>
> --
> Arkadiusz Miśkiewicz, arekm / maven.pl
>
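
The early exit described in the quoted text can be sketched in isolation as
follows.  This is a stand-alone illustration, not the mpm_unix.c source:
the SKETCH_* status codes, kill_all_sketch() and stub_dummy_connection()
are made-up names standing in for the APR status values, ap_mpm_pod_killpg
and dummy_connection.

```c
#include <stddef.h>

#define SKETCH_SUCCESS 0   /* stand-in for APR_SUCCESS */
#define SKETCH_TIMEUP  1   /* stand-in for a recv timeout error */

typedef int (*dummy_fn)(void *ctx);

/* Same loop shape as in the quoted text: stop at the first failure.
 * *attempts reports how many dummy connections were actually tried. */
int kill_all_sketch(int num, dummy_fn dummy_connection, void *ctx,
                    int *attempts)
{
    int rv = SKETCH_SUCCESS;
    int i;

    *attempts = 0;
    for (i = 0; i < num && rv == SKETCH_SUCCESS; i++) {
        rv = dummy_connection(ctx);
        (*attempts)++;
    }
    return rv;
}

/* Hypothetical stub: ctx points to the number of children still alive.
 * Each success "wakes" one child; with none left the call fails. */
int stub_dummy_connection(void *ctx)
{
    int *live = ctx;

    if (*live > 0) {
        (*live)--;
        return SKETCH_SUCCESS;
    }
    return SKETCH_TIMEUP;
}
```

With 2 live children and num set to 10, the loop makes 3 attempts (two
successes, one failure) rather than 10.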

-- 
Born in Roswell... married an alien...
http://emptyhammock.com/
