On Sat, 13 Dec 2025 at 16:39, Branko Čibej <[email protected]> wrote:
> On 13. 12. 25 03:35, Branko Čibej wrote:
> > On 12. 12. 25 20:49, Branko Čibej wrote:
> >> On 12. 12. 25 20:02, Daniel Sahlberg wrote:
> >>> On Thu, 11 Dec 2025 at 19:43, Branko Čibej <[email protected]> wrote:
> >>>> On 11. 12. 25 19:59, Daniel Sahlberg wrote:
> >>>>> * Issue SERF-195, which is a substantial rewrite of the request queues.
> >>>>>   The code is in a separate branch SERF-195. The code looks good to me
> >>>>>   but I haven't analyzed it in detail. It should be possible to merge
> >>>>>   to trunk.
> >>>> The patch works, in the sense that it causes no regressions, but I
> >>>> haven't been able to reproduce the reported issue. It also makes the
> >>>> code slightly more readable, but I couldn't find the mentioned race
> >>>> condition during review; it looks like just a refactoring of the
> >>>> original code.
> >>>>
> >>>> I think we need another pair of eyes to do a careful review of the
> >>>> branch changes.
> >>>>
> >>> I'll try to reproduce the initial issue and see if I can figure out
> >>> what goes wrong. Do you have any ideas where I should start? I assume
> >>> a multithreaded client (with multiple connections) would be needed?
> >>
> >> Multi-threading on the client has no effect here. Request queues are
> >> per-context and contexts require single-threaded access.
> >>
> >> The original issue submission just adds a sleep() in
> >> serf__setup_request_basic_auth(), so I guess you don't need a
> >> multi-threaded client, you just need a server that always requires
> >> authentication. That's why, e.g., linking Subversion with that and
> >> doing reads from the Apache repos won't work. You could try to set up
> >> a local multi-threaded HTTP server that requires basic auth and
> >> serves some data -- easy to do in Python -- then write a Serf-based
> >> client that sends requests in a loop. serf_get would be a good
> >> starting point.
> >>
> >> I think the important points here are:
> >>
> >> * all requests to the server must require authentication;
> >> * the server must be able to process requests in parallel and out of
> >>   order: this is the only way I can think of where processed requests
> >>   could be removed from the middle of the queue, not the beginning or
> >>   the end;
> >> * the server must be "slow" enough that the client request queue fills
> >>   up, so it needs both an upper limit to the number of parallel
> >>   requests and a fake latency for the responses.
> >>
> >> I'll give the server script a go.
> >
> > See r1930478.
> >
> > Try something like this:
> >
> >     $ authserver.py 2>/dev/null &
> >     $ serf_get -n 400 -c 10 -x 20 -U user -P pass -m HEAD http://127.0.0.1:8087/
> >
> > However, this still doesn't reproduce the reported issue. One possible
> > reason is that serf_get does *not* pipeline the requests in each
> > connection but only sends the next request when the previous response
> > has arrived. I haven't been able to track this down; I don't know if
> > it's a problem with serf_get or authserver.py.
>
> I was mistaken about the pipelining. After adding some instrumentation
> to serf_get (see below), it turns out that requests are pipelined, and
> yet I can't reproduce the issue.
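
(Side note for anyone who wants to follow along without checking out r1930478:
a rough stand-in for authserver.py could look like the sketch below - a
threaded HTTP server that always demands Basic auth, serves a little data and
sleeps to fake latency. It handles connections in parallel but does not reorder
responses within a connection, so it only approximates the points above. The
port, delay and credentials are my own assumptions, not taken from the actual
script.)

[[[
#!/usr/bin/env python3
# Minimal stand-in for authserver.py: always require Basic auth, add
# artificial latency, keep connections alive so requests can pipeline.
import base64
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PORT = 8087
DELAY = 0.5          # seconds of fake latency per authenticated response
CREDS = base64.b64encode(b"user:pass").decode()

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep-alive

    def _respond(self, body=b"hello\n"):
        auth = self.headers.get("Authorization", "")
        if auth != "Basic " + CREDS:
            # Challenge every request that lacks (correct) credentials.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        time.sleep(DELAY)           # make the server "slow"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._respond()

    def do_HEAD(self):
        self._respond(body=b"")     # headers only

    def log_message(self, fmt, *args):
        pass                        # keep stderr quiet

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", PORT), Handler).serve_forever()
]]]

Run it in the background and point serf_get at http://127.0.0.1:8087/ as in
the invocation above.
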
>
> This is the patch for the basic auth handler, the same as in the
> description of SERF-195:
>
> --- serf-trunk/auth/auth_basic.c        (revision 1930477)
> +++ serf-trunk/auth/auth_basic.c        (working copy)
> @@ -19,6 +19,7 @@
>   */
>
>  /*** Basic authentication ***/
> +#include <unistd.h>
>
>  #include <serf.h>
>  #include <serf_private.h>
> @@ -153,6 +154,7 @@ serf__setup_request_basic_auth(const serf__authn_schem
>      basic_info = authn_info->baton;
>
>      if (basic_info && basic_info->header && basic_info->value) {
> +        usleep(75000); /* 75 milliseconds */
>          serf_bucket_headers_setn(hdrs_bkt, basic_info->header,
>                                   basic_info->value);
>          return APR_SUCCESS;
>
> I also noticed something interesting: each connection sends up to
> max-pending requests at once, but then waits for all the replies before
> sending the next batch. I haven't checked if this is a bug in serf_get
> or in Serf proper; the connection has more requests ready, so it should
> send new ones as soon as any received response makes room in the pending
> requests queue. I saw this with the following invocation:
>
>     serf_get -d -n 500 -c 1 -x 50 -U user -P something -m GET http://127.0.0.1:8087/ >/dev/null
>
> The instrumentation patch for serf_get:
>
> --- serf-trunk/test/serf_get.c        (revision 1930478)
> +++ serf-trunk/test/serf_get.c        (working copy)
> @@ -590,6 +590,7 @@ int main(int argc, const char **argv)
>      apr_getopt_t *opt;
>      int opt_c;
>      const char *opt_arg;
> +    unsigned int prev_min = 0, prev_max = UINT_MAX, prev_all = 0;
>
>      apr_initialize();
>      atexit(apr_terminate);
> @@ -910,6 +911,24 @@ int main(int argc, const char **argv)
>              break;
>          }
>          /* Debugging purposes only! */
> +        if (debug) {
> +            unsigned int min = UINT_MAX, max = 0, all = 0;
> +            for (i = 0; i < conn_count; i++) {
> +                serf_connection_t *conn = connections[i];
> +                unsigned int pending =
> +                    (serf_connection_pending_requests(conn)
> +                     - serf_connection_queued_requests(conn));
> +                all += pending;
> +                if (min > pending) min = pending;
> +                if (max < pending) max = pending;
> +            }
> +            if (prev_min != min || prev_max != max || prev_all != all) {
> +                fprintf(stderr, ">pending: %u [avg=%0.1f min=%u max=%u]\n",
> +                        all, (double)all/conn_count, min, max);
> +                prev_min = min;
> +                prev_max = max;
> +                prev_all = all;
> +            }
> +        }
>          serf_debug__closed_conn(app_ctx.bkt_alloc);
>      }
>
> -- Brane

With a longer delay in authserver.py (10.0), enough concurrent connections
(15 seems to be enough) and enough outstanding requests per connection
(more than 2), I seem to get an error:

[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 5 -U foo -P bar http://127.0.0.1:8087
...
Error running context: (70014) End of file found
]]]

With 2 outstanding requests:

[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 2 -U foo -P bar http://127.0.0.1:8087
...
2025-12-13T19:06:06.161609+00 ERROR [l:127.0.0.1:35170 r:127.0.0.1:8087] receiving raw: Error 104 while reading.
Error running context: (104) Connection reset by peer
]]]

If I add the usleep(), I can't reproduce these errors anymore. That seems to
be exactly the opposite of what the original poster says: as far as I
understand, they claim the error is triggered by adding a usleep?

Cheers,
Daniel
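
P.S. In case anyone else wants to repeat the sweep over -x, a throwaway driver
along these lines should do (the serf_get path, port and credentials are the
ones from my invocations above, without -d to keep the output small; the
script itself is only a sketch):

[[[
#!/usr/bin/env python3
# Run serf_get with increasing numbers of outstanding requests (-x) and
# report which runs hit "Error running context".
import subprocess

SERF_GET = "./serf_get"
URL = "http://127.0.0.1:8087"

for x in range(1, 11):
    cmd = [SERF_GET, "-n", "150", "-c", "15", "-x", str(x),
           "-U", "foo", "-P", "bar", URL]
    result = subprocess.run(cmd, capture_output=True, text=True)
    failed = "Error running context" in result.stdout + result.stderr
    print(f"-x {x:2d}: {'FAILED' if failed else 'ok'}")
]]]
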
