On Sat, 13 Dec 2025 at 16:39, Branko Čibej <[email protected]> wrote:
> On 13. 12. 25 03:35, Branko Čibej wrote:
> > On 12. 12. 25 20:49, Branko Čibej wrote:
> >> On 12. 12. 25 20:02, Daniel Sahlberg wrote:
> >>> On Thu, 11 Dec 2025 at 19:43, Branko Čibej <[email protected]> wrote:
> >>>> On 11. 12. 25 19:59, Daniel Sahlberg wrote:
> >>>>> * Issue SERF-195, which is a substantial rewrite of the request queues.
> >>>>>   The code is in a separate branch SERF-195. The code looks good to me
> >>>>>   but I haven't analyzed it in detail. It should be possible to merge
> >>>>>   to trunk.
> >>>> The patch works, in the sense that it causes no regressions, but I
> >>>> haven't been able to reproduce the reported issue. It also makes the
> >>>> code slightly more readable, but I couldn't find the mentioned race
> >>>> condition during review; it looks like just a refactoring of the
> >>>> original code.
> >>>>
> >>>> I think we need another pair of eyes to do a careful review of the
> >>>> branch changes.
> >>>>
> >>> I'll try to reproduce the initial issue and see if I can figure out
> >>> what goes wrong. Do you have any ideas where I should start? I assume
> >>> a multithreaded client (with multiple connections) would be needed?
> >>
> >> Multi-threading on the client has no effect here. Request queues are
> >> per-context and contexts require single-threaded access.
> >>
> >> The original issue submission just adds a sleep() in
> >> serf__setup_request_basic_auth(), so I guess you don't need a
> >> multi-threaded client, you just need a server that always requires
> >> authentication. That's why, e.g., linking Subversion with that and
> >> doing reads from the Apache repos won't work. You could try to set up
> >> a local multi-threaded HTTP server that requires basic auth and
> >> serves some data -- easy to do in Python -- then write a Serf-based
> >> client that sends requests in a loop. serf_get would be a good
> >> starting point.
> >>
> >> I think the important points here are:
> >>
> >> * all requests to the server must require authentication;
> >> * the server must be able to process requests in parallel and out of
> >>   order: this is the only way I can think of where processed requests
> >>   could be removed from the middle of the queue, not the beginning or
> >>   the end;
> >> * the server must be "slow" enough that the client request queue fills
> >>   up, so it needs both an upper limit to the number of parallel
> >>   requests and a fake latency for the responses.
> >>
> >> I'll give the server script a go.
> >
> > See r1930478.
> >
> > Try something like this:
> >
> >     $ authserver.py 2>/dev/null &
> >     $ serf_get -n 400 -c 10 -x 20 -U user -P pass -m HEAD http://127.0.0.1:8087/
> >
> > However, this still doesn't reproduce the reported issue. One possible
> > reason is that serf_get does *not* pipeline the requests in each
> > connection but only sends the next request when the previous response
> > has arrived. I haven't been able to track this down; I don't know if
> > it's a problem with serf_get or authserver.py.
>
> I was mistaken about the pipelining. After adding some instrumentation
> to serf_get (see below), it turns out that requests are pipelined, and
> yet I can't reproduce the issue.
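
(Side note for anyone who wants to follow along without checking out r1930478:
a rough stand-in for authserver.py could look like the sketch below - a
threaded HTTP server that always demands Basic auth, serves a little data and
sleeps to fake latency. It handles connections in parallel but does not reorder
responses within a connection, so it only approximates the points above. The
port, delay and credentials are my own assumptions, not taken from the actual
script.)

[[[
#!/usr/bin/env python3
# Minimal stand-in for authserver.py: always require Basic auth, add
# artificial latency, keep connections alive so requests can pipeline.
import base64
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PORT = 8087
DELAY = 0.5          # seconds of fake latency per authenticated response
CREDS = base64.b64encode(b"user:pass").decode()

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep-alive

    def _respond(self, body=b"hello\n"):
        auth = self.headers.get("Authorization", "")
        if auth != "Basic " + CREDS:
            # Challenge every request that lacks (correct) credentials.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        time.sleep(DELAY)           # make the server "slow"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._respond()

    def do_HEAD(self):
        self._respond(body=b"")     # headers only

    def log_message(self, fmt, *args):
        pass                        # keep stderr quiet

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", PORT), Handler).serve_forever()
]]]

Run it in the background and point serf_get at http://127.0.0.1:8087/ as in
the invocation above.
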
>
> This is the patch for the basic auth handler, the same as in the
> description of SERF-195:
>
> --- serf-trunk/auth/auth_basic.c        (revision 1930477)
> +++ serf-trunk/auth/auth_basic.c        (working copy)
> @@ -19,6 +19,7 @@
>   */
>
>  /*** Basic authentication ***/
> +#include <unistd.h>
>
>  #include <serf.h>
>  #include <serf_private.h>
> @@ -153,6 +154,7 @@ serf__setup_request_basic_auth(const serf__authn_schem
>      basic_info = authn_info->baton;
>
>      if (basic_info && basic_info->header && basic_info->value) {
> +        usleep(75000); /* 75 milliseconds */
>          serf_bucket_headers_setn(hdrs_bkt, basic_info->header,
>                                   basic_info->value);
>          return APR_SUCCESS;
>
> I also noticed something interesting: each connection sends up to
> max-pending requests at once, but then waits for all the replies before
> sending the next batch. I haven't checked if this is a bug in serf_get
> or in Serf proper; the connection has more requests ready, so it should
> send new ones as soon as any received response makes room in the pending
> requests queue. I saw this with the following invocation:
>
>     serf_get -d -n 500 -c 1 -x 50 -U user -P something -m GET http://127.0.0.1:8087/ >/dev/null
>
> The instrumentation patch for serf_get:
>
> --- serf-trunk/test/serf_get.c        (revision 1930478)
> +++ serf-trunk/test/serf_get.c        (working copy)
> @@ -590,6 +590,7 @@ int main(int argc, const char **argv)
>      apr_getopt_t *opt;
>      int opt_c;
>      const char *opt_arg;
> +    unsigned int prev_min = 0, prev_max = UINT_MAX, prev_all = 0;
>
>      apr_initialize();
>      atexit(apr_terminate);
> @@ -910,6 +911,24 @@ int main(int argc, const char **argv)
>              break;
>          }
>          /* Debugging purposes only! */
> +        if (debug) {
> +            unsigned int min = UINT_MAX, max = 0, all = 0;
> +            for (i = 0; i < conn_count; i++) {
> +                serf_connection_t *conn = connections[i];
> +                unsigned int pending =
> +                    (serf_connection_pending_requests(conn)
> +                     - serf_connection_queued_requests(conn));
> +                all += pending;
> +                if (min > pending) min = pending;
> +                if (max < pending) max = pending;
> +            }
> +            if (prev_min != min || prev_max != max || prev_all != all) {
> +                fprintf(stderr, ">pending: %u [avg=%0.1f min=%u max=%u]\n",
> +                        all, (double)all/conn_count, min, max);
> +                prev_min = min;
> +                prev_max = max;
> +                prev_all = all;
> +            }
> +        }
>          serf_debug__closed_conn(app_ctx.bkt_alloc);
>      }
>
> -- Brane

With a longer delay in authserver.py (10.0), enough concurrent connections
(15 seems to be enough) and enough outstanding requests per connection
(more than 2), I seem to get an error:

[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 5 -U foo -P bar http://127.0.0.1:8087
...
Error running context: (70014) End of file found
]]]

With 2 outstanding requests:

[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 2 -U foo -P bar http://127.0.0.1:8087
...
2025-12-13T19:06:06.161609+00 ERROR [l:127.0.0.1:35170 r:127.0.0.1:8087] receiving raw: Error 104 while reading.
Error running context: (104) Connection reset by peer
]]]

If I add the usleep(), I can't reproduce these errors anymore. That seems to
be exactly the opposite of what the original poster says: as far as I
understand, they claim the error is triggered by adding a usleep?

Cheers,
Daniel
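
P.S. In case anyone else wants to repeat the sweep over -x, a throwaway driver
along these lines should do (the serf_get path, port and credentials are the
ones from my invocations above, without -d to keep the output small; the
script itself is only a sketch):

[[[
#!/usr/bin/env python3
# Run serf_get with increasing numbers of outstanding requests (-x) and
# report which runs hit "Error running context".
import subprocess

SERF_GET = "./serf_get"
URL = "http://127.0.0.1:8087"

for x in range(1, 11):
    cmd = [SERF_GET, "-n", "150", "-c", "15", "-x", str(x),
           "-U", "foo", "-P", "bar", URL]
    result = subprocess.run(cmd, capture_output=True, text=True)
    failed = "Error running context" in result.stdout + result.stderr
    print(f"-x {x:2d}: {'FAILED' if failed else 'ok'}")
]]]
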
