Re: Reproducing SERF-195 [was: Releasing Serf 1.5.0]

Branko Čibej Sat, 13 Dec 2025 12:43:24 -0800

On 13. 12. 25 20:14, Daniel Sahlberg wrote:

Den lör 13 dec. 2025 kl 16:39 skrev Branko Čibej<[email protected]>:

On 13. 12. 25 03:35, Branko Čibej wrote:

On 12. 12. 25 20:49, Branko Čibej wrote:

On 12. 12. 25 20:02, Daniel Sahlberg wrote:

Den tors 11 dec. 2025 kl 19:43 skrev Branko Čibej<[email protected]>:

On 11. 12. 25 19:59, Daniel Sahlberg wrote:
* Issue SERF-195, which is a substantial rewrite of the request
queues.
The

code is in a separate branch SERF-195. The code looks good to me
but I
haven't analyzed in detail. It should be possible to merge to trunk.

The patch works, in the sense that it causes no regressions, but I
haven't been able to reproduce the reported issue. It's also makes the
code slightly more readable, but I couldn't find the mentioned race
condition during review, it looks like just a refactoring of the
original code.

I think we need another pair of eyes to do a careful review of the
branch changes.

I'll try to reproduce the initial issue and see if I can figure out
what
goes wrong. Do you have any ideas where I should start? I assume a
multithreaded client (with multiple connections) would be needed?

Multi-threading on the client has no effect here. Request queues are
per-context and contexts require single-threaded access.

The original issue submission just adds a sleep() in
serf__setup_request_basic_auth(), so I guess you don't need a
multi-threaded client, you just need a server that always requires
authentication. That's why, e.g., linking Subversion with that and
doing read from the Apache repos won't work. You could try to set up
a local multi-threaded http server that requires basic auth and
serves some data  -- easy to do in Python -- then write a Serf-based
client that sends requests in a loop. Serf-get would be a good
starting point.

I think the important points here are:

  * all requests to the server must require authentication;
  * the server must be able to process requests in parallel and out of
    order: this is the only way I can think of where processed requests
    could be removed from the middle of the queue, not the beginning or
    the end;
  * the server must be "slow" enough that the client request queue fills
    up, so it needs both an upper limit to the number of parallel
    requests and a fake latency for the responses.


I'll give the server script a go.

See r1930478.

Try something like this:

    $ authserver.py 2>/dev/null &
    $ serf_get -n 400 -c 10 -x 20 -U user -P pass -m
HEADhttp://127.0.0.1:8087/


However, this still doesn't reproduce the reported issue. One possible
reason is that serf_get does *not* pipeline the requests in each
connection but only sends the next request when the previous response
has arrived. I haven't been able to track this down, don't know if
it's a problem with serf_get or authserver.py.

I was mistaken about the pipelining. After adding some instrumentation
to serf_get (see below), it turns out that requests are pipelined, and
yet I can't reproduce the issue.

This is the patch for the basic auth handler, the same as in the
description of SERF-195:

--- serf-trunk/auth/auth_basic.c        (revision 1930477)
+++ serf-trunk/auth/auth_basic.c        (working copy)
@@ -19,6 +19,7 @@
    */

   /*** Basic authentication ***/
+#include <unistd.h>

   #include <serf.h>
   #include <serf_private.h>
@@ -153,6 +154,7 @@ serf__setup_request_basic_auth(const serf__authn_schem
       basic_info = authn_info->baton;

       if (basic_info && basic_info->header && basic_info->value) {
+        usleep(75000);  /* 75 milliseconds */
           serf_bucket_headers_setn(hdrs_bkt, basic_info->header,
                                    basic_info->value);
           return APR_SUCCESS;


I also noticed something interesting: Each connection sends up to
max-pending requests at once, but then waits for all the replies before
sending the next batch. I haven't checked if this is a bug in serf_get
or in Serf proper; the connection has more requests ready, so it should
send new ones as soon any received response makes room in the pending
requests queue. I saw this with the following invocation:

     serf_get -d -n 500 -c 1 -x 50 -U user -P something -m GEThttp://
127.0.0.1:8087/ >/dev/null



The instrumentation patch for serf_get:

--- serf-trunk/test/serf_get.c  (revision 1930478)
+++ serf-trunk/test/serf_get.c  (working copy)
@@ -590,6 +590,7 @@ int main(int argc, const char **argv)
       apr_getopt_t *opt;
       int opt_c;
       const char *opt_arg;
+    unsigned int prev_min = 0, prev_max = UINT_MAX, prev_all = 0;

       apr_initialize();
       atexit(apr_terminate);
@@ -910,6 +911,24 @@ int main(int argc, const char **argv)
               break;
           }
           /* Debugging purposes only! */
+        if (debug) {
+            unsigned int min = UINT_MAX, max = 0, all = 0;
+            for (i = 0; i < conn_count; i++) {
+                serf_connection_t *conn = connections[i];
+                unsigned int pending =
(serf_connection_pending_requests(conn)
+                                        -
serf_connection_queued_requests(conn));
+                all += pending;
+                if (min > pending) min = pending;
+                if (max < pending) max = pending;
+            }
+            if (prev_min != min || prev_max != max || prev_all != all) {
+                fprintf(stderr, ">pending: %u [avg=%0.1f min=%u
max=%u]\n",
+                        all, (double)all/conn_count, min, max);
+                prev_min = min;
+                prev_max = max;
+                prev_all = all;
+            }
+        }
           serf_debug__closed_conn(app_ctx.bkt_alloc);
       }


-- Brane

With a longer delay in authserver.py (10.0) and a high enough number of
concurrent connections (15 seems enough) and high enough outstanding
requests (>2), I seem to get an error:
[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 5 -U foo -P
barhttp://127.0.0.1:8087
...
Error running context: (70014) End of file found
]]]

With 2 outstanding requests:
[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 2 -U foo -P
barhttp://127.0.0.1:8087
...
2025-12-13T19:06:06.161609+00 ERROR [l:127.0.0.1:35170 r:127.0.0.1:8087]
receiving raw: Error 104 while reading.
Error running context: (104) Connection reset by peer
]]]

If I add the usleep(), I can't reproduce these errors anymore. It seems to
be exactly opposite of what the original poster says - as far as I
understand they claim that the error is triggered by adding a usleep?

SERF-195 is about memory corruption and crashes on the client. Thisappears to be something else.

In any case, it would make sense to retry your test with serf_get fromthe SERF-195 branch (but note that branch needs to be sync-merged fromtrunk).


-- Brane

Re: Reproducing SERF-195 [was: Releasing Serf 1.5.0]

Reply via email to