On 15. 12. 25 19:30, Daniel Sahlberg wrote:
On Sat, 13 Dec 2025 at 20:43, Branko Čibej <[email protected]> wrote:
On 13. 12. 25 20:14, Daniel Sahlberg wrote:
On Sat, 13 Dec 2025 at 16:39, Branko Čibej <[email protected]> wrote:
On 13. 12. 25 03:35, Branko Čibej wrote:
On 12. 12. 25 20:49, Branko Čibej wrote:
On 12. 12. 25 20:02, Daniel Sahlberg wrote:
On Thu, 11 Dec 2025 at 19:43, Branko Čibej <[email protected]> wrote:
On 11. 12. 25 19:59, Daniel Sahlberg wrote:
* Issue SERF-195, which is a substantial rewrite of the request
  queues. The code is in a separate branch, SERF-195. The code looks
  good to me but I haven't analyzed it in detail. It should be
  possible to merge it to trunk.
The patch works, in the sense that it causes no regressions, but I
haven't been able to reproduce the reported issue. It also makes the
code slightly more readable, but I couldn't find the mentioned race
condition during review; it looks like just a refactoring of the
original code.
I think we need another pair of eyes to do a careful review of the
branch changes.
I'll try to reproduce the initial issue and see if I can figure out
what goes wrong. Do you have any ideas where I should start? I assume
a multithreaded client (with multiple connections) would be needed?
Multi-threading on the client has no effect here. Request queues are
per-context and contexts require single-threaded access.
The original issue submission just adds a sleep() in
serf__setup_request_basic_auth(), so I guess you don't need a
multi-threaded client, you just need a server that always requires
authentication. That's why, e.g., linking Subversion with that patch
and doing reads from the Apache repos won't work. You could try to
set up a local multi-threaded HTTP server that requires basic auth
and serves some data -- easy to do in Python -- then write a
Serf-based client that sends requests in a loop. serf_get would be a
good starting point.
I think the important points here are:

  * all requests to the server must require authentication;
  * the server must be able to process requests in parallel and out of
    order: this is the only way I can think of where processed requests
    could be removed from the middle of the queue, not the beginning or
    the end;
  * the server must be "slow" enough that the client request queue
    fills up, so it needs both an upper limit to the number of parallel
    requests and a fake latency for the responses.
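(The real authserver.py lives in r1930478; for the archives, the three
points above can be sketched in a few lines of Python. The credentials,
realm name and latency value below are made up for illustration, and
instead of serving forever on port 8087 the sketch binds an ephemeral
port and issues two demo requests against itself.)

```python
import base64
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

LATENCY = 0.1                                   # fake per-response delay
CREDENTIALS = base64.b64encode(b"user:pass").decode()

class AuthHandler(BaseHTTPRequestHandler):
    """Demands Basic auth on every request, then answers slowly."""

    def do_GET(self):
        if self.headers.get("Authorization", "") != "Basic " + CREDENTIALS:
            # Point 1: every request must authenticate.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test"')
            self.end_headers()
            return
        time.sleep(LATENCY)                     # point 3: keep the server slow
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                                    # keep stderr quiet

# Point 2: ThreadingHTTPServer handles each request on its own thread,
# so slow responses can finish out of order.  Port 0 picks a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), AuthHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % port
try:
    urllib.request.urlopen(url)
    status_noauth = 200
except urllib.error.HTTPError as err:
    status_noauth = err.code                    # expect the 401 challenge

req = urllib.request.Request(
    url, headers={"Authorization": "Basic " + CREDENTIALS})
status_auth = urllib.request.urlopen(req).getcode()
print(status_noauth, status_auth)               # → 401 200
server.shutdown()
```

A real reproduction server would cap the number of worker threads to
enforce the "upper limit to the number of parallel requests".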
I'll give the server script a go.
See r1930478.
Try something like this:
  $ authserver.py 2>/dev/null &
  $ serf_get -n 400 -c 10 -x 20 -U user -P pass -m HEAD http://127.0.0.1:8087/
However, this still doesn't reproduce the reported issue. One possible
reason is that serf_get does *not* pipeline the requests in each
connection but only sends the next request when the previous response
has arrived. I haven't been able to track this down and don't know
whether it's a problem with serf_get or authserver.py.
I was mistaken about the pipelining. After adding some instrumentation
to serf_get (see below), it turns out that requests are pipelined, and
yet I can't reproduce the issue.
This is the patch for the basic auth handler, the same as in the
description of SERF-195:
--- serf-trunk/auth/auth_basic.c        (revision 1930477)
+++ serf-trunk/auth/auth_basic.c        (working copy)
@@ -19,6 +19,7 @@
  */

 /*** Basic authentication ***/
+#include <unistd.h>

 #include <serf.h>
 #include <serf_private.h>
@@ -153,6 +154,7 @@ serf__setup_request_basic_auth(const serf__authn_schem
     basic_info = authn_info->baton;

     if (basic_info && basic_info->header && basic_info->value) {
+        usleep(75000); /* 75 milliseconds */
         serf_bucket_headers_setn(hdrs_bkt, basic_info->header,
                                  basic_info->value);
         return APR_SUCCESS;
I also noticed something interesting: each connection sends up to
max-pending requests at once, but then waits for all the replies
before sending the next batch. I haven't checked if this is a bug in
serf_get or in Serf proper; the connection has more requests ready, so
it should send new ones as soon as any received response makes room in
the pending requests queue. I saw this with the following invocation:
  serf_get -d -n 500 -c 1 -x 50 -U user -P something -m GET http://127.0.0.1:8087/ >/dev/null
The instrumentation patch for serf_get:
--- serf-trunk/test/serf_get.c  (revision 1930478)
+++ serf-trunk/test/serf_get.c  (working copy)
@@ -590,6 +590,7 @@ int main(int argc, const char **argv)
     apr_getopt_t *opt;
     int opt_c;
     const char *opt_arg;
+    unsigned int prev_min = 0, prev_max = UINT_MAX, prev_all = 0;

     apr_initialize();
     atexit(apr_terminate);
@@ -910,6 +911,24 @@ int main(int argc, const char **argv)
             break;
         }

         /* Debugging purposes only! */
+        if (debug) {
+            unsigned int min = UINT_MAX, max = 0, all = 0;
+            for (i = 0; i < conn_count; i++) {
+                serf_connection_t *conn = connections[i];
+                unsigned int pending = (serf_connection_pending_requests(conn)
+                                        - serf_connection_queued_requests(conn));
+                all += pending;
+                if (min > pending) min = pending;
+                if (max < pending) max = pending;
+            }
+            if (prev_min != min || prev_max != max || prev_all != all) {
+                fprintf(stderr, ">pending: %u [avg=%0.1f min=%u max=%u]\n",
+                        all, (double)all/conn_count, min, max);
+                prev_min = min;
+                prev_max = max;
+                prev_all = all;
+            }
+        }
         serf_debug__closed_conn(app_ctx.bkt_alloc);
     }
-- Brane
With a longer delay in authserver.py (10.0), a high enough number of
concurrent connections (15 seems enough) and enough outstanding
requests (>2), I seem to get an error:
[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 5 -U foo -P bar http://127.0.0.1:8087
...
Error running context: (70014) End of file found
]]]
With 2 outstanding requests:
[[[
dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 2 -U foo -P bar http://127.0.0.1:8087
...
2025-12-13T19:06:06.161609+00 ERROR [l:127.0.0.1:35170 r:127.0.0.1:8087]
receiving raw: Error 104 while reading.
Error running context: (104) Connection reset by peer
]]]
If I add the usleep(), I can't reproduce these errors anymore. That
seems to be the exact opposite of what the original poster says: as
far as I understand, they claim that the error is triggered by adding
a usleep?
SERF-195 is about memory corruption and crashes on the client. This
appears to be something else.
In any case, it would make sense to retry your test with serf_get from
the SERF-195 branch (but note that branch needs to be sync-merged from
trunk).
Tried it and it crashes the same way, which, as Brane correctly notes,
isn't quite the same as described in this issue.
I've been thinking about the usleep() and it doesn't make sense that
it should introduce any problems for a single-threaded application.
From how I understand the description, it seems like something is
removing entries from the linked lists during the sleep. But if there
is only one thread, that thread should be the only one working the
lists, so nothing should happen.
Could it be that the original poster is reusing the same context across
multiple threads?
I've been thinking along those lines, yes.
If so, this is not a supported use case and we should close this issue?
The changes on that branch aren't bad; it's nice to encapsulate the
linked-list management. But if we can't reproduce the original issue,
I wouldn't release this in 1.5, because it hasn't been tested all
that much. I would merge it to trunk _after_ we cut the 1.5 branch,
though.
If we decide to do it this way, then all that's left are the &*#@^Y*!
error callbacks.
-- Brane