Hi,

On 2026/05/29 4:37, Branko Čibej wrote:
> This intermittent failure shows up sometimes on both Linux and macOS:
> 
> FAIL:  ra-test: Unknown test failure (-11); see tests.log.
> 
> That's a crash, 11 is SIGSEGV on both OSes. Sometimes the same happens in 
> repos-test, too, and it doesn't matter if it's local or DAV or svnserve. I 
> suspect that we have a wonderful little bug in our code. I've been trying, on 
> and off, to trace this down either with a debugger or with clang's address 
> sanitiser or with APR's pool debugging enabled, but it's an elusive little 
> bxtrd and I've not yet been able to track it down. Not even so far as to 
> figure out whether it's hiding in FSFS or somewhere else.
> 
> It's been a few years and I think I remember seeing it on the 1.14 branch, 
> too. On the other hand, I don't recall any bug reports that could be linked 
> to this observation. We're seeing this failure in the CI tests, too, at least 
> with autotools. I haven't seen this with CMake, but failures in those 
> workflows are far more often related to vcpkg or other environmental stuff, 
> so it's a bit hard to find.
> 
> If anyone has any idea where to look without diving into a line-by-line 
> review of the code, please share your thoughts. This is starting to get on my 
> nerves, just a little bit.
> 

The crash happens in serf_default_destroy_and_data(), which is reached via
apr_pool_cleanup_for_exec() in the child process that apr_proc_create() forks
to open a tunnel while the test program is running multi-threaded. I assume
that something is wrong in libsvn_ra_serf and/or serf. I also guess that the
same issue might occur when a hook is invoked with SVNMasterURI enabled on
Apache with the event or worker MPM.

[[[
$ sudo sysctl -w 'kernel.core_pattern=/var/crash/%t.%e.%p.%h'
$ ulimit -c unlimited
$ make davautocheck PARALLEL=8 APACHE_MPM=event TESTS="$(python3 -c 'print(" ".join(["subversion/tests/libsvn_ra/ra-test"]*64))')"
...
$ gdb subversion/tests/libsvn_ra/.libs/ra-test /var/crash/1780005501.ra-test.1140874.localhost
...
Reading symbols from subversion/tests/libsvn_ra/.libs/ra-test...
[New LWP 1140874]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by 
`/home/jun66j5/src/subversion/subversion.git/subversion/tests/libsvn_ra/.libs/ra'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140687252571712) 
at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140687252571712) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140687252571712) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140687252571712, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x00007ff44f0b5476 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#4  0x00007ff44f09b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ff44eb5c526 in serf_bucket_mem_free (allocator=<optimized out>, 
block=<optimized out>) at buckets/allocator.c:267
#6  0x00007ff44eb5fc25 in serf_default_destroy_and_data (bucket=0x7ff44c0a00b8) 
at buckets/buckets.c:125
#7  0x00007ff44eb5fd5b in serf_chunk_destroy (bucket=0x7ff44c09e7b8) at 
buckets/chunk_buckets.c:225
#8  0x00007ff44eb5fca7 in cleanup_aggregate (allocator=0x7ff44c0aa0a0, 
ctx=0x7ff44c09e438) at buckets/aggregate_buckets.c:56
#9  serf_aggregate_destroy_and_data (bucket=0x7ff44c0a1db8) at 
buckets/aggregate_buckets.c:127
#10 0x00007ff44eb5c64e in clean_resp (data=0x7ff44c0b0038) at 
/build/serf-6d3kPd/serf-1.3.9/outgoing.c:62
#11 0x00007ff44f2b8876 in  () at /usr/lib/x86_64-linux-gnu/libapr-1.so.0
#12 0x00007ff44f2bcc80 in apr_pool_cleanup_for_exec () at 
/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#13 0x00007ff44f2c97ca in apr_proc_create () at 
/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#14 0x000055d20b7f033b in open_tunnel
    (request=0x7ff44c0e3270, response=0x7ff44c0e3278, 
close_func=0x7ff44c0e3268, close_baton=0x7ff44c0e3258, 
tunnel_baton=0x7ff44cb530a0, tunnel_name=<optimized out>, user=0x0, 
hostname=0x7ff44c0e30c0 "localhost", port=0, cancel_func=0x0, cancel_baton=0x0, 
pool=0x7ff44c0e3028) at subversion/tests/libsvn_ra/ra-test.c:283
#15 0x00007ff44ef229c2 in open_session
    (sess_p=sess_p@entry=0x7ff44db6bb38, url=url@entry=0x7ff44cb530b0 
"svn+test://localhost/test-run_checkout", uri=uri@entry=0x7ff44db6bb40, 
tunnel_name=tunnel_name@entry=0x7ff44cad4280 "test", 
tunnel_argv=tunnel_argv@entry=0x0, config=config@entry=0x0, 
callbacks=0x7ff44cb530d8, callbacks_baton=0x0, auth_baton=0x7ff44cad40e8, 
result_pool=0x7ff44c0e3028, scratch_pool=0x7ff44cb43028)
    at subversion/libsvn_ra_svn/client.c:674
#16 0x00007ff44ef23b7a in ra_svn_open
    (session=0x7ff44cad4258, corrected_url=<optimized out>, 
redirect_url=<optimized out>, url=0x7ff44cb530b0 
"svn+test://localhost/test-run_checkout", callbacks=0x7ff44cb530d8, callback_baton=0x0, 
auth_baton=0x7ff44cad40e8, config=0x0, result_pool=0x7ff44cad4028, 
scratch_pool=0x7ff44cb43028) at subversion/libsvn_ra_svn/client.c:904
#17 0x00007ff44f39c4a8 in svn_ra_open5
    (session_p=session_p@entry=0x7ff44db6bd38, 
corrected_url_p=corrected_url_p@entry=0x0, 
redirect_url_p=redirect_url_p@entry=0x0, 
repos_URL=repos_URL@entry=0x7ff44cb530b0 
"svn+test://localhost/test-run_checkout", uuid=uuid@entry=0x0, 
callbacks=0x7ff44cb530d8, callback_baton=0x0, config=0x0, pool=0x7ff44c593028) 
at subversion/libsvn_ra/ra_loader.c:388
#18 0x000055d20b7f0000 in tunnel_run_checkout (opts=<optimized out>, 
pool=0x7ff44cb53028)
    at subversion/tests/libsvn_ra/ra-test.c:1630
#19 0x00007ff44f3abb6b in test_thread (thread=0x7ff44e3720f0, 
data=0x7ffc18fa7bb0) at subversion/tests/svn_test_main.c:577
#20 0x00007ff44f107ac3 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
#21 0x00007ff44f1998d0 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
]]]
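
To make the mechanism in frames #10 to #13 easier to follow, here is a toy
model in Python. It is only an illustration and not the APR API: the Pool
class, cleanup_null() and simulate_forked_child() are hypothetical names, and
it assumes (as the backtrace suggests, but I have not checked in serf's
source) that clean_resp is registered as both the plain and the child cleanup.
The point is that the child-side cleanup pass walks every pool, including
pools that belong to threads other than the one that called
apr_proc_create().

```python
"""Toy model of APR-style pool cleanups (illustration only, not the APR API)."""


def cleanup_null(data):
    """Stand-in for a no-op cleanup such as apr_pool_cleanup_null."""
    return None


class Pool:
    """Minimal pool: a list of cleanups plus child pools."""

    def __init__(self, parent=None):
        self.cleanups = []   # entries are (data, plain_cleanup, child_cleanup)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def cleanup_register(self, data, plain_cleanup, child_cleanup):
        self.cleanups.append((data, plain_cleanup, child_cleanup))

    def cleanup_for_exec(self):
        """Model of the child-side pass between fork and exec: run every
        registered child cleanup in this pool and in all of its sub-pools."""
        pending, self.cleanups = self.cleanups, []
        for data, _plain, child_cleanup in reversed(pending):
            child_cleanup(data)
        for child in self.children:
            child.cleanup_for_exec()


def simulate_forked_child():
    """Return the data items whose destructor ran in the modelled child."""
    destroyed = []

    def clean_resp(data):
        # Hypothetical destructor; in the real crash this path ends in
        # serf_bucket_mem_free().
        destroyed.append(data)

    root = Pool()
    forking_thread = Pool(root)   # the thread that opens the tunnel
    other_thread = Pool(root)     # an unrelated thread with a request in flight

    # Registered for both roles: the destructor also runs in the child.
    other_thread.cleanup_register("response owned by other thread",
                                  clean_resp, clean_resp)
    # Registered only as the plain cleanup: skipped in the child.
    forking_thread.cleanup_register("tunnel state", clean_resp, cleanup_null)

    root.cleanup_for_exec()
    return destroyed


if __name__ == "__main__":
    print(simulate_forked_child())
```

In this model only the other thread's response is destroyed in the child,
because its destructor was registered as the child cleanup too. Whether a
no-op child cleanup is the right fix for serf or libsvn_ra_serf is a separate
question that the model does not answer.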

I'd suggest that we enable core dumps and print backtraces from them after
the unit tests have run. See the attached patch.

Examples:
* https://github.com/jun66j5/subversion/actions/runs/26602723573/job/78390180104#step:16:22
* https://github.com/jun66j5/subversion/actions/runs/26602723573/job/78390180199#step:17:22

-- 
Jun Omae <[email protected]> (大前 潤)
diff --git a/.github/workflows/autoconf.yml b/.github/workflows/autoconf.yml
index 21b56c80d6691..565dc70ee0f3b 100644
--- a/.github/workflows/autoconf.yml
+++ b/.github/workflows/autoconf.yml
@@ -74,6 +74,7 @@ jobs:
           libsecret-1-dev
           python3-lxml
           python3-rnc2rng
+          gdb
 
       - name: Install dependencies (macOS, homebrew)
         if: runner.os == 'macOS'
@@ -165,12 +166,53 @@ jobs:
       - name: Build (make)
         run: make -j4
 
+      - name: Enable core dumps (Linux)
+        if: runner.os == 'Linux'
+        run: |
+          test -d /cores || sudo mkdir -m 0777 /cores
+          sudo sysctl -w 'kernel.core_pattern=/cores/%t.%e.%p.%h'
+
+      - name: Delete old diagnostic reports (macOS)
+        if: runner.os == 'macOS'
+        run: |
+          rm -f ~/Library/Logs/DiagnosticReports/*.ips
+
       - name: Run tests
         run: |
+          ulimit -c unlimited
           make ${{matrix.check-target}} PARALLEL=6 \
               APACHE_MPM="${{ steps.platform.outputs.apache-mpm }}" \
               PYTHON_VENV="$HOME/test-python-venv"
 
+      - name: Print backtrace of core dumps (Linux)
+        if: always() && runner.os == 'Linux'
+        run: |
+          for core in /cores/*; do
+            test -r "$core" || continue
+            # Workaround for gdb losing the executable file name
+            execfn="$(gdb -q -c "$core" -ex 'info auxv' -ex quit | \
+                      sed -n '/AT_EXECFN/ { s/.*"\([^"]*\)".*/\1/; p; q }')"
+            test -n "$execfn" -a -e "$execfn" || execfn=-c
+            echo "::group::$core"
+            gdb -q "$execfn" "$core" \
+                -ex 'set pagination off' \
+                -ex 'info proc mappings' \
+                -ex 'thread apply all bt full' \
+                -ex 'quit'
+            echo '::endgroup::'
+          done
+
+      - name: Print diagnostic reports (macOS)
+        if: always() && runner.os == 'macOS'
+        shell: bash
+        run: |
+          for ips in ~/Library/Logs/DiagnosticReports/*.ips; do
+            test -r "$ips" || continue
+            echo "::group::$ips"
+            viewdiagnostic "$ips" || :
+            echo '::endgroup::'
+          done
+
       - name: Archive test logs
         if: always()
         uses: actions/upload-artifact@v7
diff --git a/.github/workflows/cmake.yml b/.github/workflows/cmake.yml
index 8bb08c5af2f88..d93edc6ac677e 100644
--- a/.github/workflows/cmake.yml
+++ b/.github/workflows/cmake.yml
@@ -211,22 +211,74 @@ jobs:
       - name: Install
         run: cmake --install out --config Release
 
+      - name: Enable core dumps (Linux)
+        if: runner.os == 'Linux'
+        shell: bash
+        run: |
+          test -d /cores || sudo mkdir -m 0777 /cores
+          sudo sysctl -w 'kernel.core_pattern=/cores/%t.%e.%p.%h'
+
+      - name: Delete old diagnostic reports (macOS)
+        if: runner.os == 'macOS'
+        shell: bash
+        run: |
+          rm -f ~/Library/Logs/DiagnosticReports/*.ips
+
       - name: Run all tests
         id: run_all_tests
         if: matrix.run_tests
         working-directory: out
-        run: ctest --output-on-failure --verbose -C Release --parallel 16
+        shell: bash
+        run: |
+          ulimit -c unlimited
+          ctest --output-on-failure --verbose -C Release --parallel 16
 
       - name: Test shelf2
         if: matrix.run_tests
         working-directory: out
         env:
           SVN_EXPERIMENTAL_COMMANDS: shelf2
-        run: ctest -R shelf2 --verbose -C Release
+        shell: bash
+        run: |
+          ulimit -c unlimited
+          ctest -R shelf2 --verbose -C Release
 
       - name: Test shelf3
         if: matrix.run_tests
         working-directory: out
         env:
           SVN_EXPERIMENTAL_COMMANDS: shelf3
-        run: ctest -R shelf3 --verbose -C Release
+        shell: bash
+        run: |
+          ulimit -c unlimited
+          ctest -R shelf3 --verbose -C Release
+
+      - name: Print backtrace of core dumps (Linux)
+        if: always() && runner.os == 'Linux'
+        shell: bash
+        run: |
+          for core in /cores/*; do
+            test -r "$core" || continue
+            # Workaround for gdb losing the executable file name
+            execfn="$(gdb -q -c "$core" -ex 'info auxv' -ex quit | \
+                      sed -n '/AT_EXECFN/ { s/.*"\([^"]*\)".*/\1/; p; q }')"
+            test -n "$execfn" -a -e "$execfn" || execfn=-c
+            echo "::group::$core"
+            gdb -q "$execfn" "$core" \
+                -ex 'set pagination off' \
+                -ex 'info proc mappings' \
+                -ex 'thread apply all bt full' \
+                -ex 'quit'
+            echo '::endgroup::'
+          done
+
+      - name: Print diagnostic reports (macOS)
+        if: always() && runner.os == 'macOS'
+        shell: bash
+        run: |
+          for ips in ~/Library/Logs/DiagnosticReports/*.ips; do
+            test -r "$ips" || continue
+            echo "::group::$ips"
+            viewdiagnostic "$ips" || :
+            echo '::endgroup::'
+          done
