Hi,
On 2026/05/29 4:37, Branko Čibej wrote:
> This intermittent failure shows up sometimes on both Linux and macOS:
>
> FAIL: ra-test: Unknown test failure (-11); see tests.log.
>
> That's a crash, 11 is SIGSEGV on both OSes. Sometimes the same happens in
> repos-test, too, and it doesn't matter if it's local or DAV or svnserve. I
> suspect that we have a wonderful little bug in our code. I've been trying, on
> and off, to trace this down either with a debugger or with clang's address
> sanitiser or with APR's pool debugging enabled, but it's an elusive little
> bxtrd and I've not yet been able to track it down. Not even so far as to
> figure out whether it's hiding in FSFS or somewhere else.
>
> It's been a few years and I think I remember seeing it on the 1.14 branch,
> too. On the other hand, I don't recall any bug reports that could be linked
> to this observation. We're seeing this failure in the CI tests, too, at least
> with autotools. I haven't seen this with CMake, but failures in those
> workflows are far more often related to vcpkg or other environmental stuff,
> so it's a bit hard to find.
>
> If anyone has any idea where to look without diving into a line-by-line
> review of the code, please share your thoughts. This is starting to get on my
> nerves, just a little bit.
>
The crash occurs in serf_default_destroy_and_data(), called via
apr_pool_cleanup_for_exec() in the child process that apr_proc_create() forks
to open a tunnel while the tests run multi-threaded. I suspect a bug in
libsvn_ra_serf and/or serf. I also suspect that the same issue could occur when
a hook is invoked with SVNMasterURI enabled under Apache's event or worker MPM.
[[[
$ sudo sysctl -w 'kernel.core_pattern=/var/crash/%t.%e.%p.%h'
$ ulimit -c unlimited
$ make davautocheck PARALLEL=8 APACHE_MPM=event TESTS="$(python3 -c 'print(" ".join(["subversion/tests/libsvn_ra/ra-test"]*64))')"
...
$ gdb subversion/tests/libsvn_ra/.libs/ra-test /var/crash/1780005501.ra-test.1140874.localhost
...
Reading symbols from subversion/tests/libsvn_ra/.libs/ra-test...
[New LWP 1140874]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/jun66j5/src/subversion/subversion.git/subversion/tests/libsvn_ra/.libs/ra'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140687252571712)
at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140687252571712)
at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140687252571712) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140687252571712, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3 0x00007ff44f0b5476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007ff44f09b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ff44eb5c526 in serf_bucket_mem_free (allocator=<optimized out>,
block=<optimized out>) at buckets/allocator.c:267
#6 0x00007ff44eb5fc25 in serf_default_destroy_and_data (bucket=0x7ff44c0a00b8)
at buckets/buckets.c:125
#7 0x00007ff44eb5fd5b in serf_chunk_destroy (bucket=0x7ff44c09e7b8) at
buckets/chunk_buckets.c:225
#8 0x00007ff44eb5fca7 in cleanup_aggregate (allocator=0x7ff44c0aa0a0,
ctx=0x7ff44c09e438) at buckets/aggregate_buckets.c:56
#9 serf_aggregate_destroy_and_data (bucket=0x7ff44c0a1db8) at
buckets/aggregate_buckets.c:127
#10 0x00007ff44eb5c64e in clean_resp (data=0x7ff44c0b0038) at
/build/serf-6d3kPd/serf-1.3.9/outgoing.c:62
#11 0x00007ff44f2b8876 in () at /usr/lib/x86_64-linux-gnu/libapr-1.so.0
#12 0x00007ff44f2bcc80 in apr_pool_cleanup_for_exec () at
/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#13 0x00007ff44f2c97ca in apr_proc_create () at
/usr/lib/x86_64-linux-gnu/libapr-1.so.0
#14 0x000055d20b7f033b in open_tunnel
(request=0x7ff44c0e3270, response=0x7ff44c0e3278,
close_func=0x7ff44c0e3268, close_baton=0x7ff44c0e3258,
tunnel_baton=0x7ff44cb530a0, tunnel_name=<optimized out>, user=0x0,
hostname=0x7ff44c0e30c0 "localhost", port=0, cancel_func=0x0, cancel_baton=0x0,
pool=0x7ff44c0e3028) at subversion/tests/libsvn_ra/ra-test.c:283
#15 0x00007ff44ef229c2 in open_session
(sess_p=sess_p@entry=0x7ff44db6bb38, url=url@entry=0x7ff44cb530b0
"svn+test://localhost/test-run_checkout", uri=uri@entry=0x7ff44db6bb40,
tunnel_name=tunnel_name@entry=0x7ff44cad4280 "test",
tunnel_argv=tunnel_argv@entry=0x0, config=config@entry=0x0,
callbacks=0x7ff44cb530d8, callbacks_baton=0x0, auth_baton=0x7ff44cad40e8,
result_pool=0x7ff44c0e3028, scratch_pool=0x7ff44cb43028)
at subversion/libsvn_ra_svn/client.c:674
#16 0x00007ff44ef23b7a in ra_svn_open
(session=0x7ff44cad4258, corrected_url=<optimized out>,
redirect_url=<optimized out>, url=0x7ff44cb530b0
"svn+test://localhost/test-run_checkout", callbacks=0x7ff44cb530d8, callback_baton=0x0,
auth_baton=0x7ff44cad40e8, config=0x0, result_pool=0x7ff44cad4028,
scratch_pool=0x7ff44cb43028) at subversion/libsvn_ra_svn/client.c:904
#17 0x00007ff44f39c4a8 in svn_ra_open5
(session_p=session_p@entry=0x7ff44db6bd38,
corrected_url_p=corrected_url_p@entry=0x0,
redirect_url_p=redirect_url_p@entry=0x0,
repos_URL=repos_URL@entry=0x7ff44cb530b0
"svn+test://localhost/test-run_checkout", uuid=uuid@entry=0x0,
callbacks=0x7ff44cb530d8, callback_baton=0x0, config=0x0, pool=0x7ff44c593028)
at subversion/libsvn_ra/ra_loader.c:388
#18 0x000055d20b7f0000 in tunnel_run_checkout (opts=<optimized out>,
pool=0x7ff44cb53028)
at subversion/tests/libsvn_ra/ra-test.c:1630
#19 0x00007ff44f3abb6b in test_thread (thread=0x7ff44e3720f0,
data=0x7ffc18fa7bb0) at subversion/tests/svn_test_main.c:577
#20 0x00007ff44f107ac3 in start_thread (arg=<optimized out>) at
./nptl/pthread_create.c:442
#21 0x00007ff44f1998d0 in clone3 () at
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
]]]
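The Python one-liner in the reproduction only builds a space-separated list that repeats the ra-test path 64 times, presumably so that `make davautocheck` runs many copies of ra-test under PARALLEL=8 and the intermittent crash gets more chances to show up. A minimal sketch of the same list built with shell tools alone, assuming `yes`, `head` and `paste` are available; the helper name `repeat_words` is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: print WORD ($1) repeated COUNT ($2) times, separated by
# single spaces. Same result as: python3 -c 'print(" ".join(["WORD"]*COUNT))'
repeat_words() {
    # yes prints WORD forever, head keeps the first COUNT lines,
    # and paste -s joins those lines into one line using a space delimiter.
    yes "$1" | head -n "$2" | paste -s -d ' ' -
}

TESTS="$(repeat_words subversion/tests/libsvn_ra/ra-test 64)"
printf '%s\n' "$TESTS"
```

The result can then be passed on as `make davautocheck PARALLEL=8 APACHE_MPM=event TESTS="$TESTS"`.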
I suggest enabling core dumps in CI and printing the backtraces of any core
dumps after the unit tests have run. See the attached patch.
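In the attached patch, the Linux backtrace step works around gdb losing the executable file name: it reads AT_EXECFN from the core's auxiliary vector with `info auxv`, extracts the quoted path with sed, and falls back to `gdb -q -c CORE` when that path is empty or missing. Both pieces can be tried without a core file. The sketch below reuses the patch's sed expression (written for GNU sed on the Linux runners); the function names and the sample `info auxv` text, including its path, are hypothetical, and the exact column layout of gdb's output is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: read gdb `info auxv` output on stdin and print the
# quoted path from the first AT_EXECFN line (same sed expression as the patch).
execfn_from_auxv() {
    sed -n '/AT_EXECFN/ { s/.*"\([^"]*\)".*/\1/; p; q }'
}

# Hypothetical helper: print the first argument to pass to gdb. Use the
# executable if it is non-empty and exists; otherwise print "-c", so that
# `gdb -q "$arg" "$core"` becomes `gdb -q -c "$core"` (the patch's fallback).
gdb_first_arg() {
    if [ -n "$1" ] && [ -e "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' '-c'
    fi
}

# Sample input: the layout and the path are assumptions for illustration only.
sample='25   AT_RANDOM            Address of 16 random bytes     0x7ffc18fa7e09
31   AT_EXECFN            File name of executable        0x7ffc18fa8fd0 "/hypothetical/build/subversion/tests/libsvn_ra/.libs/ra-test"'

execfn="$(printf '%s\n' "$sample" | execfn_from_auxv)"
printf 'AT_EXECFN: %s\n' "$execfn"
printf 'gdb argument: %s\n' "$(gdb_first_arg "$execfn")"
```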
Examples:
* https://github.com/jun66j5/subversion/actions/runs/26602723573/job/78390180104#step:16:22
* https://github.com/jun66j5/subversion/actions/runs/26602723573/job/78390180199#step:17:22
--
Jun Omae <[email protected]> (大前 潤)
diff --git a/.github/workflows/autoconf.yml b/.github/workflows/autoconf.yml
index 21b56c80d6691..565dc70ee0f3b 100644
--- a/.github/workflows/autoconf.yml
+++ b/.github/workflows/autoconf.yml
@@ -74,6 +74,7 @@ jobs:
libsecret-1-dev
python3-lxml
python3-rnc2rng
+ gdb
- name: Install dependencies (macOS, homebrew)
if: runner.os == 'macOS'
@@ -165,12 +166,53 @@ jobs:
- name: Build (make)
run: make -j4
+ - name: Enable core dumps (Linux)
+ if: runner.os == 'Linux'
+ run: |
+ test -d /cores || sudo mkdir -m 0777 /cores
+ sudo sysctl -w 'kernel.core_pattern=/cores/%t.%e.%p.%h'
+
+ - name: Delete old diagnostic reports (macOS)
+ if: runner.os == 'macOS'
+ run: |
+ rm -f ~/Library/Logs/DiagnosticReports/*.ips
+
- name: Run tests
run: |
+ ulimit -c unlimited
make ${{matrix.check-target}} PARALLEL=6 \
APACHE_MPM="${{ steps.platform.outputs.apache-mpm }}" \
PYTHON_VENV="$HOME/test-python-venv"
+ - name: Print backtrace of core dumps (Linux)
+ if: always() && runner.os == 'Linux'
+ run: |
+ for core in /cores/*; do
+ test -r "$core" || continue
+ # Workaround for gdb losing the executable file name
+ execfn="$(gdb -q -c "$core" -ex 'info auxv' -ex quit | \
+ sed -n '/AT_EXECFN/ { s/.*"\([^"]*\)".*/\1/; p; q }')"
+ test -n "$execfn" -a -e "$execfn" || execfn=-c
+ echo "::group::$core"
+ gdb -q "$execfn" "$core" \
+ -ex 'set pagination off' \
+ -ex 'info proc mappings' \
+ -ex 'thread apply all bt full' \
+ -ex 'quit'
+ echo '::endgroup::'
+ done
+
+ - name: Print diagnostic reports (macOS)
+ if: always() && runner.os == 'macOS'
+ shell: bash
+ run: |
+ for ips in ~/Library/Logs/DiagnosticReports/*.ips; do
+ test -r "$ips" || continue
+ echo "::group::$ips"
+ viewdiagnostic "$ips" || :
+ echo '::endgroup::'
+ done
+
- name: Archive test logs
if: always()
uses: actions/upload-artifact@v7
diff --git a/.github/workflows/cmake.yml b/.github/workflows/cmake.yml
index 8bb08c5af2f88..d93edc6ac677e 100644
--- a/.github/workflows/cmake.yml
+++ b/.github/workflows/cmake.yml
@@ -211,22 +211,74 @@ jobs:
- name: Install
run: cmake --install out --config Release
+ - name: Enable core dumps (Linux)
+ if: runner.os == 'Linux'
+ shell: bash
+ run: |
+ test -d /cores || sudo mkdir -m 0777 /cores
+ sudo sysctl -w 'kernel.core_pattern=/cores/%t.%e.%p.%h'
+
+ - name: Delete old diagnostic reports (macOS)
+ if: runner.os == 'macOS'
+ shell: bash
+ run: |
+ rm -f ~/Library/Logs/DiagnosticReports/*.ips
+
- name: Run all tests
id: run_all_tests
if: matrix.run_tests
working-directory: out
- run: ctest --output-on-failure --verbose -C Release --parallel 16
+ shell: bash
+ run: |
+ ulimit -c unlimited
+ ctest --output-on-failure --verbose -C Release --parallel 16
- name: Test shelf2
if: matrix.run_tests
working-directory: out
env:
SVN_EXPERIMENTAL_COMMANDS: shelf2
- run: ctest -R shelf2 --verbose -C Release
+ shell: bash
+ run: |
+ ulimit -c unlimited
+ ctest -R shelf2 --verbose -C Release
- name: Test shelf3
if: matrix.run_tests
working-directory: out
env:
SVN_EXPERIMENTAL_COMMANDS: shelf3
- run: ctest -R shelf3 --verbose -C Release
+ shell: bash
+ run: |
+ ulimit -c unlimited
+ ctest -R shelf3 --verbose -C Release
+
+ - name: Print backtrace of core dumps (Linux)
+ if: always() && runner.os == 'Linux'
+ shell: bash
+ run: |
+ for core in /cores/*; do
+ test -r "$core" || continue
+ # Workaround for gdb losing the executable file name
+ execfn="$(gdb -q -c "$core" -ex 'info auxv' -ex quit | \
+ sed -n '/AT_EXECFN/ { s/.*"\([^"]*\)".*/\1/; p; q }')"
+ test -n "$execfn" -a -e "$execfn" || execfn=-c
+ echo "::group::$core"
+ gdb -q "$execfn" "$core" \
+ -ex 'set pagination off' \
+ -ex 'info proc mappings' \
+ -ex 'thread apply all bt full' \
+ -ex 'quit'
+ echo '::endgroup::'
+ done
+
+ - name: Print diagnostic reports (macOS)
+ if: always() && runner.os == 'macOS'
+ shell: bash
+ run: |
+ for ips in ~/Library/Logs/DiagnosticReports/*.ips; do
+ test -r "$ips" || continue
+ echo "::group::$ips"
+ viewdiagnostic "$ips" || :
+ echo '::endgroup::'
+ done