On 29. 5. 2026 12:16, Branko Čibej wrote:
On 29. 5. 2026 08:50, Branko Čibej wrote:
On 29. 5. 2026 00:17, Jun Omae wrote:
Hi,

On 2026/05/29 4:37, Branko Čibej wrote:
This intermittent failure shows up sometimes on both Linux and macOS:

FAIL:  ra-test: Unknown test failure (-11); see tests.log.

That's a crash, 11 is SIGSEGV on both OSes. Sometimes the same happens in 
repos-test, too, and it doesn't matter if its local or DAV or svnserve. I 
suspect that we have a wonderful little bug in our code. I've been trying, on 
and off, to trace this down either with a debugger or with clang's address 
sanitiser or with APR's pool debugging enabled, but it's an elusive little 
bxtrd and I've not yet been able to track it down. Not even so far as to figure 
out whether it's hiding in FSFS or somewhere else.

It's been a few years and I think I remember seeing it on the 1.14 branch, too. 
On the other hand, I don't recall any bug reports that could be linked to this 
observation. We're seeing this failure in the CI tests, too, at least with 
autotools. I haven't seen this with CMake, but failures in those workflows are 
far more often related to vcpkg or other environmental stuff, so it's a bit 
hard to find.

If anyone has any idea where to look without diving into a line-by-line review 
of the code, please share your thoughts. This is starting to get on my nerves, 
just a little bit.

The issue occurs in serf_default_destroy_and_data() via 
apr_pool_cleanup_for_exec() from the child process after apr_proc_create() to 
open a tunnel with multi-threaded. I assume that libsvn_ra_serf and/or serf 
have something wrong. Also, I guess that the same issue might occur when a hook 
is invoked with SVNMasterURI enabled on Apache mpm event or worker.

It happens without DAV, too, and with prefork mpm – the only one that even works on macOS, so I always use that for testing.


This is what I caught on macOS. It was davautocheck, but Serf is not involved:

(lldb) target create "/Volumes/svn-test/trunk/subversion/tests/libsvn_ra/.libs/ra-test" 
--core "/cores/core.9605"
Core file '/cores/core.9605' (arm64) was loaded.
(lldb) bt
* thread #2, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
   * frame #0: 0x0000000104a83524 
ra-test`close_tunnel(tunnel_context=0x000000072d08e6b0, 
tunnel_baton=0x000000072d03c0a0) at ra-test.c:329:7
     frame #1: 0x0000000104d6b5f0 
libsvn_ra_svn-1.0.dylib`close_tunnel_cleanup(baton=0x000000072d08e258) at 
client.c:597:5
     frame #2: 0x00000001051fd1b4 libapr-1.0.dylib`apr_pool_destroy + 88
     frame #3: 0x00000001051fd1a0 libapr-1.0.dylib`apr_pool_destroy + 68
     frame #4: 0x0000000104b353a0 
libsvn_ra-1.0.dylib`svn_ra_open5(session_p=0x000000016b5b2ec0, 
corrected_url_p=0x0000000000000000, redirect_url_p=0x0000000000000000, 
repos_URL="svn+test://localhost/test-repo-tunnel", uuid=0x0000000000000000, 
callbacks=0x000000072d03c0d8, callback_baton=0x0000000000000000, 
config=0x0000000000000000, pool=0x000000072d040028) at ra_loader.c:395:7
     frame #5: 0x0000000104a7992c 
ra-test`tunnel_callback_test(opts=0x000000016b3826b8, pool=0x000000072d03c028) 
at ra-test.c:469:3
     frame #6: 0x0000000104aba8a0 
libsvn_test-1.0.dylib`test_thread(thread=0x000000072d01c150, 
data=0x000000016b382570) at svn_test_main.c:577:15
     frame #7: 0x0000000187549c58 libsystem_pthread.dylib`_pthread_start + 136


Note that this test runs against svnserve with just 'check' too, no DAV involved at all. So it's either caused by something in libsvn_ra_svn, or more likely, in ra-test itself. Those tunnels are notoriously tricky, I remember we had trouble with them before in JavaHL, too.

Finally got some interesting info from the core file – it takes some work to get core dumps on the Mac, setting ulimit -c isn't enough.

frame #0: 0x0000000104a83524 
ra-test`close_tunnel(tunnel_context=0x000000072d08e6b0, 
tunnel_baton=0x000000072d03c0a0) at ra-test.c:329:7
   326          apr_proc_wait(b->proc, &child_exit_code, &child_exit_why, 
APR_WAIT);
   327  
   328        SVN_TEST_ASSERT_NO_RETURN(child_exit_status == APR_CHILD_DONE);
-> 329             SVN_TEST_ASSERT_NO_RETURN(child_exit_code == 0);
   330        SVN_TEST_ASSERT_NO_RETURN(child_exit_why == APR_PROC_EXIT);
   331      }
   332  }
(lldb) p child_exit_status
(apr_status_t) 70005
(lldb) p child_exit_code
(int) 10
(lldb) p child_exit_why
(apr_exit_why_e) APR_PROC_SIGNAL | APR_PROC_SIGNAL_CORE



That means svnserve crashed with a SIGBUS. I'll keep investigating.


-- Brane

Reply via email to