On Sun, Dec 2, 2012 at 2:57 PM, Lieven Govaerts <svn...@mobsol.be> wrote:
> Hi,
>
> On Fri, Nov 30, 2012 at 8:19 PM, Philip Martin
> <philip.mar...@wandisco.com> wrote:
>> Stefan Küng <tortoise...@gmail.com> writes:
>>
>>> Here's how to reproduce:
>>>
>>> $ svn co https://tortoisesvn.googlecode.com/svn/trunk/src/Resources/tools
>>> tools
>>>
>>> get the file here:
>>> https://skydrive.live.com/redir?resid=D000F60A347E5B37!11352
>>> and replace the one in 'tools' with this one.
>>
>> I can reproduce locally by importing tools into a local repository,
>> checking out, replacing the file and attempting the commit. That is
>> using serf 1.1.x. Using serf trunk the commit goes into a loop.
>>
>
> I see the same problem in a local repository. With some extra logging
> I see that one of the delta windows isn't handled correctly by the
> server:
>
> This is svn trunk with serf:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
>
> This is svn 1.7.7 with neon:
> write_handler window: {sview_offset = 102400, sview_len = 102400,
> tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
> svn_txdelta_new, new_data = 0x15cbc28}
> write_handler window: {sview_offset = 204800, sview_len = 102400,
> tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
> svn_txdelta_new, new_data = 0x15c0028}
> write_handler window: {sview_offset = 307200, sview_len = 102400,
> tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
> svn_txdelta_new, new_data = 0x15be428}
> write_handler window: {sview_offset = 0, sview_len = 0, tview_len =
> 102400, num_ops = 1, src_ops = 0, ops->action = svn_txdelta_new,
> new_data = 0x17e8028}
> ...

Copy-paste error, this is the log content with neon:

write_handler window: {sview_offset = 0, sview_len = 102400,
tview_len = 102400, num_ops = 117, src_ops = 58, ops->action =
svn_txdelta_new, new_data = 0x3461028}
write_handler window: {sview_offset = 102400, sview_len = 102400,
tview_len = 102400, num_ops = 55, src_ops = 27, ops->action =
svn_txdelta_new, new_data = 0x34a9828}
write_handler window: {sview_offset = 204800, sview_len = 102400,
tview_len = 102400, num_ops = 143, src_ops = 71, ops->action =
svn_txdelta_new, new_data = 0x3461028}
write_handler window: {sview_offset = 307200, sview_len = 102400,
tview_len = 102400, num_ops = 23, src_ops = 11, ops->action =
svn_txdelta_new, new_data = 0x345aa28}
write_handler window: {sview_offset = 409600, sview_len = 102400,
tview_len = 102400, num_ops = 1, src_ops = 0, ops->action =
svn_txdelta_new, new_data = 0x3460028}
write_handler window: {sview_offset = 512000, sview_len = 102400,
tview_len = 102400, num_ops = 117, src_ops = 59, ops->action =
svn_txdelta_new, new_data = 0x345da28}
write_handler window: {sview_offset = 614400, sview_len = 102400,
tview_len = 102400, num_ops = 13, src_ops = 6, ops->action =
svn_txdelta_new, new_data = 0x36e1028}
...

>
> The core issue seems to be introduced in r1390435 as part of the
> svndiff optimizations.
>
> Attached patch fixes the issue for me. I don't know how it impacts
> other parts of the code, so review is appreciated. The patch still
> contains logging so not meant to be applied directly!
>
>> As far as I can tell the problem is the client causing mod_dav_svn to
>> SEGV (serf trunk keep retrying and causing multiple SEGVs). The
>> mod_dav_svn stack trace isn't very useful, I'll need a httpd debug
>> build:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fe2c42e7700 (LWP 31534)]
>> 0x00007fe2c98245cc in apr_brigade_cleanup () from /usr/lib/libaprutil-1.so.0
>> (gdb) bt
>> #0 0x00007fe2c98245cc in apr_brigade_cleanup ()
>> from /usr/lib/libaprutil-1.so.0
>> #1 0x00007fe2c75258bf in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #2 0x00007fe2c7528960 in ?? () from /usr/lib/apache2/modules/mod_dav.so
>> #3 0x00007fe2c9ee51f0 in ap_run_handler ()
>> #4 0x00007fe2c9ee563b in ap_invoke_handler ()
>> #5 0x00007fe2c9ef5448 in ap_process_request ()
>> #6 0x00007fe2c9ef2308 in ?? ()
>> #7 0x00007fe2c9eebbb0 in ap_run_process_connection ()
>> #8 0x00007fe2c9efb55d in ?? ()
>> #9 0x00007fe2c960f597 in ?? () from /usr/lib/libapr-1.so.0
>> #10 0x00007fe2c93cbb50 in start_thread (arg=<optimized out>)
>> at pthread_create.c:304
>> #11 0x00007fe2c9115a7d in clone ()
>> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #12 0x0000000000000000 in ?? ()
>>
>> I'd guess it's memory corruption in the server.
>
> Well, besides the client seemingly sending incorrect svndiff windows,
> the server should not crash. I got the following stack trace from
> httpd in the debugger:
>
> Out of memory - terminating application.
>
> Program received signal SIGABRT, Aborted.
> 0x00007fff88cd7ce2 in __pthread_kill ()
> (gdb) bt
> #0 0x00007fff88cd7ce2 in __pthread_kill ()
> #1 0x00007fff8381f7d2 in pthread_kill ()
> #2 0x00007fff83810a7a in abort ()
> #3 0x00000001011ef651 in abort_on_pool_failure (retcode=12) at pool.c:55
> #4 0x000000010030e290 in apr_palloc ()
> #5 0x00000001012067c7 in svn_stringbuf_create_ensure
> (blocksize=12804161111182623672, pool=0x100a72428) at string.c:329
> #6 0x0000000101206867 in svn_stringbuf_ncreate (bytes=0x1017dd035
> "??", size=12804161111182623667, pool=0x100a72428)
> at string.c:346
> #7 0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> #8 0x00000001012011fa in svn_stream_write (stream=0x100a04900,
> data=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at stream.c:162
> #9 0x000000010102d30f in write_stream (stream=0x1009c8ba8,
> buf=0x1009bfc48, bufsize=2048) at repos.c:2892
> #10 0x00000001007969d4 in dav_handler ()
> #11 0x0000000100001cd6 in ap_invoke_handler ()
> #12 0x0000000100021433 in ap_process_request ()
> #13 0x000000010001eb50 in ap_process_http_connection ()
> #14 0x000000010000da28 in ap_process_connection ()
> #15 0x0000000100027219 in child_main ()
> #16 0x000000010002696a in make_child ()
> #17 0x000000010002600b in ap_mpm_run ()
> #18 0x0000000100007139 in main ()
> (gdb) frame 7
> #7 0x0000000101199dbe in write_handler (baton=0x100a048b8,
> buffer=0x1009bfc48 "????ل$8\001", len=0x7fff5fbff2d8) at svndiff.c:886
> 886 db->buffer =
> (gdb) p *len
> $1 = 2048
> (gdb) p remaining
> $2 = 12804161111182623667
> ..
> (gdb) p db->buffer->data
> $5 = 0xe4d8c0d9ec42b70f <Address 0xe4d8c0d9ec42b70f out of bounds>
>
> Looks like the db->buffer struct is overwritten with data, thereby
> invalidating the db->buffer->data pointer.
>
>
> A third issue is that serf is either segfaulting or retrying when the
> server aborts the connection due to this segfault. I'll look into this
> further.
>
> Lieven
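
For illustration only: in the second backtrace the abort comes from a
length of 12804161111182623667 reaching the pool allocator unchecked
(frames 6 down to 3), while the incoming chunk was only 2048 bytes.
Below is a minimal sketch in C of rejecting an implausible length
before allocating. It is not the attached patch and not Subversion
code; SKETCH_MAX_BUFFERED and sketch_checked_alloc_size() are
hypothetical names, and the 1 MiB cap is an assumption chosen only
because the logged window views are 102400 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on how many bytes we agree to buffer at once.
   Assumption for this sketch: the logged windows use 102400-byte
   views, so 1 MiB is a generous illustrative limit, not a value
   taken from the Subversion sources. */
#define SKETCH_MAX_BUFFERED ((uint64_t)1024 * 1024)

/* Hypothetical helper: validate a "remaining bytes" count before it is
   handed to an allocator.  On success, store the size to allocate
   (payload plus one byte for a terminating NUL) in *alloc_size and
   return 0.  On an implausible count, leave *alloc_size untouched and
   return -1 so the caller can report an error instead of aborting. */
static int
sketch_checked_alloc_size(uint64_t remaining, uint64_t *alloc_size)
{
  if (remaining > SKETCH_MAX_BUFFERED)
    return -1;

  *alloc_size = remaining + 1;
  return 0;
}
```

With a check of this kind a corrupted length becomes an error returned
to the caller. It would not address the overwritten db->buffer struct
itself, only the abort in the allocator.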