I configured fossil to use openssl (for https) and built it for Linux (kernel 3.11.0-12-generic, Ubuntu 13.10). Fossil crashes during the 'pull' portion of a 'fossil update', or just while running 'fossil pull'. The pull involves the transfer of a few large artifacts (in the ~60 MB range) as well as lots of small ones.
The crash happens when the client speaks SSL to the fossil server via stunnel, and also when the client speaks in the clear directly to the fossil server. I believe that the server is running out of memory during the transaction, and the resulting server response causes the client to fail.

$ uname -a
Linux <snip> 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:12:00 UTC 2013 i686 i686 i686 GNU/Linux
$ ~/fossil-src-20140612172556/fossil version
This is fossil version 1.29 [3e5ebe2b90] 2014-06-12 17:25:56 UTC

I am unfortunately not able to get much useful information out of the core file, apart from the fact that we seem to have been in the openssl lib when we crashed:

$ ~/fossil-src-20140612172556/fossil pull
Pull from https://e...@dev.packetup.net:10444/
Segmentation fault (core dumped)
0 received: 0
$ gdb ~/fossil-src-20140612172556/fossil ./core
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/eric/fossil-src-20140612172556/fossil...done.
warning: exec file is newer than core file.
[New LWP 7065]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `/home/eric/fossil-src-20140612172556/fossil update'.
Program terminated with signal 11, Segmentation fault.
#0  0xb75bffb0 in ?? () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
(gdb) where
#0  0xb75bffb0 in ?? () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
#1  0xefcdab89 in ?? ()
#2  0x98badcfe in ?? ()
#3  0x10325476 in ?? ()
#4  0xc3d2e1f0 in ?? ()
#5  0x402a84b0 in ?? ()
#6  0x00000000 in ?? ()

(As an aside, the "return addresses" in frames #1 through #4 happen to be the SHA-1 initialization constants, which suggests the stack was overwritten with in-progress hash state rather than real frame pointers.)

This happens every time given my current clone pairs.
I was not able to reproduce it under valgrind, because the slow-down imposed by valgrind caused the initial network transaction to time out. I have reproduced it under strace, but I'm not sure how helpful that output will be.

I then recompiled fossil as a static executable and re-ran my test. Fossil still usually crashes, but one time it did not. Instead, it produced the following output:

$ ../fossil-src-20140612172556/fossil pull
Pull from https://e...@dev.packetup.net:10444/
Round-trips: 1 Artifacts sent: 0 received: 0
server replies with HTML instead of fossil sync protocol:
file 010dfdc0db6ff3cf63e0da8f90681fef5e80ee56 5032
/*
 * Copyright (C) 1995, 1996, 1997, 1998, and 1999 WIDE Project.
{snip a bunch of file data}
file 25fc6f1d7ddb3d0c6f812a397752b83472776e56 8d0ef3a2aee8895877014a1b273cc4dfd20e5f34 272
AWP 1d:C {snipped my check-in comment}
D 2014-08-18T16:46:56.6919j2@22,q:nvram_read.c 83e2de86fe92201e7662fc9db14387f4d4d79a0fjd@9kq ,1I:8d0ef3a2aee8895877014a1b273cc4dfd20e5f34
U eas
Z a2e9f38c4965a46f1e4f925f9e9a8a07
3GbccP;file 267f9bd5f5b347d26ede6b0c20c332d8af1b3bc0 5212
Round-trips: 2 Artifacts sent: 0 received: 51
server replies with HTML instead of fossil sync protocol:
<p class="generalError">out of memory</p>
Round-trips: 2 Artifacts sent: 0 received: 51
Pull finished with 4250 bytes sent, 99232 bytes received
$

So my hypothesis is that the server is running out of memory and is sending back some response that the client cannot handle, and the client is crashing on it. I have taken a wire capture of the cleartext client<->server interaction and can share that with the devs privately if they would like. There is not a lot of data transferred in this scenario -- only 12 packets. The server works for 80 seconds, sends back a small response, and the connection is terminated.

I ran fossil again and stepped it through the debugger starting from blob_uncompress.
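To illustrate the failure mode, here is a minimal sketch (my own illustration, not fossil's actual code, and the helper name is hypothetical) of how a sync client could sniff whether a reply body is an HTML error page rather than sync-protocol cards before trying to parse it:

```c
#include <ctype.h>

/* Hypothetical helper, not fossil's actual code: report whether a
** server reply looks like an HTML error page (such as the server's
** "out of memory" message) instead of fossil sync-protocol cards,
** which begin with a lowercase keyword like "file", "igot", or
** "pull".  Skips leading whitespace, then checks for a '<'. */
static int looks_like_html(const char *zReply){
  while( *zReply && isspace((unsigned char)*zReply) ) zReply++;
  return *zReply=='<';
}
```

A client that ran such a check on each reply could report the server's error text cleanly instead of feeding it into the card parser.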
Here's the contents of the uncompressed blob:

(gdb) where
#0  blob_uncompress (pIn=pIn@entry=0xbffff434, pOut=pOut@entry=0xbffff434) at ./src/blob.c:933
#1  0x0807bf60 in http_exchange (pSend=pSend@entry=0xbffff420, pReply=pReply@entry=0xbffff434, useLogin=1, maxRedirect=maxRedirect@entry=20) at ./src/http.c:390
#2  0x080ca4b5 in client_sync (syncFlags=<optimized out>, configRcvMask=<optimized out>, configSendMask=configSendMask@entry=0) at ./src/xfer.c:1561
#3  0x080acd1c in pull_cmd () at ./src/sync.c:172
#4  0x0804d402 in main (argc=4, argv=0xbffff6c4) at ./src/main.c:674
(gdb) print pOut->aData
$8 = 0x844aa08 "file 59cdb30316cbb0d1d35f248ac3bcabcde7a419c2 988f3554814eeae25a87235c9c8e3a4f84b29a75 64937242\n<p class=\"generalError\">out of memory</p>"
(gdb)

My guess is that we are going to crash on that record: the file card declares a size of ~64 MB, but the real contents are much smaller (just the OOM error message):

(gdb) n
932         return 0;
(gdb)
933     }
(gdb)
http_exchange (pSend=pSend@entry=0xbffff420, pReply=pReply@entry=0xbffff434, useLogin=1, maxRedirect=maxRedirect@entry=20) at ./src/http.c:401
401       if( !g.url.isSsh ) closeConnection = 1; /* FIX ME */
(gdb)
403         transport_close(&g.url);
(gdb)
415     }
(gdb)
client_sync (syncFlags=<optimized out>, configRcvMask=<optimized out>, configSendMask=configSendMask@entry=0) at ./src/xfer.c:1569
1569        if( syncFlags & SYNC_VERBOSE ){
(gdb)
1575        nArtifactSent += xfer.nFileSent + xfer.nDeltaSent;
(gdb)
1576        fossil_print(zBriefFormat, nRoundtrip, nArtifactSent, nArtifactRcvd);
(gdb)
1574        nRoundtrip++;
(gdb)
1576        fossil_print(zBriefFormat, nRoundtrip, nArtifactSent, nArtifactRcvd);
(gdb)
1586        blob_reset(&send);
(gdb)
1580        xfer.nFileSent = 0;
(gdb)
1581        xfer.nDeltaSent = 0;
(gdb)
1582        xfer.nGimmeSent = 0;
(gdb)
1583        xfer.nIGotSent = 0;
(gdb)
1586        blob_reset(&send);
(gdb)
1587        rArrivalTime = db_double(0.0, "SELECT julianday('now')");
(gdb)
1590        if( syncFlags & SYNC_PRIVATE ){
(gdb)
1587        rArrivalTime = db_double(0.0, "SELECT julianday('now')");
(gdb)
1590        if( syncFlags & SYNC_PRIVATE ){
(gdb)
1597        if( syncFlags & SYNC_PULL ){
(gdb)
1578        nCardSent = 0;
(gdb)
1597        if( syncFlags & SYNC_PULL ){
(gdb)
1598          blob_appendf(&send, "pull %s %s\n", zSCode, zPCode);
(gdb)
1599          nCardSent++;
(gdb)
1601        if( syncFlags & SYNC_PUSH ){
(gdb)
1866        blob_reset(&xfer.line);
(gdb)
1608        while( blob_line(&recv, &xfer.line) ){
(gdb)
1609          if( blob_buffer(&xfer.line)[0]=='#' ){
(gdb)
1624          xfer.nToken = blob_tokenize(&xfer.line, xfer.aToken, count(xfer.aToken));
(gdb)
1626          if( (syncFlags & SYNC_VERBOSE)!=0 && recv.nUsed>0 ){
(gdb)
1624          xfer.nToken = blob_tokenize(&xfer.line, xfer.aToken, count(xfer.aToken));
(gdb)
1625          nCardRcvd++;
(gdb)
1626          if( (syncFlags & SYNC_VERBOSE)!=0 && recv.nUsed>0 ){
(gdb)
1640          if( blob_eq(&xfer.aToken[0],"file") ){
(gdb)
0x08048330 in ?? ()
(gdb)
Cannot find bounds of current function
(gdb)

So at this point I guess we have corrupted our stack:

(gdb) where
#0  0x08048330 in ?? ()
#1  0x00000060 in ?? ()
#2  0x00000060 in ?? ()
#3  0x0844aa08 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
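If the diagnosis above is right, the underlying problem is that the client trusts the size declared on the "file" card even when the reply doesn't actually contain that many bytes. A defensive check could look something like this (a standalone sketch under my own naming, not fossil's actual code):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not fossil's actual code.  A "file" card looks
** like "file HASH SIZE" or "file HASH DELTA-BASE SIZE"; in both forms
** the payload size is the last whitespace-separated token.  Return 1
** only if the declared payload fits within the nAvail bytes actually
** remaining in the reply buffer, so a lying card (e.g. one claiming
** 64937242 bytes followed only by a short HTML error message) is
** rejected instead of causing a read past the end of the buffer. */
static int file_card_size_ok(const char *zCard, long nAvail){
  const char *zLast = strrchr(zCard, ' ');
  long nDeclared;
  if( zLast==0 ) return 0;            /* malformed card: no size token */
  nDeclared = atol(zLast+1);
  return nDeclared>=0 && nDeclared<=nAvail;
}
```

Applied to the captured reply, the card declaring 64937242 bytes followed by only the ~40-byte OOM message would fail this check, and the client could bail out with a protocol error instead of walking off the end of its buffer.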
Let me know if you need anything further.

Eric
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users