On 12/14/2016 4:30 AM, Theo van Klaveren wrote:
I am seeing an intermittent (2-3 times a week) segfault inside libcurl
code, and it is frequent enough that it is becoming a real problem.
I'm having problems finding the root cause so any help would be
appreciated.

A bit of context: This is an application that is responsible for
fetching data from multiple sources via HTTP and uploading the results
to a central server via HTTPS. It is multi-threaded, but to rule out
any threading problems a global mutex is locked around use of libcurl.
This at least reduced the number of different stack traces I am seeing
to this one:

(gdb) where
<<snip>>
#6  <signal handler called>
#7  0x080a8e12 in Curl_expire_clear (data=0xb3c1acf4) at multi.c:3022
#8  0x080be7f7 in Curl_close (data=0xb3c1acf4) at url.c:396
#9  0x080a7986 in curl_multi_cleanup (multi=0xb3c19b8c) at multi.c:2211
#10 0x080be833 in Curl_close (data=0xb3c03204) at url.c:408
#11 0x080a126a in curl_easy_cleanup (data=0xb3c03204) at easy.c:833
#12 0x080876da in HTTPClient::HTTPRequest (this=0x98ad74c,
     aURL=0xb3c13604 "http://localhost:50080/<<redacted>>",
     aPostData=0x0, aPostSize=0, aPage=0xb6dd32d8, aTimeoutMS=10000,
     aConnectTimeoutMS=10000, aContentType=0x0, aHeaderFields=0x0)
     at <<redacted>>/HTTPClient.cpp:115
<<snip>>

The crash happens in the cleanup after what appears to be an otherwise
successful HTTP GET request. It looks like something in curl's state
has become corrupted somewhere:

(gdb) list
3021        /* flush the timeout list too */
3022        while(list->size > 0)
<--- crash is here
3023          Curl_llist_remove(list, list->tail, NULL);
(gdb) print list
$9 = (struct curl_llist *) 0x0
(gdb) print data->state
$11 = {conn_cache = 0xb3c19bf4, multi_owned_by_easy = false, keeps_speed = {
     tv_sec = 0, tv_usec = 0}, lastconnect = 0x0,
   headerbuff = 0xb3c19e4c
"�\002��@\236��@\236��20375\t66700\t2\t2\t2\t0\t0\t0\t0\t0\t0\t4\t14\t3\t0\t0\t0\t0\n1481544716015001\t1\t2\t4\t194\t9717\t20373\t59900\t4\t4\t4\t0\t0\t0\t0\t0\t0\t5\t12\t2\t0\t0\t0\t0\n1481544716015002\t1\t3\t7\t194\t9717\t20374\t57900\t4\t4\t4\t0\t0\t0\t0\t0\t0\t5\t2\t"...,
   headersize = 256, buffer = '\0' <repeats 16384 times>,
   uploadbuffer = '\0' <repeats 16384 times>, current_speed = -1,
   this_is_a_follow = false, first_host = 0x0, first_remote_port = 0,
   session = 0x0, sessionage = 0, tempwrite = 0x0, tempwritesize = 0,
   tempwritetype = 0, scratch = 0x0, errorbuf = false, os_errno = 0,
   prev_signal = 0, allow_port = false, digest = {nonce = 0x0, cnonce = 0x0,
     realm = 0x0, algo = 0, stale = false, opaque = 0x0, qop = 0x0,
     algorithm = 0x0, nc = 0}, proxydigest = {nonce = 0x0, cnonce = 0x0,
     realm = 0x0, algo = 0, stale = false, opaque = 0x0, qop = 0x0,
     algorithm = 0x0, nc = 0}, authhost = {want = 0, picked = 0, avail = 0,
     done = false, multi = false, iestyle = false}, authproxy = {want = 0,
     picked = 0, avail = 0, done = false, multi = false, iestyle = false},
   authproblem = false, resolver = 0x0, expiretime = {tv_sec = 0,
     tv_usec = -1}, timenode = {smaller = 0xffffffff, larger = 0x0, same = 0x0,
     key = {tv_sec = 0, tv_usec = 0}, payload = 0x0}, timeoutlist = 0x0,
   most_recent_ftp_entrypath = 0x0, ftp_trying_alternative = false,
   httpversion = 0, expect100header = false, pipe_broke = false,
   prev_block_had_trailing_cr = false, crlf_conversions = 0, pathbuffer = 0x0,
   path = 0x0, slash_removed = false, use_range = false,
   rangestringalloc = false, range = 0x0, resume_from = 0,
   rtsp_next_client_CSeq = 0, rtsp_next_server_CSeq = 0, rtsp_CSeq_recv = 0,
   infilesize = 0, drain = 0, done = false, fread_func = 0, in = 0x0,
   stream_depends_on = 0x0, stream_depends_e = false, stream_weight = 0}

From my reading of the code, expiretime.tv_usec and timenode.smaller
are not supposed to be -1/0xffffffff. The timeout list is allowed to
be NULL, but only if expiretime.tv_sec and expiretime.tv_usec are both
0. That NULL list being dereferenced is the immediate cause of the
segfault, but I can't figure out why expiretime.tv_usec and
timenode.smaller end up holding invalid values in the first place.

Of course, adding a NULL guard around the list dereference on line
3022 would fix the immediate crash, but that doesn't fix the real
issue.
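For what it's worth, the guard would look something like this. This is a sketch against minimal stand-ins for libcurl's internal list types; the real definitions live in lib/llist.h, so the struct and field names here are assumptions for illustration only:

```c
#include <stddef.h>

/* Illustrative stand-ins for libcurl's internal list types; these are
   NOT the real definitions from lib/llist.h. */
struct llist_element {
  struct llist_element *prev;
  struct llist_element *next;
};

struct llist {
  struct llist_element *head;
  struct llist_element *tail;
  size_t size;
};

/* Like the flush loop at multi.c:3022, but tolerating a NULL list
   pointer.  This only papers over the crash; it does nothing about
   whatever corrupted data->state in the first place. */
static void flush_timeout_list(struct llist *list)
{
  if(!list)
    return;                        /* the added NULL guard */
  while(list->size > 0) {
    struct llist_element *e = list->tail;
    list->tail = e->prev;
    if(list->tail)
      list->tail->next = NULL;
    else
      list->head = NULL;
    e->prev = e->next = NULL;
    list->size--;
  }
}
```

With a guard like that, the NULL timeoutlist seen in the gdb dump would be skipped instead of dereferenced, but the corrupted expiretime/timenode values would still be sitting there.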

The code that calls libcurl is pretty much a copy-paste of libcurl
example code, but I can post a (slightly redacted) version if it's
needed.

Are you sure you're not calling the cleanup functions twice, or in the wrong order [1]? This looks similar to a report from earlier this week [2]. Review the thread safety requirements [3], and if that doesn't help, follow the instructions in the reply to that report to enable AddressSanitizer and UndefinedBehaviorSanitizer and see what they turn up. If the problem persists, a self-contained example we can use to reproduce it would help.


[1]: https://curl.haxx.se/libcurl/c/curl_multi_cleanup.html
[2]: https://curl.haxx.se/mail/lib-2016-12/0061.html
[3]: https://curl.haxx.se/libcurl/c/threadsafe.html

-------------------------------------------------------------------
List admin: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:  https://curl.haxx.se/mail/etiquette.html
