Hi. I've been seeing libcurl failing, somewhat consistently, at Curl_resolv_timeout.
The "data" local variable seems to come up changed after longjmp returns.

(gdb) bt
#0  Curl_resolv_timeout (conn=0x7ffe5002d770, hostname=0x7ffff6efb3c0 <X509_NAME_ENTRY_it> "\001", port=0x7fff, entry=0x7ffe50013930, timeoutms=0x7ffff6b9b4dd) at hostip.c:622
...
(gdb) print data
$10 = (struct Curl_easy *) 0x770000006e
(gdb) print *data
Cannot access memory at address 0x770000006e
(gdb) print conn->data
$12 = (struct Curl_easy *) 0x7ffe5002db40
(gdb) print &data
$13 = (struct Curl_easy **) 0x7ffe68ff8380
(gdb) print *(void**)0x7ffe68ff8380
$14 = (void *) 0x770000006e
(gdb) info registers
rax            0x1                 0x1
rbx            0x7fff30017e7c      0x7fff30017e7c
rcx            0x7ffff659b8a0      0x7ffff659b8a0
rdx            0x7ffff7b35a76      0x7ffff7b35a76
rsi            0x1                 0x1
rdi            0x7ffff7ddac60      0x7ffff7ddac60
rbp            0x7ffe68ff83a0      0x7ffe68ff83a0
rsp            0x7ffe68ff81f0      0x7ffe68ff81f0
r8             0x7ffe68ff81f0      0x7ffe68ff81f0
r9             0x7ffe68ff83a0      0x7ffe68ff83a0
r10            0x8                 0x8
r11            0x246               0x246
r12            0x1                 0x1
r13            0x7ffe68ff99c0      0x7ffe68ff99c0
r14            0x7ffe68ff9700      0x7ffe68ff9700
r15            0x0                 0x0
rip            0x7ffff7b35a7a      0x7ffff7b35a7a <Curl_resolv_timeout+285>
eflags         0x202               [ IF ]
cs             0x33                0x33
ss             0x2b                0x2b
ds             0x0                 0x0
es             0x0                 0x0
fs             0x0                 0x0
gs             0x0                 0x0

I don't quite like the fact that the signal seems to interrupt the "wrong" thread, or interrupts the thread too late:

(gdb) bt
#0  Curl_resolv_timeout (conn=0x7ffe5002d770, hostname=0x7ffff6efb3c0 <X509_NAME_ENTRY_it> "\001", port=0x7fff, entry=0x7ffe50013930, timeoutms=0x7ffff6b9b4dd) at hostip.c:622
#1  0x00007ffe68ff9700 in ?? ()
#2  0x000000000000007c in ?? ()
#3  0x00007ffff7baf833 in ?? () from /mnt/xl4/100k/runtime/curl/lib/libcurl.so
#4  0x0000000000000001 in ?? ()
#5  0x00007ffff71b83da in xmlCharEncInput () from /usr/lib64/libxml2.so.2
#6  0x00007ffff6c163a2 in int_thread_release (hash=0x1d5f6c13f51) at err.c:469
#7  0x00007ffff6c166b5 in int_thread_del_item (d=<optimized out>) at err.c:542
#8  0x00007ffff6c170cd in ERR_remove_thread_state (id=<optimized out>) at err.c:994
#9  0x00007ffe68ff9700 in ?? ()
#10 0x00007ffe68ff8760 in ?? ()
#11 0x00007ffff7b6f1c1 in curl_dofree (ptr=0xffffffffffffffff, line=0xffffffff, source=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>) at memdebug.c:337
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

None of the threads has a stack trace in Curl_resolv, which is where execution should be when the alarm goes off.

Note that I compiled libcurl without the threaded resolver. The exact same problem happens when the threaded resolver is on, however. The stack trace typically leads into some other thread as well.

This failure is consistent: if I break inside the setjmp return path, it always fails (the value of the 'data' pointer is wrong).

This is on a 4.4.41-35.53 kernel with glibc 2.17-106.168 (AWS Linux). The same happens on my Fedora desktop, with different kernel/glibc versions.

It also looks like the whole alarm functionality, even if the threaded resolver is used, is not MT safe, because of the shared setjmp buffer and the fact that the waiting (the actual setjmp/longjmp calls) is done on the calling thread. Or am I missing something here?

Thank you,
Pawel.
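
P.S. For reference, the alarm-based timeout pattern I am worried about can be sketched roughly as below. This is only a minimal illustration, not libcurl's actual code: wait_with_alarm, on_alarm and timeout_jmp are made-up names, and pause() stands in for the blocking name lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not libcurl source.
   A single buffer for the whole process: every thread using this
   pattern would share it, which is the MT-safety concern. */
static sigjmp_buf timeout_jmp;

static void on_alarm(int signum)
{
  (void)signum;
  /* Jumps to whatever context last called sigsetjmp() on
     timeout_jmp, regardless of which thread runs this handler. */
  siglongjmp(timeout_jmp, 1);
}

/* Waits up to `seconds` for "work" that never completes.
   Returns 1 if the alarm fired, 0 if the wait ended otherwise. */
static int wait_with_alarm(unsigned int seconds)
{
  struct sigaction sa, old_sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGALRM, &sa, &old_sa);

  if(sigsetjmp(timeout_jmp, 1)) {
    /* Reached via siglongjmp() from the handler. */
    sigaction(SIGALRM, &old_sa, NULL);
    return 1;
  }

  alarm(seconds);  /* SIGALRM is sent to the process, not to a thread */
  pause();         /* stand-in for the blocking name lookup */

  /* Only reached if some other signal's handler returned. */
  alarm(0);
  sigaction(SIGALRM, &old_sa, NULL);
  return 0;
}
```

In a single-threaded program, wait_with_alarm(1) returns 1 after about a second. With several threads, SIGALRM may be delivered to any thread that does not block it, so the handler can end up calling siglongjmp() on a thread that never called sigsetjmp() on that buffer. If something like that is happening here, it could explain the corrupt stack above.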
