Hi.

I've been seeing libcurl failing, somewhat consistently, at
Curl_resolv_timeout.

The "data" local variable seems to come back changed after the longjmp,
that is, once setjmp returns for the second time.

(gdb) bt
#0  Curl_resolv_timeout (conn=0x7ffe5002d770, hostname=0x7ffff6efb3c0
<X509_NAME_ENTRY_it> "\001", port=0x7fff, entry=0x7ffe50013930,
timeoutms=0x7ffff6b9b4dd) at hostip.c:622
...
(gdb) print data
$10 = (struct Curl_easy *) 0x770000006e
(gdb) print *data
Cannot access memory at address 0x770000006e
(gdb) print conn->data
$12 = (struct Curl_easy *) 0x7ffe5002db40
(gdb) print &data
$13 = (struct Curl_easy **) 0x7ffe68ff8380
(gdb) print *(void**)0x7ffe68ff8380
$14 = (void *) 0x770000006e
(gdb) info  registers
rax            0x1      0x1
rbx            0x7fff30017e7c   0x7fff30017e7c
rcx            0x7ffff659b8a0   0x7ffff659b8a0
rdx            0x7ffff7b35a76   0x7ffff7b35a76
rsi            0x1      0x1
rdi            0x7ffff7ddac60   0x7ffff7ddac60
rbp            0x7ffe68ff83a0   0x7ffe68ff83a0
rsp            0x7ffe68ff81f0   0x7ffe68ff81f0
r8             0x7ffe68ff81f0   0x7ffe68ff81f0
r9             0x7ffe68ff83a0   0x7ffe68ff83a0
r10            0x8      0x8
r11            0x246    0x246
r12            0x1      0x1
r13            0x7ffe68ff99c0   0x7ffe68ff99c0
r14            0x7ffe68ff9700   0x7ffe68ff9700
r15            0x0      0x0
rip            0x7ffff7b35a7a   0x7ffff7b35a7a <Curl_resolv_timeout+285>
eflags         0x202    [ IF ]
cs             0x33     0x33
ss             0x2b     0x2b
ds             0x0      0x0
es             0x0      0x0
fs             0x0      0x0
gs             0x0      0x0

I really don't like the fact that the signal seems to interrupt the "wrong"
thread, or to interrupt the right thread too late:

(gdb) bt
#0  Curl_resolv_timeout (conn=0x7ffe5002d770, hostname=0x7ffff6efb3c0
<X509_NAME_ENTRY_it> "\001", port=0x7fff, entry=0x7ffe50013930,
timeoutms=0x7ffff6b9b4dd) at hostip.c:622
#1  0x00007ffe68ff9700 in ?? ()
#2  0x000000000000007c in ?? ()
#3  0x00007ffff7baf833 in ?? () from /mnt/xl4/100k/runtime/curl/lib/libcurl.so
#4  0x0000000000000001 in ?? ()
#5  0x00007ffff71b83da in xmlCharEncInput () from /usr/lib64/libxml2.so.2
#6  0x00007ffff6c163a2 in int_thread_release (hash=0x1d5f6c13f51) at err.c:469
#7  0x00007ffff6c166b5 in int_thread_del_item (d=<optimized out>) at err.c:542
#8  0x00007ffff6c170cd in ERR_remove_thread_state (id=<optimized out>)
at err.c:994
#9  0x00007ffe68ff9700 in ?? ()
#10 0x00007ffe68ff8760 in ?? ()
#11 0x00007ffff7b6f1c1 in curl_dofree (ptr=0xffffffffffffffff,
line=0xffffffff, source=0xffffffffffffffff <Address 0xffffffffffffffff
out of bounds>) at memdebug.c:337
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

No thread has a stack trace in Curl_resolv, which is where execution should
be when the alarm goes off. Note that I compiled libcurl without the
threaded resolver. The exact same problem happens with the threaded resolver
enabled, however. The stack trace typically leads into some other thread as
well.

This failure is consistent: whenever I break inside the setjmp return path,
the value of the 'data' pointer is already wrong.

This is on kernel 4.4.41-35.53 with glibc 2.17-106.168 (AWS Linux). The same
happens on my Fedora desktop, with different kernel/glibc versions.

It also looks like the whole alarm functionality, even when the threaded
resolver is used, is not MT-safe, because the setjmp buffer is shared and
the waiting (the actual setjmp/longjmp calls) is done on the calling thread.
Or am I missing something here?
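
To make the concern concrete, here is a minimal sketch of the alarm plus
sigsetjmp/siglongjmp timeout pattern as I understand it. This is not the
libcurl code: the names timeout_jmp, on_alarm and run_with_timeout are made
up for illustration, and a nanosleep() stands in for the blocking resolver
call. The point is only that the jump buffer is a single process-wide
object, while SIGALRM raised by alarm() is directed at the process and may
be handled by any thread that does not block it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* One jump buffer for the whole process (hypothetical name). If two
   threads ran run_with_timeout() concurrently, the second sigsetjmp()
   would overwrite the context saved by the first. */
static sigjmp_buf timeout_jmp;

/* SIGALRM handler: jumps back to wherever timeout_jmp was last saved,
   regardless of which thread is actually running the handler. */
static void on_alarm(int signo)
{
  (void)signo;
  siglongjmp(timeout_jmp, 1);
}

/* Returns 0 if the simulated work finished in time, 1 on timeout,
   -1 if the handler could not be installed. Single-threaded use only. */
int run_with_timeout(unsigned int timeout_secs, unsigned int work_secs)
{
  struct sigaction sa, old_sa;
  struct timespec work;
  int timed_out = 0;

  assert(timeout_secs > 0);

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  if(sigaction(SIGALRM, &sa, &old_sa) != 0)
    return -1;

  if(sigsetjmp(timeout_jmp, 1)) {
    /* Reached a second time, via siglongjmp() from the handler. */
    timed_out = 1;
  }
  else {
    alarm(timeout_secs);
    /* Stand-in for the blocking name resolution. */
    work.tv_sec = (time_t)work_secs;
    work.tv_nsec = 0;
    nanosleep(&work, NULL);
  }

  alarm(0);                          /* cancel any pending alarm */
  sigaction(SIGALRM, &old_sa, NULL); /* restore the previous handler */
  return timed_out;
}
```

In a single-threaded program, run_with_timeout(1, 3) should report a
timeout and run_with_timeout(2, 0) should not. With several threads the
handler may run on a thread whose stack has nothing to do with the saved
context, which would at least be compatible with the corrupt frames shown
above, though I have not proven that this is what happens in my case.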

Thank you,
  Pawel.
