Hi everyone,

Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
[...]
> Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> concurrently. This didn't happen in 6.7.
> It looks like asr_ctx which is supposed to be thread-local according to
> _asr_use_resolver(), is actually static / shared between threads.
>> How-To-Repeat:
[...]
> The reproducing code written by psychon, CCed, while debugging the crash
> of a more complicated software
[...]
here is a bit more information about what is going on:

When run under egdb, you can "watch _asr". This reports:

Old value = (struct asr *) 0x0
New value = (struct asr *) 0x79905dce2a0
_asr_use_resolver (arg=<optimized out>) at /usr/src/lib/libc/asr/asr.c:360

(gdb) bt
#0  _asr_use_resolver (arg=<optimized out>) at
/usr/src/lib/libc/asr/asr.c:360
#1  0x000007990558862e in _libc_res_init () at
/usr/src/lib/libc/asr/res_init.c:44
#2  0x000007990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
"ipv4.google.com", servname=0x0, hints=0x7f7ffffce120,
res=0x7f7ffffce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
#3  0x00000796629c4d2e in do_lookup ()
#4  0x00000796629c4da3 in do_things ()
#5  0x00000796629c4ded in main ()

At this time, the program is still single-threaded (as seen in the
backtrace). This line in _asr_use_resolver() is the culprit:

    priv = _THREAD_PRIVATE(_asr, _asr, &_asr);

the macro is defined as:

#define _THREAD_PRIVATE(keyname, storage, error)
        (_thread_cb.tc_tag_storage == NULL ? &(storage) :
            _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),
                &(storage), sizeof(storage), error))

gdb has to say about this:

(gdb) print _thread_cb.tc_tag_storage
$3 = (void *(*)(void **, void *, size_t, void *)) 0x0

Thus, because we are still single-threaded, the program uses the static
variable _asr directly to store its "per thread" resolver (priv = &_asr
in the macro invocation above and thus *priv = _asr_resolver() ends up
storing the resolver in a global variable).

Later, threads are started. The next time that this code runs,
_thread_cb.tc_tag_storage points to _thread_tag_storage() from
lib/libc/thread/rthread_libc.c. This function uses memcpy() to
initialise the per-thread variant of the variable with the contents of
the global variant. Thus, all threads now end up using the resolver
instance that the main thread created.

I did not investigate what exactly goes wrong later, but since multiple
threads are now sharing a resolver, nothing good can happen.

The actual crash is a use-after-free (and it is always the same one):
iter_family() in lib/libc/asr/getaddrinfo_async.c executes
AS_FAMILY(as). This is:

 #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])

(gdb) print as->as_ctx->ac_family[as->as_family_idx]
Cannot access memory at address 0x28b3653e000

The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
member as an example:

(gdb) print as->as_ctx[0]->ac_domain
$8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
0xdfdfdfdfdfdfdfdf>

At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
this helps / surprises, since the crash happens in the main thread...
I guess one of the threads freed the main thread's resolver in
_asr_resolver_done()..?

I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
pointer. However, a breakpoint on pthread_create() says that it is not
doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
Perhaps some code was changed to directly call _asr_ctx_unref() instead?
At least the code looks like asr_run() does _asr_async_free() at the end
and this just calls _asr_ctx_unref() directly, even though the context
was originally aquired by calling getaddrinfo_async() with last
parameter NULL, which it passed on to _asr_use_resolver(NULL) and thus
acquiring the thread-local context.

Cheers,
Uli
-- 
This can be a, a little complicated. Listen, my advice is... ask
somebody else for advice, at least someone who's... got more experience
at...  giving advice.

Reply via email to