On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> Hi everyone,
>
> Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> [...]
> > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > concurrently. This didn't happen in 6.7.
> > It looks like asr_ctx which is supposed to be thread-local according to
> > _asr_use_resolver(), is actually static / shared between threads.
> >> How-To-Repeat:
> [...]
> > The reproducing code written by psychon, CCed, while debugging the crash
> > of a more complicated software
> [...]
> here is a bit more information about what is going on:
>
> When run under egdb, you can "watch _asr". This reports:
>
> Old value = (struct asr *) 0x0
> New value = (struct asr *) 0x79905dce2a0
> _asr_use_resolver (arg=<optimized out>) at /usr/src/lib/libc/asr/asr.c:360
>
> (gdb) bt
> #0 _asr_use_resolver (arg=<optimized out>) at
> /usr/src/lib/libc/asr/asr.c:360
> #1 0x000007990558862e in _libc_res_init () at
> /usr/src/lib/libc/asr/res_init.c:44
> #2 0x000007990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> "ipv4.google.com", servname=0x0, hints=0x7f7ffffce120,
> res=0x7f7ffffce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> #3 0x00000796629c4d2e in do_lookup ()
> #4 0x00000796629c4da3 in do_things ()
> #5 0x00000796629c4ded in main ()
>
> At this time, the program is still single-threaded (as seen in the
> backtrace). This line in _asr_use_resolver() is the culprit:
>
> priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
>
> the macro is defined as:
>
> #define _THREAD_PRIVATE(keyname, storage, error)
> (_thread_cb.tc_tag_storage == NULL ? &(storage) :
> _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),
> &(storage), sizeof(storage), error))
>
> gdb has to say about this:
>
> (gdb) print _thread_cb.tc_tag_storage
> $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
>
> Thus, because we are still single-threaded, the program uses the static
> variable _asr directly to store its "per thread" resolver (priv = &_asr
> in the macro invocation above and thus *priv = _asr_resolver() ends up
> storing the resolver in a global variable).
>
> Later, threads are started. The next time that this code runs,
> _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> initialise the per-thread variant of the variable with the contents of
> the global variant. Thus, all threads now end up using the resolver
> instance that the main thread created.
>
> I did not investigate what exactly goes wrong later, but since multiple
> threads are now sharing a resolver, nothing good can happen.
>
> The actual crash is a use-after-free (and it is always the same one):
> iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> AS_FAMILY(as). This is:
>
> #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
>
> (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> Cannot access memory at address 0x28b3653e000
>
> The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> member as an example:
>
> (gdb) print as->as_ctx[0]->ac_domain
> $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
> 0xdfdfdfdfdfdfdfdf>
>
> At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> this helps / surprises, since the crash happens in the main thread...
> I guess one of the threads freed the main thread's resolver in
> _asr_resolver_done()..?
>
> I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
> pointer. However, a breakpoint on pthread_create() says that it is not
> doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
> Perhaps some code was changed to directly call _asr_ctx_unref() instead?
> At least the code looks like asr_run() does _asr_async_free() at the end
> and this just calls _asr_ctx_unref() directly, even though the context
> was originally aquired by calling getaddrinfo_async() with last
> parameter NULL, which it passed on to _asr_use_resolver(NULL) and thus
> acquiring the thread-local context.
>
> Cheers,
> Uli
> --
> This can be a, a little complicated. Listen, my advice is... ask
> somebody else for advice, at least someone who's... got more experience
> at... giving advice.
>
Thanks for this analysis, it's consistent with my observations I posted
earlier.
-Otto