> Date: Thu, 24 Dec 2020 12:27:01 +0100
> From: Otto Moerbeek <[email protected]>
>
> On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:
>
> > On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> >
> > > Hi everyone,
> > >
> > > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > > [...]
> > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > concurrently. This didn't happen in 6.7.
> > > > It looks like asr_ctx which is supposed to be thread-local according to
> > > > _asr_use_resolver(), is actually static / shared between threads.
> > > >> How-To-Repeat:
> > > [...]
> > > > The reproducing code written by psychon, CCed, while debugging the crash
> > > > of a more complicated software
> > > [...]
> > > here is a bit more information about what is going on:
> > >
> > > When run under egdb, you can "watch _asr". This reports:
> > >
> > > Old value = (struct asr *) 0x0
> > > New value = (struct asr *) 0x79905dce2a0
> > > _asr_use_resolver (arg=<optimized out>) at /usr/src/lib/libc/asr/asr.c:360
> > >
> > > (gdb) bt
> > > #0 _asr_use_resolver (arg=<optimized out>) at
> > > /usr/src/lib/libc/asr/asr.c:360
> > > #1 0x000007990558862e in _libc_res_init () at
> > > /usr/src/lib/libc/asr/res_init.c:44
> > > #2 0x000007990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > > "ipv4.google.com", servname=0x0, hints=0x7f7ffffce120,
> > > res=0x7f7ffffce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > > #3 0x00000796629c4d2e in do_lookup ()
> > > #4 0x00000796629c4da3 in do_things ()
> > > #5 0x00000796629c4ded in main ()
> > >
> > > At this time, the program is still single-threaded (as seen in the
> > > backtrace). This line in _asr_use_resolver() is the culprit:
> > >
> > > priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > >
> > > the macro is defined as:
> > >
> > > #define _THREAD_PRIVATE(keyname, storage, error)
> > > (_thread_cb.tc_tag_storage == NULL ? &(storage) :
> > > _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),
> > > &(storage), sizeof(storage), error))
> > >
> > > gdb has to say about this:
> > >
> > > (gdb) print _thread_cb.tc_tag_storage
> > > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > >
> > > Thus, because we are still single-threaded, the program uses the static
> > > variable _asr directly to store its "per thread" resolver (priv = &_asr
> > > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > > storing the resolver in a global variable).
> > >
> > > Later, threads are started. The next time that this code runs,
> > > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > > initialise the per-thread variant of the variable with the contents of
> > > the global variant. Thus, all threads now end up using the resolver
> > > instance that the main thread created.
> > >
> > > I did not investigate what exactly goes wrong later, but since multiple
> > > threads are now sharing a resolver, nothing good can happen.
> > >
> > > The actual crash is a use-after-free (and it is always the same one):
> > > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > > AS_FAMILY(as). This is:
> > >
> > > #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> > >
> > > (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> > > Cannot access memory at address 0x28b3653e000
> > >
> > > The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> > > member as an example:
> > >
> > > (gdb) print as->as_ctx[0]->ac_domain
> > > $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
> > > 0xdfdfdfdfdfdfdfdf>
> > >
> > > At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> > > this helps / surprises, since the crash happens in the main thread...
> > > I guess one of the threads freed the main thread's resolver in
> > > _asr_resolver_done()..?
> > >
> > > I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
> > > pointer. However, a breakpoint on pthread_create() says that it is not
> > > doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
> > > Perhaps some code was changed to directly call _asr_ctx_unref() instead?
> > > At least the code looks like asr_run() does _asr_async_free() at the end
> > > and this just calls _asr_ctx_unref() directly, even though the context
> > > was originally aquired by calling getaddrinfo_async() with last
> > > parameter NULL, which it passed on to _asr_use_resolver(NULL) and thus
> > > acquiring the thread-local context.
> > >
> > > Cheers,
> > > Uli
> > > --
> > > This can be a, a little complicated. Listen, my advice is... ask
> > > somebody else for advice, at least someone who's... got more experience
> > > at... giving advice.
> > >
> >
> > Thanks for this analysis, it's consistent with my observations I posted
> > earlier.
> >
> > -Otto
> >
>
> This works for me, but only lighlty tested. If this is an acceptable
> approach, some of the _rthread_init() machinery could likely be
> simplified.
>
> There might be pitfalls, though.
Yes, I don't think that is a good idea.
The problem isn't really the late initialization, but rather that the
ASR code doesn't use the _THREAD_PRIVATE() construction properly.
We could work around that issue in the ASR code by doing something like:
priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
if (priv != &_asr && *priv == _asr)
*priv = NULL;
if (*priv == NULL) {
DPRINT("setting up thread-local resolver\n");
*priv = _asr_resolver();
}
But maybe a better solution is to change _thread_tag_storage() to only
do the memcpy() if we're in the main thread and do a memset() if not.
> Index: rthread.c
> ===================================================================
> RCS file: /cvs/src/lib/librthread/rthread.c,v
> retrieving revision 1.99
> diff -u -p -r1.99 rthread.c
> --- rthread.c 4 Nov 2017 22:53:57 -0000 1.99
> +++ rthread.c 24 Dec 2020 11:23:04 -0000
> @@ -190,6 +190,12 @@ _thread_key_zero(int key)
> }
> }
>
> +static void __attribute__ ((constructor)) rthread_ct(void);
> +static void rthread_ct(void)
> +{
> + _rthread_init();
> +}
> +
> void
> _rthread_init(void)
> {
>
>