On Thu, Dec 24, 2020 at 01:59:34PM +0100, Mark Kettenis wrote:

> > Date: Thu, 24 Dec 2020 12:27:01 +0100
> > From: Otto Moerbeek <[email protected]>
> > 
> > On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:
> > 
> > > On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > > > [...]
> > > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > > concurrently. This didn't happen in 6.7.
> > > > > It looks like asr_ctx which is supposed to be thread-local according 
> > > > > to
> > > > > _asr_use_resolver(), is actually static / shared between threads.
> > > > >> How-To-Repeat:
> > > > [...]
> > > > > The reproducing code written by psychon, CCed, while debugging the 
> > > > > crash
> > > > > of a more complicated software
> > > > [...]
> > > > here is a bit more information about what is going on:
> > > > 
> > > > When run under egdb, you can "watch _asr". This reports:
> > > > 
> > > > Old value = (struct asr *) 0x0
> > > > New value = (struct asr *) 0x79905dce2a0
> > > > _asr_use_resolver (arg=<optimized out>) at 
> > > > /usr/src/lib/libc/asr/asr.c:360
> > > > 
> > > > (gdb) bt
> > > > #0  _asr_use_resolver (arg=<optimized out>) at
> > > > /usr/src/lib/libc/asr/asr.c:360
> > > > #1  0x000007990558862e in _libc_res_init () at
> > > > /usr/src/lib/libc/asr/res_init.c:44
> > > > #2  0x000007990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > > > "ipv4.google.com", servname=0x0, hints=0x7f7ffffce120,
> > > > res=0x7f7ffffce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > > > #3  0x00000796629c4d2e in do_lookup ()
> > > > #4  0x00000796629c4da3 in do_things ()
> > > > #5  0x00000796629c4ded in main ()
> > > > 
> > > > At this time, the program is still single-threaded (as seen in the
> > > > backtrace). This line in _asr_use_resolver() is the culprit:
> > > > 
> > > >     priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > 
> > > > the macro is defined as:
> > > > 
> > > > #define _THREAD_PRIVATE(keyname, storage, error)
> > > >         (_thread_cb.tc_tag_storage == NULL ? &(storage) :
> > > >             _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),
> > > >                 &(storage), sizeof(storage), error))
> > > > 
> > > > gdb has to say about this:
> > > > 
> > > > (gdb) print _thread_cb.tc_tag_storage
> > > > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > > > 
> > > > Thus, because we are still single-threaded, the program uses the static
> > > > variable _asr directly to store its "per thread" resolver (priv = &_asr
> > > > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > > > storing the resolver in a global variable).
> > > > 
> > > > Later, threads are started. The next time that this code runs,
> > > > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > > > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > > > initialise the per-thread variant of the variable with the contents of
> > > > the global variant. Thus, all threads now end up using the resolver
> > > > instance that the main thread created.
> > > > 
> > > > I did not investigate what exactly goes wrong later, but since multiple
> > > > threads are now sharing a resolver, nothing good can happen.
> > > > 
> > > > The actual crash is a use-after-free (and it is always the same one):
> > > > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > > > AS_FAMILY(as). This is:
> > > > 
> > > >  #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> > > > 
> > > > (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> > > > Cannot access memory at address 0x28b3653e000
> > > > 
> > > > The struct as->as_ctx was overwritten with 0xdf. Here is a random struct
> > > > member as an example:
> > > > 
> > > > (gdb) print as->as_ctx[0]->ac_domain
> > > > $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
> > > > 0xdfdfdfdfdfdfdfdf>
> > > > 
> > > > At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> > > > this helps / surprises, since the crash happens in the main thread...
> > > > I guess one of the threads freed the main thread's resolver in
> > > > _asr_resolver_done()..?
> > > > 
> > > > I guess that _asr_resolver_done() is supposed to set _asr back to a NULL
> > > > pointer. However, a breakpoint on pthread_create() says that it is not
> > > > doing so. In fact, I cannot find any callers of _asr_resolver_done()...?
> > > > Perhaps some code was changed to directly call _asr_ctx_unref() instead?
> > > > At least the code looks like asr_run() does _asr_async_free() at the end
> > > > and this just calls _asr_ctx_unref() directly, even though the context
> > > > was originally aquired by calling getaddrinfo_async() with last
> > > > parameter NULL, which it passed on to _asr_use_resolver(NULL) and thus
> > > > acquiring the thread-local context.
> > > > 
> > > > Cheers,
> > > > Uli
> > > > -- 
> > > > This can be a, a little complicated. Listen, my advice is... ask
> > > > somebody else for advice, at least someone who's... got more experience
> > > > at...  giving advice.
> > > > 
> > > 
> > > Thanks for this analysis, it's consistent with my observations I posted
> > > earlier.
> > > 
> > >   -Otto
> > > 
> > 
> > This works for me, but only lighlty tested. If this is an acceptable
> > approach, some of the _rthread_init() machinery could likely be
> > simplified.
> > 
> > There might be pitfalls, though.
> 
> Yes, I don't think that is a good idea.
> 
> The problem isn't really the late initialization, but rather that the
> ASR code doesn't use the _THREAD_PRIVATE() construction properly.
> 
> We could work around that issue in the ASR code by doing something like:
> 
>         priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
>         if (priv != &_asr && *priv == _asr)
>                 *priv = NULL;
>         if (*priv == NULL) {
>                 DPRINT("setting up thread-local resolver\n");
>                 *priv = _asr_resolver();
>         }
> 
> But maybe a better solution is to change _thread_tag_storage() to only
> do the memcpy() if we're in the main thread and do a memset() if not.

That would mean a change in the contract of _thread_tag_storage(). Now
it is an internal function to libc luckily, so we are able to check if
any caller depends on the current semantics.

        -Otto

> 
> > Index: rthread.c
> > ===================================================================
> > RCS file: /cvs/src/lib/librthread/rthread.c,v
> > retrieving revision 1.99
> > diff -u -p -r1.99 rthread.c
> > --- rthread.c       4 Nov 2017 22:53:57 -0000       1.99
> > +++ rthread.c       24 Dec 2020 11:23:04 -0000
> > @@ -190,6 +190,12 @@ _thread_key_zero(int key)
> >     }
> >  }
> >  
> > +static void __attribute__ ((constructor)) rthread_ct(void);
> > +static void rthread_ct(void)
> > +{
> > +   _rthread_init();
> > +}
> > +
> >  void
> >  _rthread_init(void)
> >  {
> > 
> > 
> 

Reply via email to