On Thu, Dec 24, 2020 at 02:28:25PM +0100, Otto Moerbeek wrote:

> On Thu, Dec 24, 2020 at 01:59:34PM +0100, Mark Kettenis wrote:
> 
> > > Date: Thu, 24 Dec 2020 12:27:01 +0100
> > > From: Otto Moerbeek <[email protected]>
> > > 
> > > On Thu, Dec 24, 2020 at 10:36:37AM +0100, Otto Moerbeek wrote:
> > > 
> > > > On Thu, Dec 24, 2020 at 08:25:45AM +0100, Uli Schlachter wrote:
> > > > 
> > > > > Hi everyone,
> > > > > 
> > > > > Am 24.12.20 um 04:35 schrieb Alexey Sokolov:
> > > > > [...]
> > > > > > Hi, getaddrinfo() crashes when multiple threads run getaddrinfo()
> > > > > > concurrently. This didn't happen in 6.7.
> > > > > > It looks like asr_ctx which is supposed to be thread-local 
> > > > > > according to
> > > > > > _asr_use_resolver(), is actually static / shared between threads.
> > > > > >> How-To-Repeat:
> > > > > [...]
> > > > > > The reproducing code written by psychon, CCed, while debugging the 
> > > > > > crash
> > > > > > of a more complicated software
> > > > > [...]
> > > > > here is a bit more information about what is going on:
> > > > > 
> > > > > When run under egdb, you can "watch _asr". This reports:
> > > > > 
> > > > > Old value = (struct asr *) 0x0
> > > > > New value = (struct asr *) 0x79905dce2a0
> > > > > _asr_use_resolver (arg=<optimized out>) at 
> > > > > /usr/src/lib/libc/asr/asr.c:360
> > > > > 
> > > > > (gdb) bt
> > > > > #0  _asr_use_resolver (arg=<optimized out>) at
> > > > > /usr/src/lib/libc/asr/asr.c:360
> > > > > #1  0x000007990558862e in _libc_res_init () at
> > > > > /usr/src/lib/libc/asr/res_init.c:44
> > > > > #2  0x000007990552c1e4 in _libc_getaddrinfo (hostname=0x796629c37a3
> > > > > "ipv4.google.com", servname=0x0, hints=0x7f7ffffce120,
> > > > > res=0x7f7ffffce158) at /usr/src/lib/libc/asr/getaddrinfo.c:36
> > > > > #3  0x00000796629c4d2e in do_lookup ()
> > > > > #4  0x00000796629c4da3 in do_things ()
> > > > > #5  0x00000796629c4ded in main ()
> > > > > 
> > > > > At this time, the program is still single-threaded (as seen in the
> > > > > backtrace). This line in _asr_use_resolver() is the culprit:
> > > > > 
> > > > >     priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > > 
> > > > > the macro is defined as:
> > > > > 
> > > > > #define _THREAD_PRIVATE(keyname, storage, error)
> > > > >       (_thread_cb.tc_tag_storage == NULL ? &(storage) :
> > > > >           _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),
> > > > >               &(storage), sizeof(storage), error))
> > > > > 
> > > > > gdb has to say about this:
> > > > > 
> > > > > (gdb) print _thread_cb.tc_tag_storage
> > > > > $3 = (void *(*)(void **, void *, size_t, void *)) 0x0
> > > > > 
> > > > > Thus, because we are still single-threaded, the program uses the 
> > > > > static
> > > > > variable _asr directly to store its "per thread" resolver (priv = 
> > > > > &_asr
> > > > > in the macro invocation above and thus *priv = _asr_resolver() ends up
> > > > > storing the resolver in a global variable).
> > > > > 
> > > > > Later, threads are started. The next time that this code runs,
> > > > > _thread_cb.tc_tag_storage points to _thread_tag_storage() from
> > > > > lib/libc/thread/rthread_libc.c. This function uses memcpy() to
> > > > > initialise the per-thread variant of the variable with the contents of
> > > > > the global variant. Thus, all threads now end up using the resolver
> > > > > instance that the main thread created.
> > > > > 
> > > > > I did not investigate what exactly goes wrong later, but since 
> > > > > multiple
> > > > > threads are now sharing a resolver, nothing good can happen.
> > > > > 
> > > > > The actual crash is a use-after-free (and it is always the same one):
> > > > > iter_family() in lib/libc/asr/getaddrinfo_async.c executes
> > > > > AS_FAMILY(as). This is:
> > > > > 
> > > > >  #define AS_FAMILY(p) ((p)->as_ctx->ac_family[(p)->as_family_idx])
> > > > > 
> > > > > (gdb) print as->as_ctx->ac_family[as->as_family_idx]
> > > > > Cannot access memory at address 0x28b3653e000
> > > > > 
> > > > > The struct as->as_ctx was overwritten with 0xdf. Here is a random 
> > > > > struct
> > > > > member as an example:
> > > > > 
> > > > > (gdb) print as->as_ctx[0]->ac_domain
> > > > > $8 = 0xdfdfdfdfdfdfdfdf <error: Cannot access memory at address
> > > > > 0xdfdfdfdfdfdfdfdf>
> > > > > 
> > > > > At this point, as->as_ctx == _asr->a_ctx, but I am not sure how much
> > > > > this helps / surprises, since the crash happens in the main thread...
> > > > > I guess one of the threads freed the main thread's resolver in
> > > > > _asr_resolver_done()..?
> > > > > 
> > > > > I guess that _asr_resolver_done() is supposed to set _asr back to a 
> > > > > NULL
> > > > > pointer. However, a breakpoint on pthread_create() says that it is not
> > > > > doing so. In fact, I cannot find any callers of 
> > > > > _asr_resolver_done()...?
> > > > > Perhaps some code was changed to directly call _asr_ctx_unref() 
> > > > > instead?
> > > > > At least the code looks like asr_run() does _asr_async_free() at the 
> > > > > end
> > > > > and this just calls _asr_ctx_unref() directly, even though the context
> > > > > was originally aquired by calling getaddrinfo_async() with last
> > > > > parameter NULL, which it passed on to _asr_use_resolver(NULL) and thus
> > > > > acquiring the thread-local context.
> > > > > 
> > > > > Cheers,
> > > > > Uli
> > > > > -- 
> > > > > This can be a, a little complicated. Listen, my advice is... ask
> > > > > somebody else for advice, at least someone who's... got more 
> > > > > experience
> > > > > at...  giving advice.
> > > > > 
> > > > 
> > > > Thanks for this analysis, it's consistent with my observations I posted
> > > > earlier.
> > > > 
> > > >         -Otto
> > > > 
> > > 
> > > This works for me, but only lighlty tested. If this is an acceptable
> > > approach, some of the _rthread_init() machinery could likely be
> > > simplified.
> > > 
> > > There might be pitfalls, though.
> > 
> > Yes, I don't think that is a good idea.
> > 
> > The problem isn't really the late initialization, but rather that the
> > ASR code doesn't use the _THREAD_PRIVATE() construction properly.
> > 
> > We could work around that issue in the ASR code by doing something like:
> > 
> >         priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> >         if (priv != &_asr && *priv == _asr)
> >                 *priv = NULL;
> >         if (*priv == NULL) {
> >                 DPRINT("setting up thread-local resolver\n");
> >                 *priv = _asr_resolver();
> >         }
> > 
> > But maybe a better solution is to change _thread_tag_storage() to only
> > do the memcpy() if we're in the main thread and do a memset() if not.
> 
> That would mean a change in the contract of _thread_tag_storage(). Now
> it is an internal function to libc luckily, so we are able to check if
> any caller depends on the current semantics.
> 
>       -Otto
> 

The diff below attempt to follow the suggestion, but the test program
still crashes. I must be overlookign something but I think I won't be
able to spend much time on this the coming days. 

        -Otto

Index: rthread_libc.c
===================================================================
RCS file: /cvs/src/lib/libc/thread/rthread_libc.c,v
retrieving revision 1.3
diff -u -p -r1.3 rthread_libc.c
--- rthread_libc.c      10 Jan 2019 18:45:33 -0000      1.3
+++ rthread_libc.c      24 Dec 2020 14:46:20 -0000
@@ -5,6 +5,7 @@
 #include <pthread.h>
 #include <stdlib.h>
 #include <string.h>
+#include <tib.h>
 
 #include "rthread.h"
 #include "rthread_cb.h"
@@ -103,13 +104,16 @@ _thread_tag_storage(void **tag, void *st
 
        ret = pthread_getspecific(tt->k);
        if (ret == NULL) {
-               ret = malloc(sz);
+               ret = calloc(1, sz);
                if (ret == NULL)
                        ret = err;
                else {
-                       if (pthread_setspecific(tt->k, ret) == 0)
-                               memcpy(ret, storage, sz);
-                       else {
+                       if (pthread_setspecific(tt->k, ret) == 0) {
+                               if (!__isthreaded ||
+                                   (TIB_GET()->tib_thread_flags &
+                                   TIB_THREAD_INITIAL_STACK))
+                                       memcpy(ret, storage, sz);
+                       } else {
                                free(ret);
                                ret = err;
                        }

Reply via email to