Hello,

Faidon Liambotis, on Mon, 11 Jan 2021 01:33:34 +0200, wrote:
> I'm still unsure why either test fail the way they do¹. Like the last
> time I was debugging this bug... gdb'ing the aligned-alloc test doesn't
> work (can't interrupt execution). What's worse is that even running "ps"
> in a different console hangs while the test is running.

That's because the process is hung hard, see 
https://www.gnu.org/software/hurd/faq/ps_hangs.html
Running ps with the -M option shows less information but doesn't hang.

> So something seems weird with the system, that's not jemalloc
> related...

It is :)

Attaching with gdb from outside, I get:

#0  0x0111e69c in mach_msg_trap () at 
./build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x0111ee46 in __GI___mach_msg (msg=0x103281c, option=3, send_size=64, 
rcv_size=32, rcv_name=51, timeout=0, notify=0) at msg.c:111
#2  0x01577612 in __gsync_wait (task=1, addr=17642300, val1=2, val2=0, msec=0, 
flags=0) at ./build-tree/hurd-i386-libc/mach/RPC_gsync_wait.c:175
#3  0x010f1923 in __pthread_mutex_lock (mtxp=0x10d333c <init_lock+60>) at 
../sysdeps/mach/hurd/htl/pt-mutex-lock.c:36
#4  0x01086308 in malloc_mutex_lock_final (mutex=0x10d3300 <init_lock>) at 
include/jemalloc/internal/mutex.h:155
#5  je_malloc_mutex_lock_slow (mutex=0x10d3300 <init_lock>) at src/mutex.c:85
#6  0x0103f7bc in malloc_mutex_lock (mutex=0x10d3300 <init_lock>, tsdn=0x0) at 
include/jemalloc/internal/mutex.h:221
#7  malloc_init_hard () at src/jemalloc.c:1740
#8  0x01041d65 in malloc_init () at src/jemalloc.c:210
#9  imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) 
at src/jemalloc.c:2230
#10 imalloc (dopts=<optimized out>, sopts=<optimized out>) at 
src/jemalloc.c:2261
#11 je_malloc_default (size=100) at src/jemalloc.c:2290
#12 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#13 0x011af9a5 in __vasprintf_internal (result_ptr=0x1032b24, format=0x12d39a4 
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    args=0x1032ae8 "\357\352,\001\357\352,\001'f\017\001\034", mode_flags=0) at 
vasprintf.c:45
#14 0x0118c367 in ___asprintf (string_ptr=0x1032b24, format=0x12d39a4 
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n") at asprintf.c:31
#15 0x0116302a in __assert_fail_base (fmt=0x12d39a4 "%s%s%s:%u: %s%sAssertion 
`%s' failed.\n%n", assertion=0x10f6631 "self != NULL", file=0x10f6627 
"pt-self.c",
    line=28, function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at 
assert.c:57
#16 0x01163129 in __GI___assert_fail (assertion=0x10f6631 "self != NULL", 
file=0x10f6627 "pt-self.c", line=28,
    function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at assert.c:101
#17 0x010f12cf in __pthread_self () at pt-self.c:28
#18 __pthread_self () at pt-self.c:25
#19 0x0103f58d in malloc_init_hard_needed () at src/jemalloc.c:1455
#20 malloc_init_hard () at src/jemalloc.c:1746
#21 0x01041d65 in malloc_init () at src/jemalloc.c:210
#22 imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) 
at src/jemalloc.c:2230
#23 imalloc (dopts=<optimized out>, sopts=<optimized out>) at 
src/jemalloc.c:2261
#24 je_malloc_default (size=708) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#26 0x010f054d in __pthread_alloc (pthread=0x1032cb0) at pt-alloc.c:125
#27 0x010f0884 in __pthread_create_internal (thread=0x1032cf8, attr=0x0, 
start_routine=0x0, arg=0x0) at pt-create.c:99
#28 0x010f4a3b in _init_routine (stack=0x0) at 
../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#29 0x01154c14 in init (data=0x1032d60) at 
../sysdeps/mach/hurd/i386/init-first.c:209
#30 _dl_init_first (argc=<optimized out>) at 
../sysdeps/mach/hurd/i386/init-first.c:325
#31 0x0000220d in _dl_start_user () from /lib/ld.so

So basically libpthread is trying to initialize itself, calls malloc,
which initializes jemalloc, which calls pthread_self, which is not happy
that libpthread is not initialized yet, thus calls assert, which tries
to malloc as well, which tries (again!) to initialize jemalloc, and
gets stuck on mutex_lock. And since this is all happening at very early
initialization of libc, interaction with ps etc. is not possible yet.
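
To make the cycle concrete, here is a minimal sketch of the same
re-entrancy pattern. It is not jemalloc or glibc code: every toy_* name
is a hypothetical stand-in, and toy_assert_fail() plays the role of the
failing assert that wants memory for its message. The sketch assumes an
error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the second lock
attempt returns EDEADLK instead of blocking forever the way the real
init_lock does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* All toy_* names are hypothetical stand-ins, not jemalloc or glibc code. */
static pthread_mutex_t toy_init_lock;
static int toy_initialized = 0;
static int toy_relock_error = 0;

static void *toy_malloc(size_t size);

/* Stand-in for the assert path: formatting the message needs memory. */
static void toy_assert_fail(void)
{
    (void)toy_malloc(100);
}

/* Stand-in for the allocator's one-time initialization. */
static void toy_init(void)
{
    int rc = pthread_mutex_lock(&toy_init_lock);
    if (rc != 0) {
        /* Re-entered while the same thread already holds the lock.
         * With a normal mutex this call would never return. */
        toy_relock_error = rc;
        return;
    }
    if (!toy_initialized) {
        /* Something during init fails and ends up allocating again. */
        toy_assert_fail();
        toy_initialized = 1;
    }
    pthread_mutex_unlock(&toy_init_lock);
}

static void *toy_malloc(size_t size)
{
    (void)size;
    if (!toy_initialized)
        toy_init();
    return NULL; /* the sketch never hands out real memory */
}

/* Returns the error seen by the nested lock attempt (0 if none). */
int toy_run_demo(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&toy_init_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    (void)toy_malloc(708); /* first allocation triggers initialization */
    return toy_relock_error;
}
```

Calling toy_run_demo() from a main() reports the nested lock attempt as
an error code; with a plain non-recursive mutex the same call sequence
would simply hang, which is what the backtrace above shows.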

I tried to make __pthread_alloc avoid using malloc, but then I got this
instead:

#24 je_malloc_default (size=4348) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#26 0x00013b08 in _dl_allocate_tls_storage () at dl-tls.c:403
#27 0x00013e65 in _dl_allocate_tls (mem=0x0) at dl-tls.c:588
#28 0x010e1a0e in __pthread_create_internal (thread=0x1032cf8, attr=0x0, 
start_routine=0x0, arg=0x0) at pt-create.c:151
#29 0x010e5b1b in _init_routine (stack=0x0) at 
../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#30 0x01154c14 in init (data=0x1032d60) at 
../sysdeps/mach/hurd/i386/init-first.c:209
#31 _dl_init_first (argc=<optimized out>) at 
../sysdeps/mach/hurd/i386/init-first.c:325
#32 0x0000220d in _dl_start_user () from /lib/ld.so

So it is the same issue, and changing _dl_allocate_tls is a much more
involved thing. I tried another approach by making pthread_self() return
the id of the initial thread without checks, but then I get a crash in:

#0  __pthread_mutex_lock (mtxp=0x10ed1a0 <__pthread_key_lock>) at 
../sysdeps/mach/hurd/htl/pt-mutex-lock.c:41
#1  0x010e0e59 in __GI___pthread_key_create (key=0x10db944 <je_tsd_tsd>, 
destructor=0x10aa4e0 <je_tsd_cleanup>)
    at ../sysdeps/htl/pt-key-create.c:41
#2  0x010aa85d in tsd_boot0 () at include/jemalloc/internal/tsd_tls.h:15
#3  je_malloc_tsd_boot0 () at src/tsd.c:426
#4  0x0103f5cb in malloc_init_hard () at src/jemalloc.c:1757
#5  malloc_init_hard () at src/jemalloc.c:1734
#6  0x01041d65 in malloc_init () at src/jemalloc.c:210
#7  imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) 
at src/jemalloc.c:2230
#8  imalloc (dopts=<optimized out>, sopts=<optimized out>) at 
src/jemalloc.c:2261
#9  je_malloc_default (size=708) at src/jemalloc.c:2290
#10 0x010e154d in __pthread_alloc (pthread=0x1032cb0) at pt-alloc.c:147
#11 0x010e1884 in __pthread_create_internal (thread=0x1032cf8, attr=0x0, 
start_routine=0x0, arg=0x0) at pt-create.c:99
#12 0x010e5a5b in _init_routine (stack=0x0) at 
../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#13 0x01154c14 in init (data=0x1032d60) at 
../sysdeps/mach/hurd/i386/init-first.c:209
#14 _dl_init_first (argc=<optimized out>) at 
../sysdeps/mach/hurd/i386/init-first.c:325
#15 0x0000220d in _dl_start_user () from /lib/ld.so

The pthread_key implementation uses a recursive mutex, which tries to
use TLS to get per-thread state, which cannot work since libpthread is
not finished initializing.
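
As an illustration of why a recursive mutex needs per-thread state,
here is a rough sketch of one built on top of a plain mutex and a
condition variable. This is not the htl implementation; the toy_rmutex
type and its functions are hypothetical. The point is only that the
lock path has to ask "who am I?" (pthread_self() here) before it can
decide whether the caller already owns the lock, and that question is
exactly what cannot be answered before the thread library is set up.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical toy recursive mutex, for illustration only. */
typedef struct {
    pthread_mutex_t guard;    /* protects owner and depth */
    pthread_cond_t released;  /* signalled when depth drops to 0 */
    pthread_t owner;          /* only meaningful while depth > 0 */
    unsigned depth;           /* 0 means unowned */
} toy_rmutex;

static void toy_rmutex_lock(toy_rmutex *m)
{
    /* The caller's identity is per-thread state. */
    pthread_t self = pthread_self();

    pthread_mutex_lock(&m->guard);
    if (m->depth > 0 && pthread_equal(m->owner, self)) {
        m->depth++; /* same thread locking again: just count it */
    } else {
        while (m->depth > 0)
            pthread_cond_wait(&m->released, &m->guard);
        m->owner = self;
        m->depth = 1;
    }
    pthread_mutex_unlock(&m->guard);
}

static void toy_rmutex_unlock(toy_rmutex *m)
{
    pthread_mutex_lock(&m->guard);
    assert(m->depth > 0);
    if (--m->depth == 0)
        pthread_cond_signal(&m->released);
    pthread_mutex_unlock(&m->guard);
}

/* Locks twice from the same thread and returns the depth observed. */
unsigned toy_nested_depth(void)
{
    static toy_rmutex m = {
        .guard = PTHREAD_MUTEX_INITIALIZER,
        .released = PTHREAD_COND_INITIALIZER,
    };
    unsigned seen;

    toy_rmutex_lock(&m);
    toy_rmutex_lock(&m); /* would deadlock with a non-recursive lock */
    seen = m.depth;
    toy_rmutex_unlock(&m);
    toy_rmutex_unlock(&m);
    return seen;
}
```

Once threads are fully initialized this works as expected; during early
startup the pthread_self() call at the top of toy_rmutex_lock() is the
step that has nothing valid to return yet.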

I'm wondering how this kind of bootstrap issue is solved on Linux? The
_dl_allocate_tls code is exactly the same.

Samuel
