Hello,

Faidon Liambotis, on Mon 11 Jan 2021 01:33:34 +0200, wrote:
> I'm still unsure why either test fail the way they do¹. Like the last
> time I was debugging this bug... gdb'ing the aligned-alloc test doesn't
> work (can't interrupt execution). What's worse is that even running "ps"
> in a different console hangs while the test is running.

That's because the process is hung hard, see
https://www.gnu.org/software/hurd/faq/ps_hangs.html
Using the -M option gets less information but doesn't hang.

> So something seems weird with the system, that's not jemalloc
> related...

It is :)

Attaching with gdb from outside, I get:

#0  0x0111e69c in mach_msg_trap () at ./build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
#1  0x0111ee46 in __GI___mach_msg (msg=0x103281c, option=3, send_size=64, rcv_size=32, rcv_name=51, timeout=0, notify=0) at msg.c:111
#2  0x01577612 in __gsync_wait (task=1, addr=17642300, val1=2, val2=0, msec=0, flags=0) at ./build-tree/hurd-i386-libc/mach/RPC_gsync_wait.c:175
#3  0x010f1923 in __pthread_mutex_lock (mtxp=0x10d333c <init_lock+60>) at ../sysdeps/mach/hurd/htl/pt-mutex-lock.c:36
#4  0x01086308 in malloc_mutex_lock_final (mutex=0x10d3300 <init_lock>) at include/jemalloc/internal/mutex.h:155
#5  je_malloc_mutex_lock_slow (mutex=0x10d3300 <init_lock>) at src/mutex.c:85
#6  0x0103f7bc in malloc_mutex_lock (mutex=0x10d3300 <init_lock>, tsdn=0x0) at include/jemalloc/internal/mutex.h:221
#7  malloc_init_hard () at src/jemalloc.c:1740
#8  0x01041d65 in malloc_init () at src/jemalloc.c:210
#9  imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2230
#10 imalloc (dopts=<optimized out>, sopts=<optimized out>) at src/jemalloc.c:2261
#11 je_malloc_default (size=100) at src/jemalloc.c:2290
#12 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#13 0x011af9a5 in __vasprintf_internal (result_ptr=0x1032b24, format=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", args=0x1032ae8 "\357\352,\001\357\352,\001'f\017\001\034", mode_flags=0) at vasprintf.c:45
#14 0x0118c367 in ___asprintf (string_ptr=0x1032b24, format=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n") at asprintf.c:31
#15 0x0116302a in __assert_fail_base (fmt=0x12d39a4 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x10f6631 "self != NULL", file=0x10f6627 "pt-self.c", line=28, function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at assert.c:57
#16 0x01163129 in __GI___assert_fail (assertion=0x10f6631 "self != NULL", file=0x10f6627 "pt-self.c", line=28, function=0x10f6640 <__PRETTY_FUNCTION__.1> "__pthread_self") at assert.c:101
#17 0x010f12cf in __pthread_self () at pt-self.c:28
#18 __pthread_self () at pt-self.c:25
#19 0x0103f58d in malloc_init_hard_needed () at src/jemalloc.c:1455
#20 malloc_init_hard () at src/jemalloc.c:1746
#21 0x01041d65 in malloc_init () at src/jemalloc.c:210
#22 imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2230
#23 imalloc (dopts=<optimized out>, sopts=<optimized out>) at src/jemalloc.c:2261
#24 je_malloc_default (size=708) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#26 0x010f054d in __pthread_alloc (pthread=0x1032cb0) at pt-alloc.c:125
#27 0x010f0884 in __pthread_create_internal (thread=0x1032cf8, attr=0x0, start_routine=0x0, arg=0x0) at pt-create.c:99
#28 0x010f4a3b in _init_routine (stack=0x0) at ../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#29 0x01154c14 in init (data=0x1032d60) at ../sysdeps/mach/hurd/i386/init-first.c:209
#30 _dl_init_first (argc=<optimized out>) at ../sysdeps/mach/hurd/i386/init-first.c:325
#31 0x0000220d in _dl_start_user () from /lib/ld.so

So basically libpthread is trying to initialize itself, calls malloc,
which initializes jemalloc, which calls pthread_self, which is not happy
that libpthread is not initialized yet, thus calls assert, which tries
to malloc as well, which tries (again!) to initialize jemalloc, and gets
stuck on mutex_lock. And since this is all happening at very early
initialization of libc, interaction with ps etc. is not possible yet.

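To make the cycle above easier to follow, here is a minimal sketch of it in
plain C. It is only a model: every name in it (toy_malloc, toy_malloc_init,
toy_pthread_self, toy_assert_fail, simulate_first_malloc) is a hypothetical
stand-in, not a real glibc or jemalloc symbol, and the init lock is a plain
flag so that the point where the real code blocks forever is recorded
instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; none of these are real library symbols. */
static bool init_lock_held;   /* models jemalloc's non-recursive init_lock */
static bool threads_ready;    /* models "libpthread has finished initializing" */
static bool would_deadlock;   /* set where the real code would block forever */
static int  init_depth;       /* current nesting of toy_malloc_init() */
static int  deepest_depth;    /* deepest nesting seen in the last run */

static void *toy_malloc(size_t size);

/* Models the assert path: formatting the message needs malloc. */
static void toy_assert_fail(void)
{
    (void)toy_malloc(100);
}

/* Models pthread_self() asserting while the thread library is not ready. */
static void toy_pthread_self(void)
{
    if (!threads_ready)
        toy_assert_fail();
}

/* Models the allocator's one-time initialization under a plain lock. */
static void toy_malloc_init(void)
{
    init_depth++;
    if (init_depth > deepest_depth)
        deepest_depth = init_depth;

    if (init_lock_held) {
        /* Same thread, lock already taken: a real mutex_lock never returns. */
        would_deadlock = true;
    } else {
        init_lock_held = true;
        toy_pthread_self();       /* may re-enter toy_malloc() via the assert */
        init_lock_held = false;
    }

    init_depth--;
    assert(init_depth >= 0);
}

/* The allocation itself is not modelled, only the init path. */
static void *toy_malloc(size_t size)
{
    (void)size;
    toy_malloc_init();
    return NULL;
}

/* Runs the first malloc of the process; returns true if it would hang. */
bool simulate_first_malloc(bool threads_initialized)
{
    init_lock_held = false;
    would_deadlock = false;
    init_depth = 0;
    deepest_depth = 0;
    threads_ready = threads_initialized;
    (void)toy_malloc(708);
    return would_deadlock;
}

/* Deepest nesting of the init routine during the last simulation. */
int deepest_init_nesting(void)
{
    return deepest_depth;
}
```

Calling simulate_first_malloc(false) models the early-startup case from the
backtrace: the init routine is entered a second time while the lock is still
held. With simulate_first_malloc(true) the nested call never happens, which
is the situation once thread initialization has completed.
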
I tried to make __pthread_alloc avoid using malloc, but then I got
instead

#24 je_malloc_default (size=4348) at src/jemalloc.c:2290
#25 0x010423a2 in malloc (size=<optimized out>) at src/jemalloc.c:2389
#26 0x00013b08 in _dl_allocate_tls_storage () at dl-tls.c:403
#27 0x00013e65 in _dl_allocate_tls (mem=0x0) at dl-tls.c:588
#28 0x010e1a0e in __pthread_create_internal (thread=0x1032cf8, attr=0x0, start_routine=0x0, arg=0x0) at pt-create.c:151
#29 0x010e5b1b in _init_routine (stack=0x0) at ../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#30 0x01154c14 in init (data=0x1032d60) at ../sysdeps/mach/hurd/i386/init-first.c:209
#31 _dl_init_first (argc=<optimized out>) at ../sysdeps/mach/hurd/i386/init-first.c:325
#32 0x0000220d in _dl_start_user () from /lib/ld.so

Thus the same issue, and changing _dl_allocate_tls is a way more
involved thing.

I tried another approach by making pthread_self() return the id of the
initial thread without checks, but then I get a crash on

#0  __pthread_mutex_lock (mtxp=0x10ed1a0 <__pthread_key_lock>) at ../sysdeps/mach/hurd/htl/pt-mutex-lock.c:41
#1  0x010e0e59 in __GI___pthread_key_create (key=0x10db944 <je_tsd_tsd>, destructor=0x10aa4e0 <je_tsd_cleanup>) at ../sysdeps/htl/pt-key-create.c:41
#2  0x010aa85d in tsd_boot0 () at include/jemalloc/internal/tsd_tls.h:15
#3  je_malloc_tsd_boot0 () at src/tsd.c:426
#4  0x0103f5cb in malloc_init_hard () at src/jemalloc.c:1757
#5  malloc_init_hard () at src/jemalloc.c:1734
#6  0x01041d65 in malloc_init () at src/jemalloc.c:210
#7  imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2230
#8  imalloc (dopts=<optimized out>, sopts=<optimized out>) at src/jemalloc.c:2261
#9  je_malloc_default (size=708) at src/jemalloc.c:2290
#10 0x010e154d in __pthread_alloc (pthread=0x1032cb0) at pt-alloc.c:147
#11 0x010e1884 in __pthread_create_internal (thread=0x1032cf8, attr=0x0, start_routine=0x0, arg=0x0) at pt-create.c:99
#12 0x010e5a5b in _init_routine (stack=0x0) at ../sysdeps/mach/hurd/htl/pt-sysdep.c:73
#13 0x01154c14 in init (data=0x1032d60) at ../sysdeps/mach/hurd/i386/init-first.c:209
#14 _dl_init_first (argc=<optimized out>) at ../sysdeps/mach/hurd/i386/init-first.c:325
#15 0x0000220d in _dl_start_user () from /lib/ld.so

The pthread_key implementation uses a recursive mutex, which tries to
use TLS to get per-thread state, which cannot work since libpthread has
not finished initializing.

I'm wondering how this kind of bootstrap issue is solved on Linux? The
_dl_allocate_tls code is exactly the same.

Samuel