Hello,

(I've been following this with Julien, as I can reproduce the behaviour on my NixOS system. You don't have to run the latest systemd: just install the derivation and use its path in LD_LIBRARY_PATH instead of the system's. That could probably bring its own set of incompatibilities, but so far I'm getting the same behaviour as he does running systemd properly.)

So:

 - I'm with Andreas on this: valgrind would detect invalid accesses, except that if you stack both bash malloc and valgrind, valgrind will only consider the slightly bigger buffer allocated by bash, so the underlying overflow will not be detected when using the internal malloc.
   OTOH, valgrind *should* complain when using the system malloc (configure --without-bash-malloc), and it does not, so for me that means there really is something weird happening.
   Forgive me for trusting valgrind's analysis more than bash malloc debugging here...

 - I could reproduce the same as Julien: with -DDISABLE_MALLOC_WRAPPERS the crash still happens when bash is run directly, but nothing complains in valgrind.
   This could mean that systemd is overflowing bash malloc safeguards as you pointed out (I just don't understand why it wouldn't overflow with internal malloc), but it could also mean that the memory has been allocated somewhere else (e.g. libc's malloc) and freed by bash malloc.
   nss systemd has been using reallocarray() since v247, and that is not tracked by bash; I would think that's a good candidate?
   I don't have time right now, but I think adding an implementation for reallocarray (a wrapper around realloc, which does exist) would be the next thing to do.

 - Unrelated to the systemd bug, valgrind really seems thrown off by the wrappers. I also get some invalid reads in the bash malloc code:

$ valgrind /bash --norc -c true
-> invalid free in evalstring, fixed by Andreas' patch

$ valgrind /bash --norc -c 'echo ~'
-> this aborts even without systemd in the library path for me, but only with valgrind
--------
==407545== Memcheck, a memory error detector
==407545== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==407545== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==407545== Command: /bash --norc -c echo\ ~
==407545==
==407545== Invalid read of size 1
==407545==    at 0x4E5DEC: internal_free.constprop.0 (malloc.c:957)
==407545==    by 0x46B165: expand_word_internal (subst.c:10637)
==407545==    by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==    by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==    by 0x47057E: expand_words (subst.c:11345)
==407545==    by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==    by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==    by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==    by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==    by 0x426AA1: main (shell.c:741)
==407545==  Address 0x4a98090 is 16 bytes before a block of size 17 alloc'd
==407545==    at 0x483E751: malloc (in /nix/store/fazpzv26bal3z6j0mvi8y3k54x3xxi81-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==407545==    by 0x4912AB: xmalloc (xmalloc.c:114)
==407545==    by 0x4E4ACD: tilde_expand (tilde.c:196)
==407545==    by 0x437770: bash_tilde_expand (general.c:1211)
==407545==    by 0x46A424: expand_word_internal (subst.c:10285)
==407545==    by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==    by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==    by 0x47057E: expand_words (subst.c:11345)
==407545==    by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==    by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==    by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==    by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==    by 0x426AA1: main (shell.c:741)
==407545==
==407545== Invalid read of size 1
==407545==    at 0x4E5E05: internal_free.constprop.0 (malloc.c:968)
==407545==    by 0x46B165: expand_word_internal (subst.c:10637)
==407545==    by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==    by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==    by 0x47057E: expand_words (subst.c:11345)
==407545==    by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==    by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==    by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==    by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==    by 0x426AA1: main (shell.c:741)
==407545==  Address 0x4a98090 is 16 bytes before a block of size 17 alloc'd
==407545==    at 0x483E751: malloc (in /nix/store/fazpzv26bal3z6j0mvi8y3k54x3xxi81-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==407545==    by 0x4912AB: xmalloc (xmalloc.c:114)
==407545==    by 0x4E4ACD: tilde_expand (tilde.c:196)
==407545==    by 0x437770: bash_tilde_expand (general.c:1211)
==407545==    by 0x46A424: expand_word_internal (subst.c:10285)
==407545==    by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==    by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==    by 0x47057E: expand_words (subst.c:11345)
==407545==    by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==    by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==    by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==    by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==    by 0x426AA1: main (shell.c:741)
==407545==

malloc: subst.c:10637: assertion botched
free: called with unallocated block argument
Aborting...==407545==
==407545== Process terminating with default action of signal 6 (SIGABRT): dumping core
==407545==    at 0x48FEBAA: raise (in /nix/store/j5p0j1w27aqdzncpw73k95byvhh5prw2-glibc-2.33-47/lib/libc-2.33.so)
==407545==    by 0x48E9522: abort (in /nix/store/j5p0j1w27aqdzncpw73k95byvhh5prw2-glibc-2.33-47/lib/libc-2.33.so)
==407545==    by 0x44EACC: programming_error (error.c:175)
==407545==    by 0x4E5E38: internal_free.constprop.0 (malloc.c:974)
==407545==    by 0x46B165: expand_word_internal (subst.c:10637)
==407545==    by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==    by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==    by 0x47057E: expand_words (subst.c:11345)
==407545==    by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==    by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==    by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==    by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==    by 0x426AA1: main (shell.c:741)
==407545==
==407545== HEAP SUMMARY:
==407545==     in use at exit: 17 bytes in 1 blocks
==407545==   total heap usage: 211 allocs, 210 frees, 42,402 bytes allocated
==407545==
==407545== LEAK SUMMARY:
==407545==    definitely lost: 0 bytes in 0 blocks
==407545==    indirectly lost: 0 bytes in 0 blocks
==407545==      possibly lost: 0 bytes in 0 blocks
==407545==    still reachable: 17 bytes in 1 blocks
==407545==         suppressed: 0 bytes in 0 blocks
==407545== Rerun with --leak-check=full to see details of leaked memory
==407545==
==407545== For lists of detected and suppressed errors, rerun with: -s
==407545== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Aborted (core dumped)
-----------------------
(With -DDISABLE_MALLOC_WRAPPERS valgrind is quiet for this command, but without valgrind bash still crashes, and only with the systemd 249 libs in the search path.)

I guess we can leave this part off, just say that valgrind is not a tool that can be used with bash malloc because they conflict with each other, and call it a day.

Cheers,
-- 
Dominique
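
P.S. To make the reallocarray idea a bit more concrete, here is a minimal, untested-against-bash sketch of what such a wrapper could look like. The name sketch_reallocarray is hypothetical and only used so the snippet can be compiled on its own; an actual patch would presumably export the function as reallocarray from bash's malloc and call bash's own realloc rather than libc's. The overflow check mirrors what glibc documents for reallocarray (fail with ENOMEM if nmemb * size would overflow).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a reallocarray() provided by bash malloc.
 * Assumption: inside bash, realloc() below would resolve to bash's own
 * realloc, so memory obtained here could later be passed to bash's free
 * without going through a different allocator.
 */
void *sketch_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    /* Reject requests where nmemb * size would overflow size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }

    /* Otherwise this is a plain realloc of the total byte count. */
    return realloc(ptr, nmemb * size);
}
```

Whether this is actually the cause of the crash is still just a guess on my side; the point is only that the wrapper itself would be a few lines.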