Hi Romain,

Just to let you know I am looking at this, but I haven't made much
progress in understanding it yet. Thanks so much for the reproducer.
I have been able to see the (very slow) parsing of the core file with it.

$ time ./mimic-systemd-coredump
[...]
real    3m35.965s
user    0m0.722s
sys     0m0.345s

Note however that a lot of time is "missing": user and sys together
account for only about one second of the three and a half minutes of
wall-clock time. And in fact running it again is fast!?!

$ time ./mimic-systemd-coredump
real    0m0.327s
user    0m0.272s
sys     0m0.050s

This is because of the kernel inode/dentry cache. If I do

$ echo 2 | sudo tee /proc/sys/vm/drop_caches

before running ./mimic-systemd-coredump it is always slow. I'll try to
figure out what we do to make it so hard for the kernel to do these
lookups.

But that doesn't invalidate the other observation you made, that the
dwfl_module_getelf call always returns NULL.

> My understanding of the will of systemd developers is that they hoped
> that libdwfl would return some "partial" Elf* reference when calling
> dwfl_module_getelf, from the elf headers found in the core for each and
> every shared library (the first page of the PT_LOAD mappings that the
> kernel always dumps even when the mapping is file backed).

Right, that is a reasonable hope. And I don't actually know why it
always fails in this case.

> However it seems that under the hood it doesn't (is it linked to
> core_file_read_eagerly which seems to always return false in this
> case ?), and instead it uses the .find_elf = dwfl_build_id_find_elf
> callback which tries to find the file by buildid on the filesystem.
> For some unknown reason to me, calling dwfl_module_getelf is very slow
> (I wouldn't expect that looking on the filesystem by buildid is that
> slow actually).

Apparently we do it in some really slow way if the inodes/dentries
aren't in the kernel cache (and the files are not actually on disk).

Which does bring up the question why systemd-coredump isn't running
in the same mount namespace as the crashing program. Then it would
simply find the files that the crashing program is using. Or it might
install a .find_elf callback that (also) looks under /proc/pid/root/ ?

> So, is this behavior of dwfl_module_getelf expected ? If yes, do you
> agree that we shall advise systemd-coredump developer to invert their
> logic, to first try to look for partial elf header from the core's
> PT_LOAD section, then only fallback to the more reliable
> dwfl_module_getelf if it didn't work ? In practice, we have tried the
> following patch applied to systemd v253 and it seems to "fix" the above
> mentioned case:

I don't think dwfl_module_getelf should always return NULL in this
case. Nor should it be this slow. But given that it does return NULL
and given that it is slow, that is certainly reasonable advice.

> Some other side question: in the long run, wouldn't it make sense that
> elfutils tries to parse the json package metadata section by itself,
> just like it does for the buildid, rather than implementing this logic
> in systemd ?

Maybe we could provide this functionality. You are right that we have
no problem getting the build-ids with

$ eu-unstrip --core=./the-core -n

So providing some other "static data" might be possible with a simpler
interface.

Thanks for this extensive bug report and reproducer. I'll play some more
with it to hopefully get you some real answers/fixes.

Cheers,

Mark
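
P.S. To make the /proc/pid/root/ idea above slightly more concrete, here
is a minimal sketch of just the path-mapping step. The helper name
proc_root_path is hypothetical (it is not part of elfutils or systemd),
and it assumes the pid of the crashing process and the absolute file name
of the module are already known to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, not an elfutils API: return a malloc'ed string
   naming FILE as seen through the root directory of process PID, that is
   "/proc/PID/root" followed by FILE.  Returns NULL if FILE is not an
   absolute path or if memory allocation fails.  The caller frees the
   result.  */
static char *
proc_root_path (pid_t pid, const char *file)
{
  if (file == NULL || file[0] != '/')
    return NULL;

  /* "/proc//root" plus its NUL terminator, up to 20 characters for the
     pid printed as a long, plus the file name itself.  */
  size_t size = sizeof "/proc//root" + 20 + strlen (file);
  char *path = malloc (size);
  if (path == NULL)
    return NULL;

  snprintf (path, size, "/proc/%ld/root%s", (long) pid, file);
  return path;
}
```

A custom .find_elf callback could presumably try to open such a path
first and only then fall back to dwfl_build_id_find_elf. That wiring is
an untested assumption on my side, and it only helps while the crashing
process still exists so that /proc/pid/root/ can be resolved.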