> On 20 June 2023 at 23:37, Mark Wielaard <m...@klomp.org> wrote:
> 
> Hi,
> 
> On Mon, Jun 19, 2023 at 05:08:50PM +0200, Mark Wielaard wrote:
> 
> So I made a mistake here. Since I was testing on fedora 38 which has
> DEBUGINFOD_URLS set. Without DEBUGINFOD_URLS set there is no big
> slowdown.
> 
> Do you have the DEBUGINFOD_URLS environment variable set?
> 
> The real sd-coredump will not have DEBUGINFOD_URLS set (I hope).
> 
> Thanks,
> 
> Mark

Hi,

Our real use case happens on an OpenShift 4.13 node, so the OS is Red Hat Core 
OS 9 (which I assume shares a lot of foundations with RHEL 9).

On our side, Francois also told me this afternoon that what he sees with the 
reproducer I posted here does not really match the real systemd-coredump issue 
he witnessed live. He also noticed that with DEBUGINFOD_URLS unset or set to 
the empty string, my reproducer no longer has the problem. In the real case, 
perf and gdb showed that most of the time was spent in elf_getdata_rawchunk, 
often in this kind of stack:

Samples: 65K of event 'cpu-clock:pppH', Event count (approx.): 16468500000
Overhead  Command         Shared Object             Symbol
  98.24%  (sd-parse-elf)  libelf-0.188.so           [.] elf_getdata_rawchunk
   0.48%  (sd-parse-elf)  libelf-0.188.so           [.] 0x00000000000048a3
   0.27%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getphdr
   0.11%  (sd-parse-elf)  libc.so.6                 [.] _int_malloc
   0.10%  (sd-parse-elf)  libelf-0.188.so           [.] gelf_getnote
   0.06%  (sd-parse-elf)  libc.so.6                 [.] __libc_calloc
   0.05%  (sd-parse-elf)  [kernel.kallsyms]         [k] __softirqentry_text_start
   0.05%  (sd-parse-elf)  libc.so.6                 [.] _int_free


(gdb) bt
#0  0x00007f0ba8a88194 in elf_getdata_rawchunk () from target:/lib64/libelf.so.1
#1  0x00007f0ba98e5013 in module_callback.lto_priv () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#2  0x00007f0ba8ae7291 in dwfl_getmodules () from target:/lib64/libdw.so.1
#3  0x00007f0ba98e6dc0 in parse_elf_object () from target:/usr/lib64/systemd/libsystemd-shared-252.so
#4  0x0000562c474f2d5e in submit_coredump ()
#5  0x0000562c474f57d1 in process_socket.constprop ()
#6  0x0000562c474efbf8 in main ()

My reproducer doesn’t fully re-implement what systemd does (the parsing of the 
package metadata is clearly omitted), so I thought I had reproduced the same 
problem when apparently I hadn’t; sorry for that. We will also have to 
double-check whether just using 2000 dummy libraries is enough, or whether it 
also takes a more complex binary like the one in our real case.
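
For anyone who wants to repeat the DEBUGINFOD_URLS comparison on the 
reproducer, here is a minimal sketch that runs a command twice, once with the 
inherited environment and once with the variable removed, and prints the wall 
time of each run. The helper names are mine, and the command line is a 
placeholder: pass whatever reproducer binary and arguments you actually use.

```python
import os
import subprocess
import sys
import time


def env_without_debuginfod(base=None):
    """Return a copy of the environment with DEBUGINFOD_URLS removed."""
    env = dict(os.environ if base is None else base)
    env.pop("DEBUGINFOD_URLS", None)
    return env


def time_command(argv, env):
    """Run argv with the given environment; return (exit code, seconds)."""
    start = time.monotonic()
    result = subprocess.run(argv, env=env, check=False)
    return result.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder default so the script runs even without arguments;
    # replace it with the real reproducer command line.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    runs = (
        ("environment as-is", dict(os.environ)),
        ("DEBUGINFOD_URLS removed", env_without_debuginfod()),
    )
    for label, env in runs:
        code, seconds = time_command(argv, env)
        print(f"{label}: exit={code} time={seconds:.2f}s")
```

If both runs take about the same time, the variable is probably not the 
cause on that machine, which is what we expect for the real systemd-coredump 
case.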

Tomorrow on our side we will have to play a bit with a local build of 
systemd-coredump and try to run it manually to better understand what’s going 
wrong.


Note: when I wrote and tested my reproducer, I used a fedora:38 container, 
which doesn’t have DEBUGINFOD_URLS set (this may differ from a real, 
non-containerized Fedora 38 install):

[root@7563ccfb7a39 /]# printenv|grep DEBUGINFOD_URLS
[root@7563ccfb7a39 /]# find /etc/profile.d/|grep debug
[root@7563ccfb7a39 /]# cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Container Image)"
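
The same check can be scripted so it can be run identically in the container 
and on the node. This is only a sketch: the function names are mine, and 
unlike the find command above it looks at the contents of the files in 
/etc/profile.d rather than at their names. It separates "unset" from "set to 
the empty string", since both cases made my reproducer behave.

```python
import os
from pathlib import Path


def debuginfod_state(environ=None):
    """Classify DEBUGINFOD_URLS as 'unset', 'empty' or 'set'."""
    environ = os.environ if environ is None else environ
    value = environ.get("DEBUGINFOD_URLS")
    if value is None:
        return "unset"
    return "set" if value.strip() else "empty"


def profile_scripts_mentioning(directory="/etc/profile.d",
                               needle="DEBUGINFOD_URLS"):
    """List file names in directory whose contents mention needle."""
    root = Path(directory)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.iterdir()):
        try:
            if path.is_file() and needle in path.read_text(errors="replace"):
                hits.append(path.name)
        except OSError:
            # Unreadable file: skip it rather than abort the scan.
            continue
    return hits


if __name__ == "__main__":
    print("DEBUGINFOD_URLS:", debuginfod_state())
    print("profile scripts:", profile_scripts_mentioning())
```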

Cheers,
Romain
