Hi Aaron, On Tue, 2024-04-09 at 23:45 -0400, Aaron Merey wrote: > dwarf_getsrcfiles causes line data to be read in addition to file data. > This is wasteful for programs which only need file or directory names. > Debuginfod server is one such example. > > Fix this by moving the srcfile reading in read_srclines into a separate > function read_srcfiles. This change improves debuginfod server's max > resident set size by up to 75% during rpm indexing. > > * libdw/dwarf_getsrcfiles.c (dwarf_getsrcfiles): Replace > dwarf_getsrclines and __libdw_getsrclines with > __libdw_getsrcfiles. > * libdw/dwarf_getsrclines.c (read_line_header): New function. > (read_srcfiles): New function. > (read_srclines): Move srcfile reading into read_srcfiles. > Add parameter to use cached srcfiles if available. > Also merge srcfiles with any files from DW_LNE_define_file. > (__libdw_getsrclines): Changed to call get_lines_or_files. > (__libdw_getsrcfiles): New function. Calls get_lines_or_files. > (get_lines_or_files): New function based on the old > __libdw_getsrclines. Call read_srcfiles if linesp is NULL, > otherwise call read_srclines. Pass previously read srcfiles > to read_srclines if available. > * libdw/dwarf_macro_getsrcfiles.c (dwarf_macro_getsrcfiles): > Replace __libdw_getsrclines with __libdw_getsrcfiles. > * libdw/libdwP.h (__libdw_getsrcfiles): New declaration. > * tests/.gitignore: Add new test binary. > * tests/get-files.c: Verify that dwarf_getsrcfiles does > not cause srclines to be read. > * tests/get-lines.c: Verify that srclines can be read > after srcfiles have been read. > * tests/Makefile.am: Add new testfiles. > * tests/get-files-define-file.c: Print file names before > and after reading DW_LNE_define_file. > * tests/run-get-files.sh: Add get-files-define-file test. > * tests/testfile-define-file.bz2: New testfile. Copy of > testfile36.debug but with a line program consisting of two > DW_LNE_define_file opcodes. > > https://sourceware.org/bugzilla/show_bug.cgi?id=27405 > > Signed-off-by: Aaron Merey <ame...@redhat.com> > --- > > v2 changes: > Restored support for DW_LNE_define_file.
Great. And sorry I first suggested to just drop it and then said I would like it back. This was more work than I though. > Added DW_LNE_define_file testcase and testfile. Nice. How did you edit this file? > Added new function get_lines_or_files which is based on the old > __libdw_getsrclines. __libdw_getsrclines and __libdw_getsrcfiles now > call get_lines_or_files. It is just a simple rename, but so much nicer to my eyes. Thanks. > Mentioned max resident set size improvement in the commit message. I > did not mention a 5% speed up to rpm indexing that I have previously > talked about since I could not reliably reproduce it with v2 of the patch. > I suspect that speed up may have been due to the smaller max RSS reducing > swapping during that round of testing. Makes sense. And a reduction of 75% of the max resident size it pretty huge in itself. > + case DW_LNE_define_file: > + { > + char *fname = (char *) linep; > + uint8_t *endp = memchr (linep, '\0', lineendp - linep); > + if (endp == NULL) > + goto invalid_data; > + size_t fnamelen = endp - linep; > + linep = endp + 1; > + > + unsigned int diridx; > + if (unlikely (linep >= lineendp)) > + goto invalid_data; > + get_uleb128 (diridx, linep, lineendp); > + > + size_t ndirs = (*filesp)->ndirs; This one line uses tabs for indentation, all others spaces. > + if (unlikely (diridx >= ndirs)) > + { > + __libdw_seterrno (DWARF_E_INVALID_DIR_IDX); > + goto invalid_data; > + } > + Dwarf_Word mtime; > + if (unlikely (linep >= lineendp)) > + goto invalid_data; > + get_uleb128 (mtime, linep, lineendp); > + Dwarf_Word filelength; > + if (unlikely (linep >= lineendp)) > + goto invalid_data; > + get_uleb128 (filelength, linep, lineendp); > + > + /* Add new_file to filelist that will be merged with filesp. */ > + struct filelist *new_file = malloc (sizeof (struct > filelist)); > + if (unlikely (new_file == NULL)) > { > - __libdw_seterrno (DWARF_E_INVALID_DIR_IDX); > - goto invalid_data; > + __libdw_seterrno (DWARF_E_NOMEM); > + goto out; > } And this last block uses tabs again. > - Dwarf_Word mtime; > - if (unlikely (linep >= lineendp)) > - goto invalid_data; > - get_uleb128 (mtime, linep, lineendp); > - Dwarf_Word filelength; > - if (unlikely (linep >= lineendp)) > - goto invalid_data; > - get_uleb128 (filelength, linep, lineendp); > - > - struct filelist *new_file = NEW_FILE (); > - if (fname[0] == '/') > - new_file->info.name = fname; > - else > - { > - new_file->info.name = > - libdw_alloc (dbg, char, 1, (dirarray[diridx].len + 1 > - + fnamelen + 1)); > - char *cp = new_file->info.name; > - > - if (dirarray[diridx].dir != NULL) > - /* This value could be NULL in case the > - DW_AT_comp_dir was not present. We > - cannot do much in this case. Just > - keep the file relative. */ > - { > - cp = stpcpy (cp, dirarray[diridx].dir); > - *cp++ = '/'; > - } > - strcpy (cp, fname); > - } > - > - new_file->info.mtime = mtime; > - new_file->info.length = filelength; > - } > - break; > + nfilelist++; > + new_file->next = filelist; > + filelist = new_file; Here tabs again, then the rest spaces. > + if (fname[0] == '/') > + new_file->info.name = fname; > + else > + { > + /* Directory names are stored in a char *[ndirs] located > + after the last Dwarf_Fileinfo_s. */ > + size_t nfiles = (*filesp)->nfiles; > + const char **dirarray > + = (const char **) &((*filesp)->info[nfiles]); > + > + const char *dname = dirarray[diridx]; > + size_t dnamelen = strlen (dname); > + > + new_file->info.name = > + libdw_alloc (dbg, char, 1, (dnamelen + fnamelen + 2)); > + char *cp = new_file->info.name; > + > + if (dname != NULL) > + > + /* This value could be NULL in case the > + DW_AT_comp_dir was not present. We > + cannot do much in this case. Just > + keep the file relative. */ > + > + { > + cp = stpcpy (cp, dname); > + *cp++ = '/'; > + } > + strcpy (cp, fname); > + } > + > + new_file->info.mtime = mtime; > + new_file->info.length = filelength; > + } > + break; > So inconsistent indentation, but the code looks good. > @@ -1027,32 +1183,49 @@ read_srclines (Dwarf *dbg, > } > } > > - /* Put all the files in an array. */ > - Dwarf_Files *files = libdw_alloc (dbg, Dwarf_Files, > - sizeof (Dwarf_Files) > - + nfilelist * sizeof (Dwarf_Fileinfo) > - + (ndirlist + 1) * sizeof (char *), > - 1); > - const char **dirs = (void *) &files->info[nfilelist]; > - > - struct filelist *fileslist = filelist; > - files->nfiles = nfilelist; > - for (size_t n = nfilelist; n > 0; n--) > + /* Merge filesp with the files from DW_LNE_define_file, if any. */ > + if (unlikely (filelist != NULL)) > { > - files->info[n - 1] = fileslist->info; > - fileslist = fileslist->next; > - } > - assert (fileslist == NULL); > + Dwarf_Files *prevfiles = *filesp; > + size_t ndirs = prevfiles->ndirs; > + size_t nprevfiles = prevfiles->nfiles; > + size_t nnewfiles = nprevfiles + nfilelist; > + > + Dwarf_Files *newfiles > + = libdw_alloc (dbg, Dwarf_Files, > + sizeof (Dwarf_Files) > + + nnewfiles * sizeof (Dwarf_Fileinfo) > + + (ndirs + 1) * sizeof (char *), > + 1); > + > + > + /* Copy prevfiles to newfiles. */ > + for (size_t n = 0; n < nprevfiles; n++) > + newfiles->info[n] = prevfiles->info[n]; > + > + /* Add files from DW_LNE_define_file to newfiles. */ > + struct filelist *fileslist = filelist; > + for (size_t n = nfilelist; n > 0; n--) > + { > + newfiles->info[nprevfiles + n - 1] = fileslist->info; > + fileslist = fileslist->next; > + } > > - /* Put all the directory strings in an array. */ > - files->ndirs = ndirlist; > - for (unsigned int i = 0; i < ndirlist; ++i) > - dirs[i] = dirarray[i].dir; > - dirs[ndirlist] = NULL; > + if (fileslist != NULL) > + goto invalid_data; > > - /* Pass the file data structure to the caller. */ > - if (filesp != NULL) > - *filesp = files; > + const char **newdirs = (void *) &newfiles->info[nnewfiles]; > + const char **prevdirs = (void *) &prevfiles->info[nprevfiles]; > + > + /* Copy prevdirs to newdirs. */ > + for (size_t n = 0; n < ndirs; n++) > + newdirs[n] = prevdirs[n]; Again slightly off indentation. But I also don't fully follow the prevdirs/newdirs copying. Why is this? No newdirs are defined here, are there? Maybe I don't understand the data-structure used here. > diff --git a/tests/.gitignore b/tests/.gitignore > index 0289959d..e40ad238 100644 > --- a/tests/.gitignore > +++ b/tests/.gitignore > @@ -74,6 +74,7 @@ > /funcscopes > /get-aranges > /get-files > +/get-files-define_file > /get-lines > /get-pubnames > /get-units-invalid > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 9141074f..7b016dc8 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -35,7 +35,7 @@ endif > check_PROGRAMS = arextract arsymtest newfile saridx scnnames sectiondump \ > showptable update1 update2 update3 update4 test-nlist \ > show-die-info get-files next-files get-lines next-lines \ > - get-pubnames \ > + get-pubnames get-files-define-file \ > get-aranges allfcts line2addr addrscopes funcscopes \ > show-abbrev hash newscn ecp dwflmodtest \ > find-prologues funcretval allregs rdwrmmap \ > @@ -646,7 +646,8 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \ > testfile-dwp-5-cu-index-overflow.dwp.bz2 \ > testfile-dwp-4-cu-index-overflow.bz2 \ > testfile-dwp-4-cu-index-overflow.dwp.bz2 \ > - testfile-dwp-cu-index-overflow.source > + testfile-dwp-cu-index-overflow.source \ > + testfile-define-file.bz2 > > > if USE_VALGRIND > @@ -725,6 +726,7 @@ show_abbrev_LDADD = $(libdw) $(libelf) > get_lines_LDADD = $(libdw) $(libelf) > next_lines_LDADD = $(libdw) $(libelf) > get_files_LDADD = $(libdw) $(libelf) > +get_files_define_file_LDADD = $(libdw) $(libelf) > next_files_LDADD = $(libdw) $(libelf) > get_aranges_LDADD = $(libdw) $(libelf) > allfcts_LDADD = $(libdw) $(libelf) > diff --git a/tests/get-files-define-file.c b/tests/get-files-define-file.c > new file mode 100644 > index 00000000..583f9852 > --- /dev/null > +++ b/tests/get-files-define-file.c > @@ -0,0 +1,162 @@ > +/* Copyright (C) 2002, 2004, 2005, 2007 Red Hat, Inc. > + This file is part of elfutils. > + Written by Ulrich Drepper <drep...@redhat.com>, 2002. > + > + This file is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + elfutils is distributed in the hope that it will be useful, but > + WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see <http://www.gnu.org/licenses/>. */ > + > +#ifdef HAVE_CONFIG_H > +# include <config.h> > +#endif > + > +#include <fcntl.h> > +#include <libelf.h> > +#include ELFUTILS_HEADER(dw) > +#include <stdio.h> > +#include <unistd.h> > +#include "../libdw/libdwP.h" > + > +static void > +print_dirs_and_files (Dwarf_Files *files, const char *const *dirs, > + size_t nfiles, size_t ndirs) > +{ > + if (dirs[0] == NULL) > + puts (" dirs[0] = (null)"); > + else > + printf (" dirs[0] = \"%s\"\n", dirs[0]); > + for (size_t i = 1; i < ndirs; ++i) > + printf (" dirs[%zu] = \"%s\"\n", i, dirs[i]); > + > + for (size_t i = 0; i < nfiles; ++i) > + printf (" file[%zu] = \"%s\"\n", i, > + dwarf_filesrc (files, i, NULL, NULL)); > +} > + > + > +int > +main (int argc, char *argv[]) > +{ > + int result = 0; > + int cnt = argc - 1; > + > + int fd = open (argv[cnt], O_RDONLY); > + > + Dwarf *dbg = dwarf_begin (fd, DWARF_C_READ); > + if (dbg == NULL) > + { > + printf ("%s not usable\n", argv[cnt]); > + result = 1; > + if (fd != -1) > + close (fd); > + goto out; > + } > + > + Dwarf_Off o = 0; > + Dwarf_Off ncu; > + size_t cuhl; > + > + /* Just inspect the first CU. */ > + if (dwarf_nextcu (dbg, o, &ncu, &cuhl, NULL, NULL, NULL) != 0) > + { > + printf ("%s: cannot get CU\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + Dwarf_Die die_mem; > + Dwarf_Die *die = dwarf_offdie (dbg, o + cuhl, &die_mem); > + > + if (die == NULL) > + { > + printf ("%s: cannot get CU die\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + Dwarf_Files *files; > + size_t nfiles; > + > + /* The files from DW_LNE_define_file should not be included > + until dwarf_getsrclines is called. */ > + if (dwarf_getsrcfiles (die, &files, &nfiles) != 0) > + { > + printf ("%s: cannot get files\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + if (die->cu->lines != NULL) > + { > + printf ("%s: dwarf_getsrcfiles should not get lines\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + const char *const *dirs; > + size_t ndirs; > + if (dwarf_getsrcdirs (files, &dirs, &ndirs) != 0) > + { > + printf ("%s: cannot get include directories\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + /* Print file info without files from DW_LNE_define_file. */ > + print_dirs_and_files (files, dirs, nfiles, ndirs); > + > + Dwarf_Lines *lines; > + size_t nlines; > + > + /* Reading the line program should add the new files. */ > + if (dwarf_getsrclines (die, &lines, &nlines) != 0) > + { > + printf ("%s: cannot get lines\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + Dwarf_Files *updated_files; > + size_t num_updated_files; > + > + /* Get the new files. */ > + if (dwarf_getsrcfiles (die, &updated_files, &num_updated_files) != 0) > + { > + printf ("%s: cannot get files\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + const char *const *updated_dirs; > + size_t num_updated_dirs; > + > + /* The dirs shouldn't change but verify that getsrcdirs still works. */ > + if (dwarf_getsrcdirs (updated_files, &updated_dirs, &num_updated_dirs) != > 0) > + { > + printf ("%s: cannot get include directories\n", argv[cnt]); > + result = 1; > + goto out; > + } > + > + /* Verify that we didn't invalidate the old file info. */ > + print_dirs_and_files (files, dirs, nfiles, ndirs); > + > + /* Print all files including those from DW_LNE_define_file. */ > + print_dirs_and_files (updated_files, updated_dirs, > + num_updated_files, num_updated_dirs); > + > +out: > + dwarf_end (dbg); > + close (fd); > + > + return result; > +} > > [...] > diff --git a/tests/run-get-files.sh b/tests/run-get-files.sh > index 1306544d..c2bc01bf 100755 > --- a/tests/run-get-files.sh > +++ b/tests/run-get-files.sh > @@ -18,7 +18,7 @@ > > . $srcdir/test-subr.sh > > -testfiles testfile testfile2 > +testfiles testfile testfile2 testfile-define-file > > testrun_compare ${abs_builddir}/get-files testfile testfile2 <<\EOF > cuhl = 11, o = 0, asz = 4, osz = 4, ncu = 191 > @@ -245,4 +245,67 @@ cuhl = 11, o = 0, asz = 8, osz = 4, ncu = 857 > file[3] = "/usr/include/stdc-predef.h" > EOF > > +tempfiles files define-files.out get-files-define-file.out > + > +cat > files <<\EOF > + dirs[0] = "session" > + dirs[1] = "/home/wcohen/minimal_mod" > + dirs[2] = "include/asm" > + dirs[3] = "include/linux" > + dirs[4] = "include/asm-generic" > + file[0] = "???" > + file[1] = "/home/wcohen/minimal_mod/minimal_mod.c" > + file[2] = "include/asm/gcc_intrin.h" > + file[3] = "include/linux/kernel.h" > + file[4] = "include/asm/processor.h" > + file[5] = "include/asm/types.h" > + file[6] = "include/asm/ptrace.h" > + file[7] = "include/linux/sched.h" > + file[8] = "include/asm/thread_info.h" > + file[9] = "include/linux/thread_info.h" > + file[10] = "include/asm/atomic.h" > + file[11] = "include/linux/list.h" > + file[12] = "include/linux/cpumask.h" > + file[13] = "include/linux/rbtree.h" > + file[14] = "include/asm/page.h" > + file[15] = "include/linux/rwsem.h" > + file[16] = "include/asm/rwsem.h" > + file[17] = "include/asm/spinlock.h" > + file[18] = "include/linux/completion.h" > + file[19] = "include/linux/wait.h" > + file[20] = "include/linux/aio.h" > + file[21] = "include/linux/workqueue.h" > + file[22] = "include/linux/timer.h" > + file[23] = "include/linux/types.h" > + file[24] = "include/asm/posix_types.h" > + file[25] = "include/linux/pid.h" > + file[26] = "include/linux/time.h" > + file[27] = "include/linux/capability.h" > + file[28] = "include/linux/signal.h" > + file[29] = "include/linux/resource.h" > + file[30] = "include/linux/sem.h" > + file[31] = "include/asm/fpu.h" > + file[32] = "include/linux/fs_struct.h" > + file[33] = "include/asm/signal.h" > + file[34] = "include/asm/siginfo.h" > + file[35] = "include/asm-generic/siginfo.h" > + file[36] = "include/asm/nodedata.h" > + file[37] = "include/linux/mmzone.h" > + file[38] = "include/linux/jiffies.h" > + file[39] = "include/asm/io.h" > + file[40] = "include/asm/machvec.h" > + file[41] = "include/asm/smp.h" > + file[42] = "include/asm/numa.h" > + file[43] = "include/linux/slab.h" > +EOF > + > +# Files should be printed 3 times, followed by the files from > DW_LNE_define_file > +cat files > define-files.out > +cat files >> define-files.out > +cat files >> define-files.out > +echo ' file[44] = "include/asm/abc.c"' >> define-files.out > +echo ' file[45] = "include/linux/01.c"' >> define-files.out > + > +cat define-files.out | testrun_compare ${abs_builddir}/get-files-define-file > testfile-define-file > + > exit 0 Nice. So testfile-define-file is actually testfile36.debug but with a new line program? How did you edit/insert that one? Cheers, Mark