Re: [Rpm-maint] [PATCH] debugedit: Support String/Line table rewriting for larger/smaller paths.

2017-03-06 Thread Panu Matilainen

On 03/06/2017 01:15 PM, Mark Wielaard wrote:

On Sat, 2017-03-04 at 11:28 +0200, Panu Matilainen wrote:

I dont feel qualified to really review this, and perhaps others are
feeling the same way since it's been out there for a week now with no
comments at all.

Because this seems quite awesome (even if also a bit scary), to avoid
stalling forever with no-one to review it: if there are no objections
raised by Monday I'm going to just apply it (and the couple of other
recent debuginfo patches). So anybody having doubts, speak up now.


Thanks. Obviously I think the patch is awesome and should be integrated
sooner rather than later :) But being slightly worried this means "you
touched it last, you own it now"

[...]

That's the price of course :)

Applied (all four patches), thanks for all the work and comprehensive 
explanation and self-review!


- Panu -

___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


Re: [Rpm-maint] [PATCH] debugedit: Support String/Line table rewriting for larger/smaller paths.

2017-03-06 Thread Mark Wielaard
On Sat, 2017-03-04 at 11:28 +0200, Panu Matilainen wrote:
> I dont feel qualified to really review this, and perhaps others are 
> feeling the same way since it's been out there for a week now with no 
> comments at all.
> 
> Because this seems quite awesome (even if also a bit scary), to avoid 
> stalling forever with no-one to review it: if there are no objections 
> raised by Monday I'm going to just apply it (and the couple of other 
> recent debuginfo patches). So anybody having doubts, speak up now.

Thanks. Obviously I think the patch is awesome and should be integrated
sooner rather than later :) But being slightly worried this means "you
touched it last, you own it now", let me do at least my own review.

Questions I would ask about this patch if someone else would have
submitted it:

- Why depend on elfutils libebl/libdw for the .debug_str string table
  reconstruction? Isn't a string table just a simple concatenation of
  strings?

It is indeed a simple concatenation of strings, but to pack them
efficiently substrings can be reused/shared. e.g. if the table contains
the string "easy_and_slow" at index 16, then it also contains the string
"slow" at index 25. The elfutils strtab code already handles this
efficiently, so I wanted to depend on that instead of open coding it.

- Why introduce those string storage allocation blocks? Can't you just
  malloc whenever you need a new string?

To make sure we don't leak too much memory we need to track our mallocs
already. strings come either from the existing .debug_str table or from
newly created (replacement) strings. I felt it was a good idea to pack
those strings as closely together because they might be compared often.
So keep them close together helps data cache locality. But this was
probably premature optimisation since we end up at the strings
indirectly through the original indexes and we compare those first (if
the original indexes are equal then the (replacement) strings would be
the same too (and so we end up not having to compare that many strings
anyway). But we needed a structure to keep track of our allocated
strings anyway. So I kept it as is.

- To compare/find existing string table indexes the code uses a binary
  search tree (tsearch), but for line table indexes the code uses a
  simple (sorted) array (bsearch). Why different mechanisms for the same
  kind of lookup?

There are much, much more string table indexes than line table offsets
(e.g. libstdc++.so has 125 line tables, and 10331 full strings [ignoring
shared "tails"]). So for line tables we don't mind if we sometimes have
to do a quick linear search (in phase 0). But for the string indexes it
is important to always be able to search/insert them efficiently. So for
the string table indexes we always use the binary search tree balanced.
For line tables we simply sort the array when we collected all offsets.
The binary search tree does have a bit more memory overhead, but it was
worth it for the fast lookups [*].

- Does the switch from mmap and rawdata to simple read/write and
  elf_data cause much more memory usage?

Yes and no. When mmapping the file the memory is still allocated, but
might not really be used. We do in most cases end up using most of it
because for recalculating the build-id we need to mmap/read in all
section data. The reading in of the ELF data through elf_getdata is
slightly less efficient that elf_rawdata because it might have to do
endian conversion if the ELF file is not native. But in practice it
always is (we just build it). We can probably do better by being smarter
about when/how we update the build-ids. I didn't try to do that yet.

- The relocation rewriting seems scary.

Yes it is :{ Luckily this is only needed for object files (ET_REL) and
normally they aren't packaged. The big exceptions are kernel modules
(why those are ET_REL files and not simply ET_DYN is beyond me) and
static ar archives. If we would have to handle all possible
(architecture specific) relocations this would be really impossible.
Luckily for the indexes into .debug sections only simple relocation
types are used (basically it provides an extra offset into the section).
The line table relocations pointing into code sections are also simple
because we don't care about the actual relocation type, just where in
the section they are. So we can simply move the relocation offset.
We don't handle static ar archives at all. Which is something we might
want to look at at some later point.

- The code ends up touching allocated sections! Isn't that a no-no?

Yes. Anything allocated really shouldn't be touched. Only the
non-allocated (debug) sections should be touched/written out. Sadly a
bug in elfutils means we do end up having to mark the allocated sections
dirty to make sure the whole ELF file ends up correct on disk. This is
really unfortunate and might make things a bit slower. But no
(allocated) data is really changed at all. And once the bug is fixed (in
elfutils 0.169) this 

Re: [Rpm-maint] [PATCH] debugedit: Support String/Line table rewriting for larger/smaller paths.

2017-03-06 Thread Thierry Vignaud
On 4 March 2017 at 10:28, Panu Matilainen  wrote:
>> From: Mark Wielaard 
>>
>> debugedit --base to --dest rewriting of debug source file paths only
>> supported dest paths that were smaller or equal than the base path
>> (and the size should differ more than 1 character for correct debug
>> lines).
>> All paths were changed "in place". Which could in theory mess up debug str
>> sharing.
>>
>> This rewrite supports base and dest strings of any size (some limitations,
>> see below). This is done by reconstructing the debug_str and debug_line
>> tables and updating the references in the debug_info attributes pointing
>> to these tables. Plus, if necessary (only for ET_REL kernel modules),
>> updating any relocations for the debug_info and debug_line sections.
>>
>> This has the nice benefit of merging any duplicate strings in the
>> debug_str table which might resulting on slightly smaller files.
>> kernel modules are ET_REL files that often contain a lot of duplicate
>> strings.
>>
>> The rewrite uses elfutils (either libebl or libdw) to reconstruct the
>> debug_str table. Since we are changing some section sizes now we cannot
>> just use mmap and rawdata to poke the values, but need to read in and
>> write out the changed sections. This does take a bit more memory because
>> we now also need to keep track of all string/line references.
>
>
> I dont feel qualified to really review this, and perhaps others are feeling
> the same way since it's been out there for a week now with no comments at
> all.
>
> Because this seems quite awesome (even if also a bit scary), to avoid
> stalling forever with no-one to review it: if there are no objections raised
> by Monday I'm going to just apply it (and the couple of other recent
> debuginfo patches). So anybody having doubts, speak up now.

If it does fix the infamous "Dest dir longer than base dir is not
supported", the merrier.
Though that didn't impact that much users in practice (but we did get
a report in mga a long time ago from one guy who set _topdir to
/RPM...).
___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


Re: [Rpm-maint] [PATCH] debugedit: Support String/Line table rewriting for larger/smaller paths.

2017-03-04 Thread Panu Matilainen

On 02/27/2017 05:28 PM, Mark Wielaard wrote:

From: Mark Wielaard 

debugedit --base to --dest rewriting of debug source file paths only
supported dest paths that were smaller or equal than the base path
(and the size should differ more than 1 character for correct debug lines).
All paths were changed "in place". Which could in theory mess up debug str
sharing.

This rewrite supports base and dest strings of any size (some limitations,
see below). This is done by reconstructing the debug_str and debug_line
tables and updating the references in the debug_info attributes pointing
to these tables. Plus, if necessary (only for ET_REL kernel modules),
updating any relocations for the debug_info and debug_line sections.

This has the nice benefit of merging any duplicate strings in the
debug_str table which might resulting on slightly smaller files.
kernel modules are ET_REL files that often contain a lot of duplicate
strings.

The rewrite uses elfutils (either libebl or libdw) to reconstruct the
debug_str table. Since we are changing some section sizes now we cannot
just use mmap and rawdata to poke the values, but need to read in and
write out the changed sections. This does take a bit more memory because
we now also need to keep track of all string/line references.


I dont feel qualified to really review this, and perhaps others are 
feeling the same way since it's been out there for a week now with no 
comments at all.


Because this seems quite awesome (even if also a bit scary), to avoid 
stalling forever with no-one to review it: if there are no objections 
raised by Monday I'm going to just apply it (and the couple of other 
recent debuginfo patches). So anybody having doubts, speak up now.


- Panu -
___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


[Rpm-maint] [PATCH] debugedit: Support String/Line table rewriting for larger/smaller paths.

2017-02-27 Thread Mark Wielaard
From: Mark Wielaard 

debugedit --base to --dest rewriting of debug source file paths only
supported dest paths that were smaller or equal than the base path
(and the size should differ more than 1 character for correct debug lines).
All paths were changed "in place". Which could in theory mess up debug str
sharing.

This rewrite supports base and dest strings of any size (some limitations,
see below). This is done by reconstructing the debug_str and debug_line
tables and updating the references in the debug_info attributes pointing
to these tables. Plus, if necessary (only for ET_REL kernel modules),
updating any relocations for the debug_info and debug_line sections.

This has the nice benefit of merging any duplicate strings in the
debug_str table which might resulting on slightly smaller files.
kernel modules are ET_REL files that often contain a lot of duplicate
strings.

The rewrite uses elfutils (either libebl or libdw) to reconstruct the
debug_str table. Since we are changing some section sizes now we cannot
just use mmap and rawdata to poke the values, but need to read in and
write out the changed sections. This does take a bit more memory because
we now also need to keep track of all string/line references.

There are still some limitations (already in the original debugedit)
not fixed by this rewrite:
- DW_AT_comp_dir in .debug_info using DW_FORM_string can not be made
  larger. We only warn about that now instead of failing. The only
  producer of DW_FORM_string comp_dirs is binutils gas. It seems simpler
  to fix gas than to try to support resizing the debug_info section.
- A DW_AT_name on a DW_TAG_compile_unit is only rewritten for DW_FORM_strp
  not for DW_FORM_string. Probably no problem in practice since this
  wasn't supported originally either.
- The debug_line program isn't scanned for DW_LNE_define_file which
  could in theory define an absolute path that might need rewriting.
  Again probably not a problem because this wasn't supported before
  and there are no know producers for this construct.

To support the upcoming DWARFv5 in gcc 7 (not on by default), we will
need to add support for the new debug_line format and scan the new
debug_macro section that can have references to the debug_str table.

Signed-off-by: Mark Wielaard 
---
 Makefile.am   |8 +-
 configure.ac  |6 +
 tools/debugedit.c | 1569 -
 3 files changed, 1330 insertions(+), 253 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index b99da12..aeb2c33 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -154,13 +154,18 @@ rpm2archive_LDADD +=  @WITH_BEECRYPT_LIB@ 
@WITH_NSS_LIB@ @WITH_POPT_LIB@ @WITH_ZL
 
 if LIBELF
 if LIBDWARF
+if LIBDW
 rpmconfig_SCRIPTS += scripts/find-debuginfo.sh
 
 rpmlibexec_PROGRAMS += debugedit
 debugedit_SOURCES =tools/debugedit.c tools/hashtab.c tools/hashtab.h
 debugedit_LDADD =  rpmio/librpmio.la
 debugedit_LDADD += @WITH_LIBELF_LIB@ @WITH_POPT_LIB@
-
+if HAVE_LIBDW_STRTAB
+debugedit_LDADD += @WITH_LIBDW_LIB@
+else
+debugedit_LDADD += @WITH_LIBDW_LIB@ -lebl
+endif
 rpmlibexec_PROGRAMS += elfdeps
 elfdeps_SOURCES =  tools/elfdeps.c
 elfdeps_LDADD =rpmio/librpmio.la
@@ -171,6 +176,7 @@ sepdebugcrcfix_SOURCES = tools/sepdebugcrcfix.c
 sepdebugcrcfix_LDADD = @WITH_LIBELF_LIB@
 endif
 endif
+endif
 
 rpmlibexec_PROGRAMS += rpmdeps
 rpmdeps_SOURCES =  tools/rpmdeps.c
diff --git a/configure.ac b/configure.ac
index 4f3be87..5c24ac6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -371,18 +371,24 @@ AM_CONDITIONAL(WITH_ARCHIVE,[test "$with_archive" = yes])
 #=
 # Check for elfutils libdw library with dwelf_elf_gnu_build_id.
 WITH_LIBDW_LIB=
+HAVE_LIBDW_STRTAB=
 AS_IF([test "$WITH_LIBELF" = yes],[
   AC_CHECK_HEADERS([elfutils/libdwelf.h],[
+# dwelf_elf_gnu_build_id was introduced in elfutils 0.159
 AC_CHECK_LIB(dw, dwelf_elf_gnu_build_id, [
   AC_DEFINE(HAVE_LIBDW, 1,
 [Define to 1 if you have elfutils libdw library])
   WITH_LIBDW_LIB="-ldw"
   WITH_LIBDW=yes
+  # If possible we also want the strtab functions from elfutils 0.167.
+  # But we can fall back on the (unsupported) ebl alternatives if not.
+  AC_CHECK_LIB(dw, dwelf_strtab_init, [HAVE_LIBDW_STRTAB=yes])
 ])
   ])
 ])
 AC_SUBST(WITH_LIBDW_LIB)
 AM_CONDITIONAL(LIBDW,[test "$WITH_LIBDW" = yes])
+AM_CONDITIONAL(HAVE_LIBDW_STRTAB,[test "$HAVE_LIBDW_STRTAB" = yes])
 
 #=
 # Process --with/without-external-db
diff --git a/tools/debugedit.c b/tools/debugedit.c
index c0147f0..4798c63 100644
--- a/tools/debugedit.c
+++ b/tools/debugedit.c
@@ -1,6 +1,7 @@
-/* Copyright (C) 2001-2003, 2005, 2007, 2009-2011, 2016 Red Hat, Inc.
+/* Copyright (C) 2001-2003, 2005, 2007, 2009-2011, 2016, 2017 Red Hat, Inc.
Written by Alexander Larsson , 2002
Based on code by Jakub Jelinek , 2001.
+