https://sourceware.org/bugzilla/show_bug.cgi?id=33684

--- Comment #1 from Ali Bahrami <ali_swbugzilla at emvision dot com> ---
In the previous note, the example of using the Solaris link-editor
with gobjcopy showed that x86 gobjcopy defaults to stripping section
symbols, causing the index of symbols to change, and breaking the
associated symbol sort sections. Here, I'll dig deeper into sort
sections, and show how that impact manifests.

The problem is not with .symtab per se, but with the sections that
reference it via sh_link. Here's what has happened:

    - The .symtab symbol table is referenced, via sh_link, by the
      symbol sort sections .SUNW_symtabsort and .SUNW_symtabnsort:

          % elfdump -c foo.so
...
          Section Header[18]:  sh_name: .symtab
              sh_addr:      0                   sh_flags:   0
              sh_size:      0x378               sh_type:    [ SHT_SYMTAB ]
              sh_offset:    0x8f8               sh_entsize: 0x18 (37 entries)
              sh_link:      19                  sh_info:    28
              sh_addralign: 0x8

          Section Header[19]:  sh_name: .strtab
              sh_addr:      0                   sh_flags:   [ SHF_STRINGS ]
              sh_size:      0x7d                sh_type:    [ SHT_STRTAB ]
              sh_offset:    0xc70               sh_entsize: 0
              sh_link:      0                   sh_info:    0
              sh_addralign: 0x1

          Section Header[20]:  sh_name: .SUNW_symtabsort
              sh_addr:      0                   sh_flags:   0
              sh_size:      0x18                sh_type:    [ SHT_SUNW_symsort ]
              sh_offset:    0xcf0               sh_entsize: 0x4 (6 entries)
              sh_link:      18                  sh_info:    0
              sh_addralign: 0x8

          Section Header[21]:  sh_name: .SUNW_symtabnsort
              sh_addr:      0                   sh_flags:   0
              sh_size:      0x28                sh_type:  [ SHT_SUNW_symnsort ]
              sh_offset:    0xd08               sh_entsize: 0x4 (10 entries)
              sh_link:      18                  sh_info:    0
              sh_addralign: 0x8
...

    - Symbol sort sections, which are a Solaris ELFOSABI feature,
      contain indexes into the .symtab. These sort the symbols by
      name, or by address, which is very helpful to observability
      tools wanting to use a binary search to rapidly map a name/address
      to an enclosing symbol.

    - objcopy rewrites .symtab, which makes it shorter, and which
      changes the symbol indexes of the kept symbols. Both of these
      changes invalidate the data in the sort sections, but that data
      is left unchanged, to cause trouble later.
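
To make the failure mode concrete, here is a minimal Python sketch. It is
not Solaris or binutils code: the symbol names, addresses, and the tiny
symtab are all made up for illustration. It only shows what happens when
a symbol is dropped from the table while an index array that points into
the table is left as it was.

```python
# Hypothetical miniature symtab: index -> (name, address). Not real data.
symtab = [
    ("", 0x0),           # 0: reserved null symbol
    (".interp", 0x100),  # 1: section symbol
    ("start", 0x200),    # 2
    ("helper", 0x300),   # 3
    ("main", 0x400),     # 4
    ("tail", 0x500),     # 5: in the symtab, but not in the sort array
]

# Address sort array: symtab indexes ordered by symbol address.
addr_sort = [2, 3, 4]


def names(table, sort_idx):
    """Resolve each sort entry to the name of the symbol it points at."""
    return [table[i][0] for i in sort_idx]


before = names(symtab, addr_sort)

# Rewrite the symtab without the section symbol, but leave the sort
# array untouched, which is the situation described above.
stripped = [sym for sym in symtab if sym[0] != ".interp"]
after = names(stripped, addr_sort)

print(before)  # ['start', 'helper', 'main']
print(after)   # ['helper', 'main', 'tail']
```

Every entry in the sort array still resolves to some symbol, so nothing
fails loudly. The entries simply name the wrong symbols.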

While these sort sections are a Solaris ABI feature, and not something
that objcopy should necessarily be expected to know about, the use of
sh_link to connect them is a generic ELF feature, and something that
objcopy should have been aware of. Note that the bug mentioned previously
(19938) that was reopened last September also involved properly managing
sh_link and sh_info. This old blog that I wrote back at that time might
prove useful now as well:

    https://www.linker-aliens.org/blogs/ali/entry/how_to_strip_an_elf/

When a section that objcopy does not understand references, via sh_link
or sh_info, a section that objcopy wants to modify, that should be a red
flag not to proceed with the modification. It may be OK for objcopy to
rewrite the symtab if nothing else references it (though I think dropping
these section symbols is unnecessary and possibly bad for debuggers), but
that assumption doesn't hold when other sections reference it, and the
behavior should be consistent across platforms.

You can see the effect by using elfdump to dump the .symtab section
header from the before/after objects:

    % elfdump -cN.symtab -Ffileprefix foo.so foo_copy.so | grep sh_entsize
    foo.so:      sh_offset:    0x8f8      sh_entsize: 0x18 (37 entries)
    foo_copy.so: sh_offset:    0x950      sh_entsize: 0x18 (14 entries)

23 symbols were removed. By comparing elfdump output for the two
objects, I was able to determine that the dropped symbols are all
section symbols, which look like:

    [2]  0x80500f4     0  SECT LOCL  D    0 .interp

As a quick/dirty hack, I tried changing the objcopy command to:

    % gobjcopy --keep-section-symbols foo.so foo_copy.so

It's an improvement, but we still end up missing 3 symbols:

    % elfdump -cN.symtab -Ffileprefix foo.so foo_copy.so | grep sh_entsize
    foo.so:      sh_offset:    0x8f8      sh_entsize: 0x18 (37 entries)
    foo_copy.so: sh_offset:    0x950      sh_entsize: 0x18 (34 entries)

It turns out that the 3 missing symbols are still section symbols,
the ones for .shstrtab, .symtab, and .strtab. For whatever reason,
--keep-section-symbols didn't retain them, but explicitly requiring
.shstrtab (and not the other 2) to be kept does the trick:

    % gobjcopy --keep-section-symbols \
          --keep-symbol=.shstrtab foo.so foo_copy.so
    % elfdump -cN.symtab -Ffileprefix foo.so foo_copy.so | grep sh_entsize
    foo.so:      sh_offset:    0x8f8      sh_entsize: 0x18 (37 entries)
    foo_copy.so: sh_offset:    0x950      sh_entsize: 0x18 (37 entries)

The right number is important, but so is order. The possibility that
objcopy might reorder the symbols, and in so doing, break referencing
sections, remains. I can verify with elfdump and diff that it doesn't
happen in this case:

    % elfdump -sN.symtab foo.so > a
    % elfdump -sN.symtab foo_copy.so > b
    % diff a b

That's nice, but not necessarily something one can count on, and
not something that a user should have to do in order to prevent
unwanted changes to the symbol table by default. Mainly, I think it
shows that it should be possible to fix objcopy.

Another hack to duck the whole issue is to prevent the referencing
symbol sort sections from being present. Then, objcopy can get away
with rewriting the symbol table:

    % cc hello.c -o hello -zstrip-class=sort_sym
    % gobjcopy hello hello.copy
    % elfdump hello.copy > /dev/null

It works, but it shouldn't be necessary, and I know from experience
that lots of FOSS, even when built on Solaris, uses objcopy, and it
is not realistic to expect special extra options to ld, or to objcopy,
to be used. And anyway, these sections are important --- we want them
to be present by default.

As noted above, symbol sort sections are a feature of the Solaris
ELFOSABI, and not a generic ELF feature. They were introduced in 2007,
and I'll take a guess that this objcopy issue has also existed for a
long time. It's interesting that it wasn't noticed earlier, but I think
that is because the damage done is silent, and because no one
has happened to run an observability tool (mdb debugger, DTrace, etc.)
that would have tried to use the sort sections.

Some general observations about ELF:

    - In general, a symbol table shouldn't be rewritten like this
      unless that was somehow requested. In particular, I don't see
      a justification for removing those section symbols, whatever
      you might think of them. Debuggers can rely on those details.

    - Any time a section is referenced by another section, via sh_link
      or sh_info, one cannot change it without also changing, or
      removing, the depending sections. gobjcopy should check for that.
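
The check in the second point can be sketched in a few lines of Python.
This is an illustration only, not objcopy code. The Shdr class and the
dependents() helper are hypothetical names, the section list is
transcribed from the elfdump output earlier in this comment, and
SHT_PLACEHOLDER_OS stands in for the Solaris sort section types, whose
real numeric values are not used here. One detail worth noting: sh_info
is only a section index for relocation sections or when SHF_INFO_LINK is
set, so the .symtab sh_info of 28 (a symbol index) must not be treated as
a section reference.

```python
from dataclasses import dataclass

SHT_SYMTAB, SHT_STRTAB, SHT_RELA, SHT_REL = 2, 3, 4, 9  # generic ELF types
SHF_STRINGS = 0x20
SHF_INFO_LINK = 0x40          # sh_info holds a section header index
SHT_PLACEHOLDER_OS = 0x60000000  # placeholder for an OS-specific type


@dataclass
class Shdr:
    index: int
    name: str
    sh_type: int
    sh_flags: int
    sh_link: int
    sh_info: int


def dependents(shdrs, target):
    """Names of sections that reference section `target` via sh_link/sh_info."""
    found = []
    for s in shdrs:
        if s.index == target:
            continue
        via_link = s.sh_link == target
        info_is_index = (s.sh_type in (SHT_REL, SHT_RELA)
                         or bool(s.sh_flags & SHF_INFO_LINK))
        via_info = info_is_index and s.sh_info == target
        if via_link or via_info:
            found.append(s.name)
    return found


# Transcribed from the elfdump output shown earlier (sections 18 to 21).
shdrs = [
    Shdr(18, ".symtab", SHT_SYMTAB, 0, 19, 28),
    Shdr(19, ".strtab", SHT_STRTAB, SHF_STRINGS, 0, 0),
    Shdr(20, ".SUNW_symtabsort", SHT_PLACEHOLDER_OS, 0, 18, 0),
    Shdr(21, ".SUNW_symtabnsort", SHT_PLACEHOLDER_OS, 0, 18, 0),
]

print(dependents(shdrs, 18))  # ['.SUNW_symtabsort', '.SUNW_symtabnsort']
```

A tool does not need to know what a sort section is to run this check.
It only needs to notice that something it does not understand points at
the section it is about to change.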

So how can we fix this? I am freshly aware that rewriting symbol
tables is a core objcopy feature, so it's not a matter of never
doing that. However, I think it can be made safer:

    1) The symtab should not be rewritten unless an explicit option
       requests it. Otherwise, all the symbols, and their order,
       should remain intact.

    2) When an explicit option that would trigger a symtab rewrite
       is given, and other sections reference that symtab via
       sh_link/sh_info, then objcopy has a choice:

          a) Rewrite the referencing sections too, if it understands them.
      or
          b) Throw a fatal error
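
The two rules above can be written down as a small decision function.
This is a sketch of the proposal as I read it, not a description of how
objcopy is structured. The function name, its parameters, and the returned
strings are all hypothetical, and it assumes the tool picks (a) whenever
it understands every referencing section and falls back to (b) otherwise.

```python
def plan_symtab_rewrite(rewrite_requested, referencing, understood):
    """Decide what to do with a symtab.

    rewrite_requested: True if an explicit option asks for symbol changes.
    referencing: names of sections whose sh_link/sh_info point at the symtab.
    understood: names of sections the tool knows how to rewrite.
    """
    # Rule 1: without an explicit request, symbols and order stay intact.
    if not rewrite_requested:
        return "copy symtab unchanged"

    # Rule 2b: a referencing section the tool cannot repair is fatal.
    unknown = [name for name in referencing if name not in understood]
    if unknown:
        raise RuntimeError(
            "cannot rewrite symtab, referenced by: " + ", ".join(unknown))

    # Rule 2a: rewrite the referencing sections along with the symtab.
    if referencing:
        return "rewrite symtab and referencing sections"
    return "rewrite symtab"


sorts = [".SUNW_symtabsort", ".SUNW_symtabnsort"]
print(plan_symtab_rewrite(False, sorts, set()))
print(plan_symtab_rewrite(True, sorts, set(sorts)))
```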

Could objcopy also rewrite these sort sections? Possibly. These sections
are arrays of 4-byte symtab indexes. It would probably be straightforward
to use a lookup array mapping old indexes to new in order to repair these,
dropping the entries for the symbols that have been eliminated, and
renumbering the others.

   
    https://docs.oracle.com/en/operating-systems/solaris/oracle-solaris/11.4/linkers-libraries/symbol-sort-sections.html
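
As a rough illustration of the lookup-array idea, here is a minimal Python
sketch. It is not objcopy code, and the helper names are hypothetical. It
assumes a little-endian object (as on x86), that the section data is a
plain array of 4-byte symtab indexes as described above, and that the
rewrite only drops symbols without changing names or addresses, so the
relative order of the surviving entries stays valid. The caller would
still have to update sh_size for the shorter section.

```python
import struct


def build_index_map(kept):
    """kept[i] is True if old symbol i survives.

    Returns a list mapping old index -> new index, or None if dropped.
    """
    mapping, new_index = [], 0
    for keep in kept:
        if keep:
            mapping.append(new_index)
            new_index += 1
        else:
            mapping.append(None)
    return mapping


def remap_sort_section(data, mapping, byteorder="<"):
    """Renumber a sort section, dropping entries for eliminated symbols."""
    count = len(data) // 4
    old = struct.unpack(f"{byteorder}{count}I", data)
    new = [mapping[i] for i in old if mapping[i] is not None]
    return struct.pack(f"{byteorder}{len(new)}I", *new)


# Old symbol 1 (say, a section symbol) is dropped; the rest are kept.
mapping = build_index_map([True, False, True, True, True, True])
old_data = struct.pack("<3I", 2, 3, 4)
new_data = remap_sort_section(old_data, mapping)
print(struct.unpack("<3I", new_data))  # (1, 2, 3)
```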

It would be great if objcopy had that ability, but it's not needed in
order to address the main issues being reported here, which are that the
x86 gld does not produce section symbols, and that x86 objcopy preemptively
removes them. Fixing this default behavior to handle section symbols
correctly, as seems to already be done on other platforms (i.e. sparc)
would likely be enough.

- Ali
