Re: [Lldb-commits] [PATCH] Strip ELF symbol versions from symbol names

jingham Thu, 26 Feb 2015 10:07:48 -0800

> On Feb 26, 2015, at 9:02 AM, Pavel Labath <[email protected]> wrote:
> 
> So here is me thinking out loud about this issue...
> 
> What are the current use cases for the Symbols and SymTabs in lldb?
> 
> - symbolification (aka looking up a symbol by address): In this case we would 
> probably want to output "memcpy@@GLIBC_2.14" because that _is_ the name of 
> the symbol in the object file and it also provides the most information.
> 
> - symbol resolution (aka looking up a symbol by name): in which situations do 
> we need to do this? Currently, I am aware of only one: user provided 
> expressions in the "expr" command. Are there any other use cases?


breakpoint set -n

I don't know how these symbol variants work in ELF, but if there's a chance 
that code could call more than one of these variants, depending on how the 
library was built or whatever, then "break set -n" had better resolve to all 
the relevant symbols.  If only one will ever get called by some magic, then 
it's fine to just pick that one.
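
To make the "resolve to all the relevant symbols" case concrete, here is a minimal sketch of collecting every version variant of a bare name. It assumes the symbol table is just a list of raw ELF names; `BareName` and `FindAllVariants` are hypothetical helpers for illustration, not existing lldb functions.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Returns the part of an ELF symbol name before the first '@'
// (the whole name if it carries no version suffix).
inline std::string_view BareName(std::string_view name) {
  return name.substr(0, name.find('@'));
}

// Hypothetical lookup: collect every symbol whose bare name matches, so a
// breakpoint on "memcpy" would cover all versioned variants.
inline std::vector<std::string> FindAllVariants(
    const std::vector<std::string>& symtab, std::string_view bare) {
  std::vector<std::string> matches;
  for (const std::string& sym : symtab) {
    if (BareName(sym) == bare)
      matches.push_back(sym);
  }
  return matches;
}
```

With a table holding "memcpy@@GLIBC_2.14" and "memcpy@GLIBC_2.2.5", a lookup for "memcpy" would return both entries.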

disassemble -n

>  - the ELF versioning spec says that when we do not have any additional 
> information, we should pick the default (latest) version. This is the one 
> with @@ in its name. When the user types "expr memcpy(a,b,c)", we do not have 
> any information, so the string "memcpy" should resolve to the same address as 
> "memcpy@@GLIBC_2.14". We could try to be clever and figure out what version 
> is used in the rest of the code, but that may prove to be quite difficult. 
> Furthermore, we almost definitely want the expression `char foo[]="bar"; 
> do_something_with(foo)` (which compiles to something involving memcpy), to 
> use the default symbol version, since the user is probably not even aware 
> that there is a call to memcpy involved (I certainly wasn't).

The latter will fall out from whatever lookup mechanism you come up with, since 
internally lldb's expression parser tells the JIT what symbol to use.
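
For reference, the naming convention described in the quoted text ("@@" marks the default version, a single "@" a non-default one) can be sketched as a small parser. This is only an illustration under that assumption; `VersionedName` and `SplitVersionedName` are hypothetical names, not part of lldb.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type: the pieces of a name like "memcpy@@GLIBC_2.14".
struct VersionedName {
  std::string base;     // "memcpy"
  std::string version;  // "GLIBC_2.14", empty if the name is unversioned
  bool is_default;      // true only for the "@@" form
};

// Splits a raw ELF symbol name at the first '@'. A name without '@' is
// returned unchanged with an empty version.
inline VersionedName SplitVersionedName(std::string_view name) {
  const std::size_t at = name.find('@');
  if (at == std::string_view::npos)
    return {std::string(name), "", false};
  const bool is_default = name.substr(at, 2) == "@@";
  const std::size_t version_start = at + (is_default ? 2 : 1);
  return {std::string(name.substr(0, at)),
          std::string(name.substr(version_start)), is_default};
}
```

A lookup for the bare string "memcpy" could then prefer the entry whose `is_default` flag is set.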

>  - we would like to keep the non-default symbol versions (e.g. 
> "memcpy@GLIBC_2.2.5"), so that we can do symbolification, but we don't want 
> "memcpy" to resolve to these symbols unless the user explicitly specifies 
> "memcpy@GLIBC_2.2.5" (which right now they can't, as the expr command will bark 
> out a syntax error). It might be possible to call the function by embedding 
> the right asm commands in the expr expression, but I do not care about this 
> right now.
> 
> So how do we achieve this? For C symbols we can store the full symbol name in 
> the mangled field and the bare name in the demangled one. However, this does 
> not work for C++ symbols, as they already use both fields. Furthermore, 
> currently the demangling of versioned C++ symbols fails completely as the 
> demangler does not understand the version specifications. For the 
> "symbolification" use case it would be best to have 
> "_ZSt10adopt_lock@@GLIBCXX_3.4.11" as the mangled name and 
> "std::adopt_lock@@GLIBCXX_3.4.11" as the demangled. However, for symbol 
> resolution, we want both "_ZSt10adopt_lock" and 
> "_ZSt10adopt_lock@@GLIBCXX_3.4.11" to resolve correctly. I can think of three 
> ways to achieve this:
> 
> - teach Symbol class to do intelligent string matching, so that it can 
> resolve both versioned and unversioned names. Not optimal since it would 
> complicate the general Symbol class due to an ELF peculiarity.
> - insert two Symbol instances into the Symtab. Symbol resolution would be 
> easy, but if we want to guarantee that we always return the versioned symbol 
> during symbolification, we would need to do something clever there, which is 
> again not nice.
> - allow symbols to have multiple names - again not optimal since it 
> complicates the Symbol class, but at least the version handling could be 
> contained in the ELF specific code - the Symbol wouldn't know about the 
> versions, it would only know it has these 2 (or whatever) names.
> 
> As you can see, I am not exactly thrilled by any of these options. What do 
> you think about it?

How does the linker actually fix up the libraries using these symbols to point 
to the right one?  In Mac OS X the equivalent task is achieved by having a 
symbol of type "resolver" that is actually called memcpy, and when the linker 
needs to call that function, it knows that it should call the resolver 
function, and that will return the address of the correct implementation.  So 
in lldb, we don't have to try to guess what the linker is going to do, we can 
just call this function to get the target symbol.  This can't change over the 
running of the program so we cache the target address of the resolver symbol.
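
The resolve-once-and-cache idea can be sketched as follows. This is a toy model, not lldb's actual implementation: `ResolverSymbol` is a hypothetical class, and the resolver is modelled as a plain callable that returns an address.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical model of a "resolver" symbol: the target address is obtained
// by calling the resolver once, then reused for the rest of the run.
class ResolverSymbol {
 public:
  explicit ResolverSymbol(std::function<std::uint64_t()> resolver)
      : resolver_(std::move(resolver)) {}

  // Calls the resolver on first use and caches the result, relying on the
  // assumption that the target cannot change while the program runs.
  std::uint64_t TargetAddress() {
    if (!cached_)
      cached_ = resolver_();
    return *cached_;
  }

 private:
  std::function<std::uint64_t()> resolver_;
  std::optional<std::uint64_t> cached_;
};
```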

In your case, maybe a better model is an indirect symbol - i.e. saying symbol A 
is an alias for symbol B.  We already have those to support MachO indirect 
symbols, so you could just use that type, though you might have to invent a new one.  
That is your option 2.  Since it maps to an extant linker trick that seems to 
me the best way to do it.
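
A rough sketch of the alias idea, assuming a toy symbol table rather than lldb's real Symtab: the versioned name is stored as the real symbol, and the bare name is recorded as an alias for the default ("@@") version. `ToySymtab` and its members are hypothetical and only illustrate the shape of the approach.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy table: real symbols keep their full versioned names, and
// a bare name is stored only as an alias pointing at the default version.
struct ToySymtab {
  std::map<std::string, std::uint64_t> by_name;  // full name -> address
  std::map<std::uint64_t, std::string> by_addr;  // address -> full name
  std::map<std::string, std::string> aliases;    // bare name -> full name

  void AddSymbol(const std::string& full_name, std::uint64_t addr) {
    by_name[full_name] = addr;
    by_addr[addr] = full_name;
    // Only the default version ("@@") gets a bare-name alias.
    const auto pos = full_name.find("@@");
    if (pos != std::string::npos)
      aliases[full_name.substr(0, pos)] = full_name;
  }

  // Lookup by name: follow an alias if one exists, else use the name as is.
  std::optional<std::uint64_t> Resolve(const std::string& name) const {
    const auto alias = aliases.find(name);
    const std::string& key = alias != aliases.end() ? alias->second : name;
    const auto it = by_name.find(key);
    if (it == by_name.end())
      return std::nullopt;
    return it->second;
  }

  // Lookup by address: always reports the full versioned name.
  std::optional<std::string> Symbolicate(std::uint64_t addr) const {
    const auto it = by_addr.find(addr);
    if (it == by_addr.end())
      return std::nullopt;
    return it->second;
  }
};
```

In this model "memcpy" resolves to the same address as "memcpy@@GLIBC_2.14", while "memcpy@GLIBC_2.2.5" is reachable only by its full name, and symbolication never returns the alias.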

Jim



> 
> 
> http://reviews.llvm.org/D7884
> 
> 
> 
> 
> _______________________________________________
> lldb-commits mailing list
> [email protected]
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits

