On 14/06/2013 06:13, Marvin Humphrey wrote:
Hmm... And what we would really need to do may be even harder: we need to
assign aliases *at runtime*, when the DSO loads.
Maybe it's impossible.
Yes, I think so. But if we have one thunk per method, I think we could
avoid assigning the alias at runtime.
Here's what gdb prints out for the assembler. I don't understand the `push`
and `pop` instructions.
Dump of assembler code for function cfish_thunk112:
0x0000000100075120 <cfish_thunk112+0>: push %rbp
0x0000000100075121 <cfish_thunk112+1>: mov %rsp,%rbp
0x0000000100075124 <cfish_thunk112+4>: mov 0x8(%rdi),%rax
0x0000000100075128 <cfish_thunk112+8>: pop %rbp
0x0000000100075129 <cfish_thunk112+9>: jmpq *0x70(%rax)
0x000000010007512c <cfish_thunk112+12>: nopl 0x0(%rax)
End of assembler dump.
The push and pop instructions save and restore the caller's frame
pointer in %rbp. They'll disappear if you compile with
-fomit-frame-pointer.
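For reference, here is a minimal C sketch of a thunk that would plausibly compile to a load followed by an indirect jump like the one in the dump. It is not the actual Clownfish source: the names demo_Obj, demo_VTable, demo_thunk and demo_get_value are made up, and the layout (vtable pointer at offset 0x8 of the object, method slot at offset 0x70 of the vtable) is only an assumption read off the disassembly, valid on LP64 targets.

```c
#include <stddef.h>

/* Hypothetical layout chosen to mirror the disassembly on x86-64 (LP64):
 * the object keeps its vtable pointer at offset 0x8, and the method
 * pointer sits at offset 0x70 inside the vtable. */
typedef struct demo_Obj demo_Obj;
typedef int (*demo_method_t)(demo_Obj *self);

typedef struct demo_VTable {
    char          header[0x70];  /* stand-in for whatever precedes the slot */
    demo_method_t method;        /* the slot at offset 0x70 */
} demo_VTable;

struct demo_Obj {
    size_t       refcount;       /* occupies offset 0x0 on LP64 */
    demo_VTable *vtable;         /* offset 0x8 on LP64 */
    int          value;
};

/* The thunk: load the vtable from the object, then call through the slot.
 * Because the call is in tail position, an optimizing compiler can emit
 * it as a jump instead of a call followed by a return. */
int
demo_thunk(demo_Obj *self) {
    return self->vtable->method(self);
}

/* A trivial method implementation to put into the slot. */
int
demo_get_value(demo_Obj *self) {
    return self->value;
}
```

Compiling something like this with and without -fomit-frame-pointer and comparing the disassembly should show whether the push/pop pair goes away on a given compiler.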
A single thunk per offset might be bad for branch prediction. But this could
be worked around by providing separate thunks for each method.
I think we could evaluate that using cachegrind on c/t/test_lucy.
Yes, cachegrind can give you the absolute number of branch
mispredictions or cache misses in its simulation (branch simulation
has to be enabled with --branch-sim=yes). It's a great tool for
understanding why certain changes make code faster or slower. But we'd
need real-world benchmarks to tell whether an optimization like this
has a positive effect at all.
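To make the two variants concrete, here is a rough C sketch of a shared thunk per offset next to separate thunks per method. All names (demo_thunk_slot0, demo_Foo_Get, demo_Bar_Get and the impl functions) are hypothetical and the vtable is reduced to a single slot. The idea, as an assumption to be measured rather than a given, is that separate thunks give each method its own indirect jump address, so the predictor does not have to share one jump site between unrelated targets.

```c
typedef struct demo_Obj demo_Obj;
typedef int (*demo_method_t)(demo_Obj *self);

/* Hypothetical vtable reduced to one method slot. */
typedef struct demo_VTable {
    demo_method_t slot0;
} demo_VTable;

struct demo_Obj {
    demo_VTable *vtable;
    int          value;
};

/* Variant A: one thunk per offset. Every method that lives in slot0 of
 * its class, related or not, goes through this single indirect jump. */
int
demo_thunk_slot0(demo_Obj *self) {
    return self->vtable->slot0(self);
}

/* Variant B: one thunk per method. The bodies are identical, but each
 * method has its own jump site. Note that a toolchain which folds
 * identical functions could merge these two again. */
int
demo_Foo_Get(demo_Obj *self) {
    return self->vtable->slot0(self);
}

int
demo_Bar_Get(demo_Obj *self) {
    return self->vtable->slot0(self);
}

/* Two unrelated implementations that happen to share slot0. */
int
demo_foo_get_impl(demo_Obj *self) {
    return self->value;
}

int
demo_bar_get_impl(demo_Obj *self) {
    return -self->value;
}
```

Both variants return the same results, so any difference would only show up in the branch statistics or in timings.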
Nick