On 14/06/2013 06:13, Marvin Humphrey wrote:
> Hmm...  And what we would really need to do may be even harder: we need to
> assign aliases *at runtime*, when the DSO loads.

> Maybe it's impossible.

Yes, I think so. But if we have one thunk per method, I think we could avoid assigning the alias at runtime.

> Here's what gdb prints out for the assembler.  I don't understand the `push`
> and `pop` instructions.
>
> Dump of assembler code for function cfish_thunk112:
> 0x0000000100075120 <cfish_thunk112+0>: push   %rbp
> 0x0000000100075121 <cfish_thunk112+1>: mov    %rsp,%rbp
> 0x0000000100075124 <cfish_thunk112+4>: mov    0x8(%rdi),%rax
> 0x0000000100075128 <cfish_thunk112+8>: pop    %rbp
> 0x0000000100075129 <cfish_thunk112+9>: jmpq   *0x70(%rax)
> 0x000000010007512c <cfish_thunk112+12>: nopl   0x0(%rax)
> End of assembler dump.

The push and pop instructions are for setting up the frame pointer in %rbp. They'll disappear if you compile with -fomit-frame-pointer.
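
For illustration, here is a rough C sketch of what a thunk like this does. The struct layout, the slot count, and every name in it (Obj, VTable, SLOT_112, thunk112, answer, demo) are assumptions made up for the example, not the actual Clownfish definitions. It only mirrors the two loads visible in the dump: the vtable pointer at offset 0x8 of the object and the method pointer at offset 0x70 (112) of the vtable, assuming 8-byte pointers.

```c
#include <stddef.h>

typedef struct Obj Obj;
typedef int (*method_t)(Obj *self);

/* Hypothetical vtable: a flat array of method pointers. */
typedef struct VTable {
    method_t slots[16];
} VTable;

/* Hypothetical object header. With 8-byte size_t and pointers, the
 * vtable pointer sits at byte offset 8, like `mov 0x8(%rdi),%rax`. */
struct Obj {
    size_t  refcount;
    VTable *vtable;
};

/* Slot 14 is byte offset 14 * 8 = 112 = 0x70 when pointers are 8 bytes. */
enum { SLOT_112 = 14 };

/* Load the vtable, fetch the method at a fixed slot, and call it.
 * Because the call is in tail position, an optimizing compiler may
 * emit a jump instead of a call, as in `jmpq *0x70(%rax)`. */
int thunk112(Obj *self) {
    return self->vtable->slots[SLOT_112](self);
}

/* Example method used to exercise the thunk. */
int answer(Obj *self) {
    (void)self;
    return 42;
}

/* Wire the method into slot 14 and dispatch through the thunk. */
int demo(void) {
    VTable vt = {{0}};
    vt.slots[SLOT_112] = answer;
    Obj obj = { 1, &vt };
    return thunk112(&obj);
}
```

Compiled with optimization and -fomit-frame-pointer, a function of this shape should need little more than the load and the indirect jump, though the exact output depends on compiler and flags.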

> A single thunk per offset might be bad for branch prediction. But this could
> be worked around by providing separate thunks for each method.

> I think we could evaluate that using cachegrind on c/t/test_lucy.

Yes, cachegrind can give you absolute counts of branch mispredictions and cache misses (both simulated; branch simulation has to be enabled with --branch-sim=yes). It's a great tool for understanding why certain changes make code faster or slower. But we'd need real-world benchmarks to tell whether an optimization like this has a positive effect at all.
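
To make the per-offset versus per-method distinction concrete, here is a minimal sketch in C. All names (DEFINE_THUNK, thunk_slot14, Foo_Get_Size, Bar_Next and the *_impl functions) and the vtable layout are hypothetical. The idea is that with one thunk per offset, unrelated methods that happen to live in the same vtable slot all go through the same indirect jump instruction, while with one thunk per method each method has its own jump site for the predictor to learn.

```c
#include <stddef.h>

typedef struct Obj Obj;
typedef int (*method_t)(Obj *self);
typedef struct VTable { method_t slots[16]; } VTable;
struct Obj { size_t refcount; VTable *vtable; };

/* Generate a thunk that dispatches through a fixed vtable slot. */
#define DEFINE_THUNK(name, slot) \
    int name(Obj *self) { return self->vtable->slots[slot](self); }

/* One thunk per offset: every method in slot 14 shares this jump site. */
DEFINE_THUNK(thunk_slot14, 14)

/* One thunk per method: two hypothetical methods from unrelated classes
 * that share slot 14 each get their own indirect jump. */
DEFINE_THUNK(Foo_Get_Size, 14)
DEFINE_THUNK(Bar_Next, 14)

/* Hypothetical implementations installed in the two vtables. */
static int foo_get_size_impl(Obj *self) { (void)self; return 3; }
static int bar_next_impl(Obj *self)     { (void)self; return 7; }

/* Dispatch both objects through the single shared thunk. */
int demo_shared(void) {
    VTable foo_vt = {{0}}, bar_vt = {{0}};
    foo_vt.slots[14] = foo_get_size_impl;
    bar_vt.slots[14] = bar_next_impl;
    Obj foo = { 1, &foo_vt }, bar = { 1, &bar_vt };
    return thunk_slot14(&foo) * 10 + thunk_slot14(&bar);
}

/* Dispatch each object through its own per-method thunk. */
int demo_per_method(void) {
    VTable foo_vt = {{0}}, bar_vt = {{0}};
    foo_vt.slots[14] = foo_get_size_impl;
    bar_vt.slots[14] = bar_next_impl;
    Obj foo = { 1, &foo_vt }, bar = { 1, &bar_vt };
    return Foo_Get_Size(&foo) * 10 + Bar_Next(&bar);
}
```

Both variants compute the same result; they differ only in how many distinct jump instructions exist. The per-method variant costs more code, which is part of why it would need measuring.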

Nick
