Re: [lucy-dev] Member vars fragile ABI fix

Marvin Humphrey Mon, 15 Jul 2013 18:48:38 -0700

On Mon, Jul 15, 2013 at 4:50 AM, Nick Wellnhofer <[email protected]> wrote:
> Micro-benchmarks are always a bit dangerous, but it seems that on modern
> CPUs our current implementation of method dispatch is really fast and
> probably hard to beat. I think a loop with a single method call took about
> 6-7 cycles per iteration (Ivy Bridge, x64). This is surprising since
> computing the method address alone requires three memory loads with a
> latency of 4-5 cycles each (the third load depending on the first two). But
> because of branch prediction, the CPU can take a speculative branch
> immediately without having to wait for the memory loads. The branch target
> is validated later, so the loads can be pipelined.
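To make the quoted description concrete, here is a minimal C sketch of variable-offset dispatch. The slot offset for a method lives in a global variable filled in at load time rather than being a compile-time constant. All names (`Class`, `Obj`, `Obj_Get_Value_OFFSET`, `bootstrap`) are hypothetical, and this is not the actual Clownfish source. It only shows where the three loads come from and why the third depends on the first two.

```c
#include <assert.h>
#include <stddef.h>

/* Generic method pointer type stored in every method slot. */
typedef void (*method_t)(void);

/* Hypothetical class object: a header followed by method slots whose
 * layout is decided at load time, not when callers are compiled. */
typedef struct Class {
    const char *name;
    method_t    methods[4];
} Class;

/* Every object starts with a pointer to its class. */
typedef struct Obj {
    Class *klass;
    int    value;
} Obj;

typedef int (*Obj_Get_Value_t)(Obj *self);

/* Byte offset of the Get_Value slot within Class (hypothetical name).
 * Callers read this variable at run time instead of a constant. */
size_t Obj_Get_Value_OFFSET;

/* Dispatch wrapper: three loads, the last one depending on the first two. */
static inline int
Obj_Get_Value(Obj *self) {
    Class   *klass  = self->klass;           /* load 1: class pointer */
    size_t   offset = Obj_Get_Value_OFFSET;  /* load 2: slot offset   */
    method_t m      = *(method_t *)((char *)klass + offset); /* load 3 */
    return ((Obj_Get_Value_t)m)(self);       /* indirect call */
}

/* The implementation that ends up in the slot. */
static int
Obj_get_value_impl(Obj *self) {
    return self->value;
}

/* Simulated load-time bootstrap: choose a slot, publish its byte
 * offset, and install the implementation there. */
void
bootstrap(Class *klass, size_t slot) {
    assert(slot < sizeof(klass->methods) / sizeof(klass->methods[0]));
    Obj_Get_Value_OFFSET = offsetof(Class, methods) + slot * sizeof(method_t);
    klass->methods[slot] = (method_t)Obj_get_value_impl;
}
```

If the indirect branch target is predicted correctly, the CPU can start executing the callee before loads 1 to 3 resolve, which matches the quoted explanation of why the dispatch cost is lower than the raw load latencies suggest.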


Here's an Intel Xeon E5430 from a few years ago running 32-bit CentOS 5.0:

    http://en.wikipedia.org/wiki/Xeon#5400-series_.22Harpertown.22

    $ make -f Makefile.linux
    LD_LIBRARY_PATH=. ./exe
    cycles/call with method ptr loop: 11.226522
    cycles/call with wrapper loop: 14.781987
    cycles/call with fixed offset wrapper loop: 10.538704
    cycles/call with wrapper: 19.622476
    cycles/call with simulated inline: 7.702859

Here's a more recent Intel Xeon E5620 running 64-bit CentOS 5.5:

    http://en.wikipedia.org/wiki/Xeon#3600.2F5600-series_.22Gulftown.22_.26_.22Westmere-EP.22

    $ make -f Makefile.linux
    LD_LIBRARY_PATH=. ./exe
    cycles/call with method ptr loop: 7.014678
    cycles/call with wrapper loop: 7.016887
    cycles/call with fixed offset wrapper loop: 7.014423
    cycles/call with wrapper: 10.520168
    cycles/call with simulated inline: 2.339327

What's interesting about those results is that on the modern CPU the
micro-benchmark yields essentially identical results (about 7 cycles/call) for
C++-style fixed offset vtable dispatch, Clownfish-style variable offset vtable
dispatch, and a saved raw function pointer, while on the older CPU the
Clownfish-style variable offset dispatch performs somewhat worse (about 14.8
vs. 10.5 cycles/call for fixed offset).
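For readers unfamiliar with the three variants being compared, the following is a hedged sketch of what each call site might look like in C. It is not the benchmark code, which isn't included in this message; `VTable`, `get_value_offset`, `call_fixed`, `call_variable`, and `sum_with_saved_ptr` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;
typedef int (*get_value_t)(Obj *self);

/* C++-style vtable: slot positions are fixed when callers are compiled. */
typedef struct VTable {
    get_value_t get_value;
} VTable;

struct Obj {
    const VTable *vtable;
    int           value;
};

static int get_value_impl(Obj *self) { return self->value; }
const VTable OBJ_VTABLE = { get_value_impl };

/* 1. Fixed offset: offsetof(VTable, get_value) is a compile-time constant. */
static inline int
call_fixed(Obj *self) {
    return self->vtable->get_value(self);
}

/* 2. Variable offset: the slot offset is read from a global at run time. */
size_t get_value_offset = offsetof(VTable, get_value);

static inline int
call_variable(Obj *self) {
    const char *base = (const char *)self->vtable;
    get_value_t m = *(const get_value_t *)(base + get_value_offset);
    return m(self);
}

/* 3. Saved raw function pointer: resolve once, then call it repeatedly. */
int
sum_with_saved_ptr(Obj *self, int n) {
    assert(n >= 0);
    get_value_t m = self->vtable->get_value;
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += m(self);
    }
    return sum;
}
```

In this sketch the only difference between `call_fixed` and `call_variable` is whether the slot offset is an immediate baked into the instruction or one extra load from a global, which is plausibly why a CPU with good indirect-branch prediction shows little difference between them.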

For what it's worth, the Clownfish "inside-out vtable" design is similar to
the techniques described by Dachuan Yu et al. in 2002. Benchmarks are here:

  https://www.usenix.org/legacy/events/javavm02/yu/yu_html/node29.html

> My guess is that most of the time this happens, there will be another
> non-compatible API change anyway.

I would argue that there is still a large benefit in user interface simplicity
by making it impossible to break the ABI without also breaking the API.
Eliminating this last quirk is a big deal because it substantially reduces the
knowledge and mental effort required to write ABI-compatible code.

> OK, but this will mean to add MethodSpec structs for every method of a
> class. I think it's best to use separate structs for novel, overridden, and
> inherited methods then.

I realize that may be a lot of code, but until load-time latency becomes a
problem, I think it's an acceptable implementation strategy.
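As a rough illustration of the quoted proposal, here is a sketch of what separate spec structs for novel, overridden, and inherited methods could look like, along with a load-time initializer that assigns slot offsets. The struct names, their fields, `Class`, and `init_class` are all hypothetical. The sketch assumes offsets are byte offsets into a per-class method array and that a parent class is initialized before its children; the real design may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_METHODS 16

typedef void (*method_t)(void);

/* Hypothetical spec for a method first declared in this class. */
typedef struct NovelMethSpec {
    size_t  *offset;         /* receives the newly assigned slot offset */
    method_t func;
} NovelMethSpec;

/* Hypothetical spec for a parent method that this class overrides. */
typedef struct OverriddenMethSpec {
    size_t  *offset;         /* this class's offset variable */
    size_t  *parent_offset;  /* parent's offset variable, already set */
    method_t func;           /* replacement implementation */
} OverriddenMethSpec;

/* Hypothetical spec for a parent method inherited unchanged. */
typedef struct InheritedMethSpec {
    size_t *offset;
    size_t *parent_offset;
} InheritedMethSpec;

typedef struct Class {
    const struct Class *parent;
    size_t              num_methods;
    method_t            methods[MAX_METHODS];
} Class;

/* Load-time initializer; the parent must already be initialized. */
void
init_class(Class *klass, const Class *parent,
           const NovelMethSpec *novel, size_t num_novel,
           const OverriddenMethSpec *overridden, size_t num_overridden,
           const InheritedMethSpec *inherited, size_t num_inherited) {
    klass->parent      = parent;
    klass->num_methods = 0;

    /* Start from a copy of the parent's method table. */
    if (parent != NULL) {
        for (size_t i = 0; i < parent->num_methods; i++) {
            klass->methods[i] = parent->methods[i];
        }
        klass->num_methods = parent->num_methods;
    }

    /* Inherited: same slot, same implementation. */
    for (size_t i = 0; i < num_inherited; i++) {
        *inherited[i].offset = *inherited[i].parent_offset;
    }

    /* Overridden: same slot, new implementation. */
    for (size_t i = 0; i < num_overridden; i++) {
        size_t off = *overridden[i].parent_offset;
        *overridden[i].offset = off;
        klass->methods[off / sizeof(method_t)] = overridden[i].func;
    }

    /* Novel: append new slots after everything inherited. */
    for (size_t i = 0; i < num_novel; i++) {
        assert(klass->num_methods < MAX_METHODS);
        *novel[i].offset = klass->num_methods * sizeof(method_t);
        klass->methods[klass->num_methods++] = novel[i].func;
    }
}
```

In this sketch, because compiled callers would read the `*offset` variables at run time, a parent class can gain novel methods without invalidating compiled subclass code: the child's own novel slots simply shift when `init_class` runs at load time.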

Would you like to work on this, or would you like me to take it on?

Marvin Humphrey
