:I assumed too much in asking the question; I was specifically
:interested in indirect function calls, since this has a direct impact
:on method-style implementations.
Branch prediction caches are typically PC-sensitive. An indirect method
call will never be as fast as a direct call, but if the indirect address
stays the same from one call to the next, the branch prediction cache
will work. If the indirect address changes at the PC where the call is
being made, the branch cache will tend to mispredict and you may pay a
penalty.
Try this loop core as one of the cases in that test program, and add two
no-op subroutines: void nop1(void) { } and void nop2(void) { }.
Compile this code without any optimizations! *No* optimizations, or
the test will not demonstrate the problem :-)
In this case the branch prediction succeeds because the indirect
address does not change at the PC where func() is called. I get 34 ns
per loop.
{
    void (*func)(void) = nop1;
    for (i = 0; i < LOOPS; ++i) {
        func();
        if (i & 1)
            func = nop1;
        else
            func = nop1;
    }
}
In this case the branch prediction fails because the indirect address
is different at the PC each time func() is called. I get 61 ns.
{
    void (*func)(void) = nop1;
    for (i = 0; i < LOOPS; ++i) {
        func();
        if (i & 1)
            func = nop1;
        else
            func = nop2;
    }
}
In this case we simulate a mix by changing (i & 1) to (i & 7), so the
indirect address changes on only some of the iterations. I get 47 ns.
{
    void (*func)(void) = nop1;
    for (i = 0; i < LOOPS; ++i) {
        func();
        if (i & 7)
            func = nop1;
        else
            func = nop2;
    }
}
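The cases that switch between nop1 and nop2 differ only in the mask, so
one way to time them is sketched below. This is a rough illustration and
not the original test program: the name ns_per_loop, the mask parameter,
and the use of clock() are assumptions made here. The function pointer is
declared volatile so the call stays indirect even if the compiler
optimizes; absolute numbers will vary by CPU and will not match the
figures quoted above.

```c
#include <time.h>

void nop1(void) { }
void nop2(void) { }

/*
 * Hypothetical helper (not from the original test program).
 * Runs the loop core `loops` times and returns the average CPU time
 * per iteration in nanoseconds. `loops` must be nonzero.
 *
 *   mask = 1: the target alternates between nop1 and nop2
 *   mask = 7: nop2 is selected on one iteration in eight
 *   mask = 0: nop2 is selected every time (one steady target
 *             after the first call, which goes to nop1)
 */
double ns_per_loop(unsigned long mask, unsigned long loops)
{
    void (*volatile func)(void) = nop1;
    unsigned long i;
    clock_t t0, t1;

    t0 = clock();
    for (i = 0; i < loops; ++i) {
        func();
        if (i & mask)
            func = nop1;
        else
            func = nop2;
    }
    t1 = clock();

    /* clock ticks -> seconds -> nanoseconds -> per iteration */
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)loops;
}
```

Calling ns_per_loop(0, LOOPS), ns_per_loop(1, LOOPS) and
ns_per_loop(7, LOOPS) from the test program's main() and printing the
results should show the same ordering as above if the CPU's indirect
branch predictor behaves as described.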
OK, so what does this mean for method calls? If the method call is
INLINED, so that the indirect call instruction sits at each call site,
then the branch prediction cache will tend to work because the method
call will tend to call the same address at any given PC. If the method
call is doubly-indirect, where a routine is called which calculates the
method address and then calls it, the branch prediction cache will tend
to fail because a different address will tend to be called at the PC of
the call.
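
To make the two dispatch styles concrete, here is a small sketch. All of
the names in it (struct object, struct ops, CALL_GET, call_get and so on)
are made up for illustration and do not come from any real code base. The
macro puts the indirect call at each call site, so each site has its own
PC; the helper routine funnels every caller through one indirect call at
a single PC.

```c
struct object;

/* Hypothetical method table with a single method. */
struct ops {
    int (*get)(struct object *);
};

struct object {
    const struct ops *ops;
    int value;
};

static int get_plain(struct object *o)   { return o->value; }
static int get_doubled(struct object *o) { return o->value * 2; }

static const struct ops plain_ops   = { get_plain };
static const struct ops doubled_ops = { get_doubled };

/* Inlined dispatch: the indirect call is emitted at every use. */
#define CALL_GET(o) ((o)->ops->get(o))

/*
 * Doubly-indirect dispatch: callers make a direct call to this routine,
 * which looks up the method and makes the one indirect call that all
 * object types share.
 */
int call_get(struct object *o)
{
    return o->ops->get(o);
}

/* Two call sites, two PCs; each site can settle on its own target. */
int sum_inlined(struct object *a, struct object *b)
{
    return CALL_GET(a) + CALL_GET(b);
}

/* Both calls reach the same indirect call instruction in call_get(). */
int sum_shared(struct object *a, struct object *b)
{
    return call_get(a) + call_get(b);
}
```

Both functions compute the same result; the difference is only in where
the indirect call instruction lives. If a is always a "plain" object and
b is always a "doubled" one, each site in sum_inlined() sees one target,
while the single site inside call_get() sees the target flip on every
call, much like the (i & 1) case above.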
-Matt
Matthew Dillon
<[EMAIL PROTECTED]>