it looks like you are comparing these two functions:
	void
	loopxinc(void)
	{
		uint i, x;

		for(i = 0; i < N; i++){
			_xinc(&x);
			_xdec(&x);
		}
	}

	void
	looplock(void)
	{
		uint i;
		static Lock l;

		for(i = 0; i < N; i++){
			lock(&l);
			unlock(&l);
		}
	}
but the former does two atomic operations per iteration
(_xinc and _xdec) and the latter only one lock/unlock
pair. your claim was that _xinc is slower than incref
(== lock(), x++, unlock()). but you are timing
xinc+xdec against incref.
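a like-for-like pair would time one atomic increment against one lock, x++, unlock sequence. here is a minimal sketch of that in portable C11, not the plan 9 code: atomic_fetch_add stands in for _xinc, an atomic_flag spin loop stands in for lock()/unlock(), and the names loopxinc1, loopincref and the value of N are made up for illustration. the timing harness is left out.

```c
#include <stdatomic.h>

#define N 1000000u	/* assumed iteration count; the original N is not shown */

static atomic_uint xcount;			/* target of the atomic increment */
static atomic_flag spin = ATOMIC_FLAG_INIT;	/* stand-in for a plan 9 Lock */
static unsigned refcount;			/* plain counter guarded by spin */

/* one atomic increment per iteration: the analogue of _xinc alone */
unsigned
loopxinc1(void)
{
	unsigned i;

	atomic_store(&xcount, 0);
	for(i = 0; i < N; i++)
		atomic_fetch_add(&xcount, 1);
	return atomic_load(&xcount);
}

/* lock, x++, unlock per iteration: the analogue of incref */
unsigned
loopincref(void)
{
	unsigned i;

	refcount = 0;
	for(i = 0; i < N; i++){
		while(atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
			;	/* spin until the flag is free */
		refcount++;
		atomic_flag_clear_explicit(&spin, memory_order_release);
	}
	return refcount;
}
```

both functions return the final count so a caller can check that every iteration really ran before trusting the timing.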
assuming xinc and xdec are approximately the same
cost (so i can just halve the numbers for loopxinc),
that would make the fair comparison produce:
intel core i7 2.4ghz
    loop        0 nsec/call
    loopxinc   10 nsec/call    // was 20
    looplock   11 nsec/call

intel 5000 1.6ghz
    loop        0 nsec/call
    loopxinc   22 nsec/call    // was 44
    looplock   25 nsec/call

intel atom 330 1.6ghz (exception!)
    loop        2 nsec/call
    loopxinc    7 nsec/call    // was 14
    looplock   22 nsec/call

amd k10 2.0ghz
    loop        2 nsec/call
    loopxinc   15 nsec/call    // was 30
    looplock   20 nsec/call

intel p4 xeon 3.0ghz
    loop        1 nsec/call
    loopxinc   38 nsec/call    // was 76
    looplock   42 nsec/call
which looks like a much different story.
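
to make the arithmetic explicit, here is a small sketch that redoes the comparison from the numbers above. the struct, the table name and countfaster are hypothetical helpers; the only inputs are the measured nsec/call figures, and the halving rests on the same assumption that xinc and xdec cost about the same.

```c
#include <stddef.h>

struct result {
	const char *cpu;
	int xincxdec;	/* measured nsec per _xinc+_xdec pair */
	int lock;	/* measured nsec per lock+unlock pair */
};

static const struct result results[] = {
	{"intel core i7 2.4ghz",	20, 11},
	{"intel 5000 1.6ghz",		44, 25},
	{"intel atom 330 1.6ghz",	14, 22},
	{"amd k10 2.0ghz",		30, 20},
	{"intel p4 xeon 3.0ghz",	76, 42},
};

/*
 * count the machines where loopxinc beats looplock.
 * if halve is nonzero, charge loopxinc for one operation
 * instead of the xinc+xdec pair.
 */
int
countfaster(int halve)
{
	size_t i;
	int n, cost;

	n = 0;
	for(i = 0; i < sizeof results / sizeof results[0]; i++){
		cost = results[i].xincxdec;
		if(halve)
			cost /= 2;
		if(cost < results[i].lock)
			n++;
	}
	return n;
}
```

countfaster(0) gives 1 (only the atom), countfaster(1) gives 5: once halved, loopxinc is ahead on every machine listed.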
russ