On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> Hi Andi,
>
> On x86-64 one inefficiency that shows up on profiles is the handling of
> struct page conversion to/from idx and addresses. This is mostly due to
> the fact that struct page is currently 56 bytes on x86-64, so gcc has to
> emit a slow division or multiplication to convert.
Huh?
unsigned long f1(unsigned long x)
{
	return x * 56;
}

unsigned long f2(unsigned long x)
{
	return x / 56;
}
gives
f1:
	leaq	0(,%rdi,8), %rax
	salq	$6, %rdi
	subq	%rax, %rdi
	movq	%rdi, %rax
	ret
and
f2:
.LFB3:
	shrq	$3, %rdi
	movabsq	$2635249153387078803, %rdx
	movq	%rdi, %rax
	mulq	%rdx
	movq	%rdx, %rax
	ret
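(it converts the division into x * 1/56: a shift right by 3, then a
multiplication by a 64-bit fixed-point reciprocal of 7, keeping the high half)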
AFAIK mul has a latency of under 10 cycles even on the P4, so I can't imagine
it's a real problem. Something must be wrong with your measurements.
Or maybe it's something else in the conversion functions that's
the problem. The hash lookup? Still, I don't quite believe
it; the hash is relatively small.
That said, I know ways to make page_to_pfn()/pfn_to_page() faster.
In particular, some of the terms in the equation that are always
recomputed could be cached. I used to have a patch for that
some time ago, but it had some problems and I ran out of time,
so I dropped it.
> By switching to using
> WANT_PAGE_VIRTUAL in asm/page.h, struct page grows to 64 bytes. Address
> calculation becomes cheaper because it is a memory load from the already
> hot struct page. For netperf, this shows up as a ~150 Mbit/s improvement.
My guess would be that on larger, macro-level workloads it would be a
loss due to more cache misses.
-Andi