On Tue, Mar 07, 2006 at 05:27:37PM +0100, Andi Kleen wrote:
> On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> > Hi Andi,
> >
> > On x86-64 one inefficiency that shows up on profiles is the handling of
> > struct page conversion to/from idx and addresses. This is mostly due to
> > the fact that struct page is currently 56 bytes on x86-64, so gcc has to
> > emit a slow division or multiplication to convert.
>
> Huh?

You used an unsigned long, but ptrdiff_t is signed. gcc cannot use any
shifting tricks because they round incorrectly in the signed case.

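To make the signed/unsigned point concrete, here is a minimal sketch.
struct fake_page is a hypothetical 56 byte stand-in, not the kernel's
struct page, and the helper names are made up for illustration.
div8_floor() emulates in portable C what an arithmetic right shift by 3
computes, so the rounding difference is visible without depending on
the compiler's shift behaviour for negative values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 56 byte stand-in for struct page (assumption). */
struct fake_page { unsigned char pad[56]; };

/* Pointer subtraction yields the signed type ptrdiff_t. */
static ptrdiff_t idx_signed(const struct fake_page *base,
                            const struct fake_page *p)
{
    return p - base;
}

/* Unsigned byte offset divided by the size; only valid for p >= base. */
static unsigned long idx_unsigned(const struct fake_page *base,
                                  const struct fake_page *p)
{
    return (unsigned long)(((uintptr_t)p - (uintptr_t)base)
                           / sizeof(struct fake_page));
}

/* C signed division truncates toward zero. */
static long div8_trunc(long x)
{
    return x / 8;
}

/* Floor division: the result an arithmetic shift right by 3 would give. */
static long div8_floor(long x)
{
    long q = x / 8;
    return (x < 0 && x % 8 != 0) ? q - 1 : q;
}
```

For non-negative inputs the two div8 helpers agree. For -9 the
truncating version returns -1 and the flooring version returns -2,
which is why a bare shift is not a drop-in replacement for signed
division.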
> AFAIK mul has a latency of < 10 cycles even on P4 so I can't imagine
> it's a real problem. Something must be wrong with your measurements.

mul isn't particularly interesting in the profiles; it's the idiv.

> My guess would be that on more macro loads it would be a loss due
> to more cache misses.

But you get less false sharing of struct page on SMP as well. With a 56 byte
struct page, a single struct page can straddle two cachelines, and on this
workload the page definitely gets transferred from one CPU to the other.
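
As a rough illustration of the overlap, the sketch below counts how
many array elements of a given size cross a cacheline boundary. It
assumes 64 byte cachelines and an array that starts on a cacheline
boundary; CACHELINE and both function names are hypothetical and not
taken from the kernel.

```c
#include <stddef.h>

#define CACHELINE 64 /* assumed cacheline size in bytes */

/* Nonzero if element idx of an array of size-byte objects, starting on
 * a cacheline boundary, touches more than one cacheline. */
static int straddles(size_t size, size_t idx)
{
    size_t first = (idx * size) / CACHELINE;
    size_t last = (idx * size + size - 1) / CACHELINE;
    return first != last;
}

/* Count how many of the first n elements straddle a boundary. */
static size_t count_straddling(size_t size, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        count += straddles(size, i);
    return count;
}
```

Under those assumptions, 6 out of every 8 consecutive 56 byte entries
cross a boundary, while none of the 64 byte entries do.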
-ben