> both handling pixel graphics and transferring to graphic card are special
> cases.
> speedup may be due to better prefetch during sequential memory access, but
> larger data size should not help much here.
> more data causes FSB and PCIe contention, and cache thrashing. oops?
pci "memory" is not prefetched.  if you're stuffing bytes in you can use the
write-combining memory type to get pretty good performance for writes (there's
no similar trick for reads).  but generally dma is used to move large chunks
where performance matters.

regardless of dma, larger data sizes *do* help.  like any other network
protocol, there's a header and whatnot.  the minimum tlp for a write is 4
bytes.  ignoring other overhead, that's 25% data for 4-byte integers and ~6%
data for byte writes.  since pcie-3 is 128/130 encoded, the minimum is now 4
bytes.  (quiz: why could this make keeping the plls synced difficult?)

all the 10gbe vendors crank it up to 11 and use 4kb transfers when possible.
all that i've seen can't hit their theoretical maximum frame rate with 60-byte
frames.  too much overhead.

then there's the latency.  in the kernel i use, i keep the cumulative time
spent in irq handlers.  this is useful to see if changes help or hurt irq
latency.  in one case, i found that going from 1 to 2 pcie 4-byte register
reads doubled the time in that irq handler.

- erik
