From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: 03 July 2015 17:39
> On Fri, 2015-07-03 at 16:18 +0000, David Laight wrote:
> 
> > Even on x86 aligning the ethernet receive data on a 4n+2
> > boundary is likely to give marginally better performance
> > than aligning on a 4n boundary.
> 
> You are coming late to the party.

I've been to many parties at many different times....

Going back many years, Sun's original sbus DMA part generated a lot
of single sbus transfers for 4n+2 aligned buffers - so it was necessary
to do a 'realignment' copy. The later DMA+ (definitely the DMA2) part
did sbus burst transfers even when the buffer was 4n+2 aligned.
So with the later parts you could correctly align the buffer.

> Intel guys decided to change NET_IP_ALIGN to 0 (it was 2 in the past)
...
>     x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
> 
>     x86 architectures can handle unaligned accesses in hardware, and it has
>     been shown that unaligned DMA accesses can be expensive on Nehalem
>     architectures.  As such we should overwrite NET_IP_ALIGN to resolve
>     this issue.

My 2 cents:

I'd have thought it would depend on the nature of the 'DMA' requests
generated by the hardware - so ethernet hardware dependent.

The above may be correct for PCI masters - especially those that do
paired 16bit accesses for every 32bit word.
If the hardware generated cache line aligned PCI bursts I wouldn't
have thought it would matter.

I doubt it is valid for PCIe transfers - where the ethernet frame
will be split into (probably) 128byte TLPs. Even if it starts on
a 64n+2 boundary the splits will be on 64 byte boundaries since the
first and last 32bit words of the TLP have separate byte enables.

So I'd expect to see a cache line RMW for the first and last cache
lines - that may, or may not, be slower than the misaligned accesses
for the entire frame (1 clock data delay per access?)
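
To put rough numbers on that, here is a small sketch that counts how many
cache lines a DMA write only partially covers. It assumes 64 byte cache
lines and a 1514 byte frame (no FCS); partial_lines() is a made-up helper
for illustration, not anything in the kernel.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64u

/*
 * Count the cache lines that a write of 'len' bytes starting at byte
 * offset 'start' only partially covers. The result is 0, 1 or 2.
 * Hypothetical helper, for illustration only.
 */
static unsigned int partial_lines(size_t start, size_t len)
{
	size_t end = start + len;	/* one past the last byte written */
	size_t first, last;

	if (len == 0)
		return 0;

	first = start / CACHE_LINE;
	last = (end - 1) / CACHE_LINE;

	/* The whole write sits inside a single cache line. */
	if (first == last)
		return (start % CACHE_LINE || end % CACHE_LINE) ? 1 : 0;

	/* Otherwise only the first and last lines can be partial. */
	return (start % CACHE_LINE ? 1 : 0) + (end % CACHE_LINE ? 1 : 0);
}
```

With those assumptions partial_lines(2, 1514) is 2 and partial_lines(0, 1514)
is 1, so the 64n+2 start costs one extra partial line per full sized frame.
Whether a partial line really turns into a RMW depends on the hardware.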

Of course, modern NICs will write 2 bytes of 'crap' before the frame.
Rounding up the transfer to the end of a cache line might also help
(especially if only a few extra words are needed).
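
As a rough illustration of how many extra bytes the rounding up would take,
again assuming 64 byte cache lines (pad_to_line_end() is a hypothetical
helper, not an existing function):

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64u

/*
 * Number of extra bytes needed so that a write of 'len' bytes starting
 * at byte offset 'start' finishes exactly at the end of a cache line.
 * Hypothetical helper, for illustration only.
 */
static size_t pad_to_line_end(size_t start, size_t len)
{
	size_t rem = (start + len) % CACHE_LINE;

	return rem ? CACHE_LINE - rem : 0;
}
```

For a 1514 byte frame starting at offset 2 this gives 20 bytes (five 32bit
words); starting at offset 0 it gives 22 bytes.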

        David
