From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: 03 July 2015 17:39
> On Fri, 2015-07-03 at 16:18 +0000, David Laight wrote:
>
> > Even on x86 aligning the ethernet receive data on a 4n+2
> > boundary is likely to give marginally better performance
> > than aligning on a 4n boundary.
>
> You are coming late to the party.

I've been to many parties at many different times....

Going back many years, Sun's original sbus DMA part generated a lot of
single sbus transfers for 4n+2 aligned buffers - so it was necessary
to do a 'realignment' copy.
The later DMA+ (definitely the DMA2) part did sbus burst transfers
even when the buffer was 4n+2 aligned.
So with the later parts you could correctly align the buffer.

> Intel guys decided to change NET_IP_ALIGN to 0 (it was 2 in the past)
...
> x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
>
> x86 architectures can handle unaligned accesses in hardware, and it has
> been shown that unaligned DMA accesses can be expensive on Nehalem
> architectures. As such we should overwrite NET_IP_ALIGN to resolve
> this issue.

My 2 cents:

I'd have thought it would depend on the nature of the 'DMA' requests
generated by the hardware - so ethernet hardware dependent.

The above may be correct for PCI masters - especially those that
do paired 16-bit accesses for every 32-bit word.
If the hardware generated cache line aligned PCI bursts I wouldn't
have thought it would matter.

I doubt it is valid for PCIe transfers - where the ethernet frame
will be split into (probably) 128-byte TLPs. Even if it starts on a
64n+2 boundary the splits will be on 64-byte boundaries, since the
first and last 32-bit words of the TLP have separate byte enables.

So I'd expect to see a cache line RMW for the first and last
cache lines. That may, or may not, be slower than the misaligned
accesses for the entire frame (1 clock data delay per access?).

Of course, modern NICs will write 2 bytes of 'crap' before the frame.
Rounding up the transfer to the end of a cache line might also help
(especially if only a few extra words are needed).

	David
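
To put rough numbers on the first/last cache line argument above, here is
a minimal C sketch. It is only a counting model, not a measurement: it
assumes 64-byte cache lines, and the names (CACHE_LINE, struct dma_cost,
dma_write_cost, dma_write_cost_padded) are made up for illustration and
are not kernel or driver APIs. It counts how many cache lines a DMA write
touches and how many of them are only partly overwritten (the ones that
might need a RMW), and then the same for a write padded out to cache line
boundaries at both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache line size in bytes */

/* Hypothetical result type, for illustration only. */
struct dma_cost {
    unsigned lines_touched;  /* cache lines overlapped by the write */
    unsigned partial_lines;  /* lines only partly overwritten (RMW candidates) */
};

/* Count the cache lines covered by a write of 'len' bytes at 'start'. */
static struct dma_cost dma_write_cost(uintptr_t start, size_t len)
{
    struct dma_cost c = { 0, 0 };

    assert(len > 0);

    uintptr_t end = start + len;               /* one past the last byte */
    uintptr_t first_line = start / CACHE_LINE;
    uintptr_t last_line = (end - 1) / CACHE_LINE;
    int head_partial = (start % CACHE_LINE) != 0;
    int tail_partial = (end % CACHE_LINE) != 0;

    c.lines_touched = (unsigned)(last_line - first_line + 1);
    if (first_line == last_line)
        c.partial_lines = (head_partial || tail_partial) ? 1u : 0u;
    else
        c.partial_lines = (unsigned)(head_partial + tail_partial);
    return c;
}

/*
 * Same count when the device writes junk before the frame and rounds
 * the transfer up to the end of the last cache line.
 */
static struct dma_cost dma_write_cost_padded(uintptr_t start, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t pstart = start & mask;
    uintptr_t pend = (start + len + CACHE_LINE - 1) & mask;

    return dma_write_cost(pstart, (size_t)(pend - pstart));
}
```

For a 1514-byte frame, dma_write_cost(0x1002, 1514) gives 24 lines with
2 partial, dma_write_cost(0x1000, 1514) gives 24 lines with 1 partial,
and dma_write_cost_padded(0x1002, 1514) gives 24 lines with none partial.
Whether a partial line costs more than misaligned accesses across the
whole frame is exactly the open question; the sketch says nothing
about that.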