While doing some low-level cache hit rate measurements I noticed that on
Skylake (Xeon Gold 6126) the LLC hit rates are much worse than on previous
generations of Xeons.

Both servers were configured in the same way:

- 2x CPU
- 2x X710 cards - one for each NUMA node
- RSS enabled - 10 queues
- All interrupts pinned to a dedicated core, NUMA-local
- My application consumes packets from the local card only, on cores 2-11
  (so also NUMA-local; memory allocation enforced with numactl, application
  pinned to cores, one thread per core)
- I'm running with CPU isolation and have moved everything that could be
  moved off my IRQ/softnet/application cores

512 RX descriptors (ethtool -g) on Haswell and 256 on Skylake.
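For reference, this is roughly how the ring sizes were inspected and set
(the interface name ens1f0 is a placeholder - substitute the actual X710
port on each box):

```shell
# Show current and maximum RX/TX ring sizes for the NIC
ethtool -g ens1f0

# Set the RX ring to 512 descriptors, as on the Haswell box
ethtool -G ens1f0 rx 512
```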

Both servers have "Hardware Prefetcher" and "Adjacent Cache Line Prefetch"
disabled.

In a nutshell:

- core 0 - "everything"
- core 1 - interrupts for 10 queues
- cores 2-11 - 10 threads of my application
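The layout above was enforced with commands along these lines (a sketch
only - the interface name, IRQ naming pattern, and binary path are
placeholders; look up the real IRQ numbers in /proc/interrupts):

```shell
# Pin the 10 RX/TX queue interrupts of the local X710 to core 1
# (queue IRQs for i40e typically show up as <iface>-TxRx-<n>)
for irq in $(awk '/ens1f0-TxRx/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
    echo 1 > /proc/irq/$irq/smp_affinity_list
done

# Start the application NUMA-local: memory bound to node 0,
# threads pinned to cores 2-11 (one thread per core)
numactl --membind=0 taskset -c 2-11 ./my_app
```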

The ratio of LLC-load-misses to LLC-loads was around 0.5% on Haswell; on
Skylake it's now 5-10% on core 1 (interrupts and softnet) and above 12% on
the application cores.
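For completeness, the ratio comes from the two raw perf counters per core;
the counter values below are made-up sample numbers, not real measurements:

```shell
# Counters were collected per-core with something like:
#   perf stat -e LLC-loads,LLC-load-misses -C 2-11 -- sleep 10
#
# Given the two raw counts, the miss ratio is simply misses/loads:
llc_loads=1000000        # sample value, not a real measurement
llc_load_misses=5000     # sample value, not a real measurement
awk -v l="$llc_loads" -v m="$llc_load_misses" \
    'BEGIN { printf "LLC miss ratio: %.2f%%\n", 100 * m / l }'
# prints: LLC miss ratio: 0.50%
```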

It was even worse before I reduced the number of descriptors from 512 (the
Haswell setting) to 256 on Skylake.

Q1: I'm not sure what is going on here - or maybe I'm misinterpreting the
results? Cache aliasing?

Q2: BTW what's the effect of the following settings on DDIO?
- LLC Prefetch
- XPT Prefetch
- LLC dead line allocation
- Stale AtoS
- Sub NUMA clustering (I think I should keep it disabled)

Q3: Will DDIO work for non-Intel cards? How about RAID controllers, etc.?

--
Michal Purzynski
Long time Zeek and Suricata on Intel's commodity hardware advocate

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet