When doing some low-level cache hit rate measurements I noticed that on Skylake (Xeon Gold 6126) the LLC hit rates are much worse than on previous generations of Xeons.
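For reference, the kind of measurement described here can be reproduced with perf; the core list below matches the setup described further down, but exact event names and availability vary by kernel and CPU, so treat this as a sketch:

```shell
# Count LLC loads and misses on the application cores (2-11) for 10 seconds.
# Hardware-dependent: needs perf installed and a CPU that exposes these
# generic cache events.
perf stat -e LLC-loads,LLC-load-misses -C 2-11 -- sleep 10

# The ratio quoted below is just misses/loads; e.g. for sample counts of
# 1000000 loads and 5000 misses (illustrative numbers, not from the report):
awk 'BEGIN { loads=1000000; misses=5000; printf "%.2f%%\n", 100*misses/loads }'
```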
Both servers were configured in the same way:
- 2x CPU
- 2x X710 card - one for each NUMA node
- RSS enabled - 10 queues
- All interrupts pinned to a dedicated, NUMA-local core
- My application consumes packets from the local card only, on cores 2-11
  (so also NUMA local; memory allocation enforced with numactl, application
  pinned to cores, one thread per core)
- I'm running with CPU isolation and moved everything that could be moved
  off my IRQ/softnet/application cores
- 512 descriptors (ethtool -g) on Haswell and 256 on Skylake

Both servers have "Hardware Prefetcher" and "Adjacent Cache Line Prefetch"
disabled.

In a nutshell:
core 0     - "everything"
core 1     - interrupts for 10 queues
cores 2-11 - 10 threads of my application

The LLC-load-misses / LLC-loads ratio was about 0.5% on Haswell; on Skylake
it is now 5-10% on core 1 (interrupts and softnet) and above 12% on the
application cores. It was even worse at first - I had to reduce the number
of descriptors from 512 (Haswell) to 256 (Skylake).

Q1: I'm not sure what is going on here - or maybe I am misinterpreting the
results? Cache aliasing?

Q2: BTW, what is the effect of the following settings on DDIO?
- LLC Prefetch
- XPT Prefetch
- LLC dead line allocation
- Stale AtoS
- Sub-NUMA Clustering (I think I should keep it disabled)

Q3: Will DDIO work for non-Intel cards? How about RAID controllers, etc.?

--
Michal Purzynski
Long time Zeek and Suricata on Intel's commodity hardware advocate

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet