Yes, I'm familiar with the Skylake architecture changes. Unfortunately, all of my questions here still hold :)
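
One rough way to think about the descriptor-count sensitivity in the quoted thread is to compare the worst-case RX buffer footprint with the share of the LLC that DDIO can allocate into. The Python sketch below is only a back-of-envelope calculation. The 2 KiB buffer per descriptor, the 19.25 MiB 11-way LLC, and DDIO being limited to 2 ways are assumptions, not values read from these servers. The helper names `rx_buffer_footprint` and `ddio_budget` are hypothetical.

```python
"""Back-of-envelope check: does the RX ring working set fit in the DDIO share of the LLC?

All constants are illustrative assumptions, not values measured on the
servers discussed in this thread.
"""

MIB = 1024 * 1024


def rx_buffer_footprint(queues: int, descriptors: int, buf_bytes: int) -> int:
    """Upper bound on packet-buffer bytes across all RX rings of one NIC.

    Assumes every descriptor points at a buffer the NIC has written but
    software has not yet consumed (the worst case).
    """
    return queues * descriptors * buf_bytes


def ddio_budget(llc_bytes: int, llc_ways: int, ddio_ways: int) -> int:
    """Bytes of LLC that DDIO writes may allocate into, assuming equal-sized ways."""
    return llc_bytes * ddio_ways // llc_ways


# Hypothetical parameters: 2 KiB RX buffers, a 19.25 MiB 11-way LLC per socket,
# and DDIO restricted to 2 of those ways.
BUF = 2048
budget = ddio_budget(llc_bytes=int(19.25 * MIB), llc_ways=11, ddio_ways=2)

# One NIC per NUMA node with 10 RSS queues, so 10 rings compete for one socket's LLC.
for descriptors in (512, 256):
    need = rx_buffer_footprint(queues=10, descriptors=descriptors, buf_bytes=BUF)
    print(f"{descriptors} descriptors: {need / MIB:.2f} MiB of buffers "
          f"vs {budget / MIB:.2f} MiB DDIO budget")
```

Under these assumed numbers, the 512-descriptor rings would need nearly three times the assumed DDIO share (10 MiB vs 3.5 MiB), and even 256 descriptors (5 MiB) still exceed it. Buffers written by the NIC could therefore be evicted before the application cores read them. That is consistent with the higher miss ratios reported, but it does not prove that this is the cause.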
On Thu, Oct 24, 2019 at 11:10 AM Damjan Marion <dmar...@me.com> wrote:
>
> On 24 Oct 2019, at 11:16, Michał Purzyński <michalpurzyns...@gmail.com> wrote:
>
> While measuring low-level cache hit rates, I noticed that on Skylake
> (Xeon Gold 6126) the LLC hit rates are much worse than on previous
> generations of Xeons.
>
> Both servers were configured in the same way:
>
> - 2x CPU
> - 2x X710 card, one for each NUMA node
> - RSS enabled, 10 queues
> - All interrupts pinned to a dedicated core, NUMA local
> - My application consumes packets from the local card only, on cores 2-11
>   (so also NUMA local; memory allocation enforced with numactl, application
>   pinned to cores, one thread per core)
> - I'm running with CPU isolation and have moved everything that could be
>   moved off my IRQ/softnet/application cores
>
> 512 descriptors (ethtool -g) on Haswell and 256 on Skylake.
>
> Both servers have "Hardware Prefetcher" and "Adjacent Cache Line Prefetch"
> disabled.
>
> In a nutshell:
>
> core 0: "everything"
> core 1: interrupts for the 10 queues
> cores 2-11: 10 threads of my application
>
> The ratio of LLC-load-misses / LLC-loads was about 0.5% on Haswell; it is
> now 5-10% on core 1 (interrupts and softnet) and above 12% on the
> application cores.
>
> It was even worse at first; I had to reduce the number of descriptors from
> 512 (the Haswell setting) to 256 on Skylake.
>
> Q1: I'm not sure what is going on here. Or am I misinterpreting the
> results? Cache aliasing?
>
>
> Going through this page may give you some answers, or at least a better
> understanding of what's going on:
>
> https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)
>
> A few key points:
> - on Skylake Server the L3 cache is reduced to 1.375MB/core from 2.5MB/core
>   on Haswell/Broadwell
> - on Skylake Server the L2 cache is increased from 256KB to 1MB
> - the architecture is changed from ring to mesh
>
>
> Q2: BTW, what's the effect of the following settings on DDIO?
> - LLC Prefetch
> - XPT Prefetch
> - LLC dead line allocation
> - Stale AtoS
> - Sub NUMA clustering (I think I should keep it disabled)
>
>
> There is a section explaining SNC on the same wiki page...
>
>
> Q3: Will DDIO work for non-Intel cards? How about RAID controllers, etc.?
>
>
> Yes, it should. According to my understanding, there is nothing special
> PCI cards need to do to utilise DDIO.
>
> --
> Damjan
>
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet