Re: [E1000-devel] Low cache hit rates with DDIO and Skylake

Michał Purzyński Thu, 24 Oct 2019 14:10:50 -0700

Yes, I'm familiar with the Skylake architecture changes. Unfortunately, all
of my questions here still hold :)




On Thu, Oct 24, 2019 at 11:10 AM Damjan Marion <[email protected]> wrote:

>
>
> On 24 Oct 2019, at 11:16, Michał Purzyński <[email protected]>
> wrote:
>
> While doing some low-level cache hit rate measurements, I noticed that on
> Skylake (Xeon Gold 6126) the LLC hit rates are much worse than on previous
> generations of Xeons.
>
> Both servers were configured in the same way:
>
> 2x CPU
> 2x X710 card - one for each NUMA node
> RSS enabled - 10 queues
> All interrupts pinned to a dedicated core, NUMA local
> My application consumes packets from the local card only on cores 2-11 (so
> also NUMA local, memory allocation enforced with numactl, application
> pinned to cores, one thread per core)
> I'm running with CPU isolation and have moved everything that could be
> moved off my IRQ/SoftNet/application cores
>
> 512 descriptors (ethtool -g) on Haswell and 256 on Skylake.
>
> Both servers have "Hardware Prefetcher" and "Adjacent Cache Line Prefetch"
> disabled.
>
> In a nutshell:
>
> core 0 - "everything"
> core 1 - interrupts for 10 queues
> cores 2-11 - 10 threads of my application
>
> The ratio of LLC-load-misses / LLC-loads was about 0.5% on Haswell. On
> Skylake it is now 5-10% on core 1 (interrupts and softnet) and above 12%
> on the application cores.
>
> It was even worse with 512 descriptors (the Haswell setting), so I had to
> reduce the number of descriptors to 256 on Skylake.
>
> Q1: I'm not sure what is going on here - or maybe I am misinterpreting
> results? Cache aliasing?
>
>
> Going through this page may give you some answers, or at least a better
> understanding of what’s going on:
>
> https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)
>
> A few key points:
>  - on Skylake Server the L3 cache is reduced to 1.375MB/core from
> 2.5MB/core on Haswell/Broadwell
>  - on Skylake Server the L2 cache is increased from 256KB to 1MB per core
>  - the interconnect architecture is changed from ring to mesh
>
>
>
> Q2: BTW what's the effect of the following settings on DDIO?
> - LLC Prefetch
> - XPT Prefetch
> - LLC dead line allocation
> - Stale AtoS
> - Sub NUMA clustering (I think I should keep it disabled)
>
>
> There is a section explaining SNC on the same wiki page...
>
>
> Q3: Will DDIO work for non-Intel cards? How about RAID controllers, etc?
>
>
> Yes, it should. As far as I understand, PCI cards do not need to do
> anything special to utilise DDIO.
>
> —
> Damjan
>
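
For anyone wanting to reproduce the LLC-load-misses / LLC-loads ratio
discussed above, here is a minimal sketch. It assumes the counters were
collected as CSV with something like
`perf stat -x, -C 1 -e LLC-loads,LLC-load-misses sleep 10`; the sample text
in the script uses made-up numbers for illustration, not measurements from
this thread. The helper names (`parse_perf_csv`, `llc_miss_ratio`) are
hypothetical.

```python
import csv
import io

# Event names as used in the thread (perf's generic cache events).
EVENTS = ("LLC-loads", "LLC-load-misses")


def parse_perf_csv(text):
    """Sum the counter values for EVENTS found in `perf stat -x,` style lines.

    Assumption: in each CSV row the counter value sits two fields before the
    event name (value, unit, event). Searching for the event name keeps this
    working when extra leading fields (e.g. a CPU id) are present.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        for i, field in enumerate(row):
            if field in EVENTS and i >= 2:
                try:
                    counts[field] = counts.get(field, 0) + int(row[i - 2])
                except ValueError:
                    # Non-numeric value such as "<not counted>": skip the row.
                    pass
    return counts


def llc_miss_ratio(counts):
    """Return LLC-load-misses / LLC-loads, or None if no loads were counted."""
    loads = counts.get("LLC-loads", 0)
    if loads == 0:
        return None
    return counts.get("LLC-load-misses", 0) / loads


# Illustrative input only; the numbers are invented for the example.
SAMPLE = (
    "1000000,,LLC-loads,1000234,100.00,,\n"
    "125000,,LLC-load-misses,1000234,100.00,,\n"
)

if __name__ == "__main__":
    ratio = llc_miss_ratio(parse_perf_csv(SAMPLE))
    print("no LLC loads counted" if ratio is None else f"miss ratio: {ratio:.1%}")
```

Running this once per core of interest (core 1 for interrupts/softnet, cores
2-11 for the application) would give numbers comparable to the percentages
quoted in the thread, provided both machines expose the same events.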
