> Subject: Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
> based on allocated IRQs
>
> Long,
>
> On Thu, 1 Nov 2018, Long Li wrote:
> > On a large system with multiple devices of the same class (e.g. NVMe
> > disks, using managed IRQs), the kernel tends to concentrate their IRQs
> > on several CPUs.
> >
> > The issue is that when NVMe calls irq_matrix_alloc_managed(), the
> > assigned CPU tends to be the first several CPUs in the cpumask,
> > because they check for cpumap->available that will not change after
> > managed IRQs are reserved.
> >
> > In irq_matrix->cpumap, "available" is set when IRQs are allocated
> > earlier in the IRQ allocation process. This value is caculated based
> > on
>
> calculated
>
> > 1. how many unmanaged IRQs are allocated on this CPU
> > 2. how many managed IRQs are reserved on this CPU
> >
> > But "available" is not accurate in accouting the real IRQs load on a
> > given CPU.
> >
> > For a managed IRQ, it tends to reserve more than one CPU, based on
> > cpumask in irq_matrix_reserve_managed. But later when actually
> > allocating CPU for this IRQ, only one CPU is allocated. Because
> > "available" is calculated at the time managed IRQ is reserved, it
> > tends to indicate a CPU has more IRQs than it's actually assigned.
> >
> > When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(),
> > it decreases "allocated" based on the actually assignment of this IRQ
> > to this CPU.
>
> decreases?
>
> > Unmanaged IRQ also decreases "allocated" after allocating an IRQ on
> > this CPU.
>
> ditto
>
> > For this reason, checking "allocated" is more accurate than checking
> > "available" for a given CPU, and result in a more evenly distributed
> > IRQ across all CPUs.
>
> Again, this approach is only correct for managed interrupts. Why?
>
> Assume that total vector space size = 10
>
> CPU 0:
>        allocated = 8
>        available = 1
>
>        i.e. there are 2 managed reserved, but not assigned interrupts
>
> CPU 1:
>        allocated = 7
>        available = 0
>
>        i.e. there are 3 managed reserved, but not assigned interrupts
>
> Now allocate a non managed interrupt:
>
>        irq_matrix_alloc()
>                cpu = find_best_cpu()   <-- returns CPU1
>
>                ---> FAIL
>
> The allocation fails because it cannot allocate from the managed reserved
> space. The managed reserved space is guaranteed even if the vectors are not
> assigned. This is required to make hotplug work and to allow late activation
> without breaking the guarantees.
>
> Non managed has no guarantees, it's a best effort approach, so it can fail.
> But the fail above is just wrong.
>
> You really need to treat managed and unmanaged CPU selection differently.

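For reference, the failure case described above can be sketched as a small
toy model in C. This is only an illustration under stated assumptions: the
struct toy_cpumap and the functions pick_least_allocated(),
pick_most_available() and toy_alloc_unmanaged() are hypothetical names made
up for this sketch, not the kernel's data structures or API, and the two
counters simply mirror the "allocated" and "available" values from the
example.

```c
#include <stddef.h>

/*
 * Toy stand-in for the per-CPU counters in the example above.
 * Hypothetical type: this is NOT the kernel's struct cpumap.
 */
struct toy_cpumap {
	unsigned int allocated;	/* vectors currently assigned on this CPU */
	unsigned int available;	/* vectors an unmanaged allocation may still take */
};

/* Select the CPU with the fewest allocated vectors (the proposed criterion). */
int pick_least_allocated(const struct toy_cpumap *maps, int ncpus)
{
	int best = -1;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (best < 0 || maps[cpu].allocated < maps[best].allocated)
			best = cpu;
	}
	return best;
}

/* Select the CPU with the most available (unreserved, unassigned) vectors. */
int pick_most_available(const struct toy_cpumap *maps, int ncpus)
{
	int best = -1;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (best < 0 || maps[cpu].available > maps[best].available)
			best = cpu;
	}
	return best;
}

/*
 * An unmanaged allocation must not take a vector from the managed
 * reserved space, so it fails when the chosen CPU has nothing available.
 * Returns 0 on success, -1 on failure.
 */
int toy_alloc_unmanaged(struct toy_cpumap *maps, int cpu)
{
	if (cpu < 0 || maps[cpu].available == 0)
		return -1;

	maps[cpu].available--;
	maps[cpu].allocated++;
	return 0;
}
```

With maps = { {8, 1}, {7, 0} } as in the example, pick_least_allocated()
selects CPU 1 and toy_alloc_unmanaged() fails there, while
pick_most_available() selects CPU 0 and the allocation succeeds. That is the
sense in which selecting by "allocated" alone is only safe for managed
interrupts.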
Thank you for the explanation. I will send another patch to do it properly.

Long

> Thanks,
>
> tglx