> Subject: Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
> based on allocated IRQs
> 
> Long,
> 
> On Thu, 1 Nov 2018, Long Li wrote:
> > On a large system with multiple devices of the same class (e.g. NVMe
> > disks, using managed IRQs), the kernel tends to concentrate their IRQs
> > on several CPUs.
> >
> > The issue is that when NVMe calls irq_matrix_alloc_managed(), the
> > assigned CPU tends to be the first several CPUs in the cpumask,
> > because they check for
> > cpumap->available that will not change after managed IRQs are reserved.
> >
> > In irq_matrix->cpumap, "available" is set when IRQs are allocated
> > earlier in the IRQ allocation process. This value is caculated based
> > on
> 
> calculated
> 
> > 1. how many unmanaged IRQs are allocated on this CPU 2. how many
> > managed IRQs are reserved on this CPU
> >
> > But "available" is not accurate in accouting the real IRQs load on a given 
> > CPU.
> >
> > For a managed IRQ, it tends to reserve more than one CPU, based on
> > cpumask in irq_matrix_reserve_managed. But later when actually
> > allocating CPU for this IRQ, only one CPU is allocated. Because
> > "available" is calculated at the time managed IRQ is reserved, it
> > tends to indicate a CPU has more IRQs than it's actually assigned.
> >
> > When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(),
> > it decreases "allocated" based on the actually assignment of this IRQ to 
> > this
> CPU.
> 
> decreases?
> 
> > Unmanaged IRQ also decreases "allocated" after allocating an IRQ on this
> CPU.
> 
> ditto
> 
> > For this reason, checking "allocated" is more accurate than checking
> > "available" for a given CPU, and result in a more evenly distributed
> > IRQ across all CPUs.
> 
> Again, this approach is only correct for managed interrupts. Why?
> 
> Assume that total vector space size  = 10
> 
> CPU 0:
>        allocated      =  8
>        available      =  1
> 
>        i.e. there are 2 managed reserved, but not assigned interrupts
> 
> CPU 1:
>        allocated      =  7
>        available      =  0
> 
>        i.e. there are 3 managed reserved, but not assigned interrupts
> 
> Now allocate a non managed interrupt:
> 
> irq_matrix_alloc()
> 
>       cpu = find_best_cpu() <-- returns CPU1
> 
>       ---> FAIL
> 
> The allocation fails because it cannot allocate from the managed reserved
> space. The managed reserved space is guaranteed even if the vectors are not
> assigned. This is required to make hotplug work and to allow late activation
> without breaking the guarantees.
> 
> Non managed has no guarantees, it's a best effort approach, so it can fail.
> But the fail above is just wrong.
> 
> You really need to treat managed and unmanaged CPU selection differently.

Thank you for the explanation. I will send another patch to do it properly.

Long

> 
> Thanks,
> 
>       tglx

Reply via email to