RE: [LSF/MM TOPIC] irq affinity handling for high CPU count machines

Elliott, Robert (Persistent Memory) Mon, 29 Jan 2018 07:42:10 -0800


> -----Original Message-----
> From: Linux-nvme [mailto:[email protected]] On Behalf
> Of Hannes Reinecke
> Sent: Monday, January 29, 2018 3:09 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; Kashyap
> Desai <[email protected]>
> Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count machines
> 
> Hi all,
> 
> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> 
> When doing I/O tests on a machine with more CPUs than MSIx vectors
> provided by the HBA we can easily set up a scenario where one CPU is
> submitting I/O and the other one is completing I/O. Which will result in
> the latter CPU being stuck in the interrupt completion routine for
> basically ever, resulting in the lockup detector kicking in.
> 
> How should these situations be handled?
> Should it be made the responsibility of the drivers, ensuring that the
> interrupt completion routine is terminated after a certain time?
> Should it be made the responsibility of the upper layers?
> Should it be the responsibility of the interrupt mapping code?
> Can/should interrupt polling be used in these situations?


Back when we introduced scsi-mq with hpsa, the best approach was to
route interrupts and completion handling so each CPU core handles its
own submissions; this way, they are self-throttling.

Every other arrangement was subject to soft lockups and other problems
when the completion CPUs became overwhelmed with work.

See https://lkml.org/lkml/2014/9/9/931.
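
As a rough illustration of the self-throttling argument, here is a toy
model in Python. It is not the hpsa or scsi-mq code; simulate() and its
parameters are hypothetical names, and the model assumes one unit of
work per CPU per tick, with completion (interrupt) work always taking
priority over submission on the CPU it is routed to.

```python
def simulate(n_cpus, completion_cpu_of, ticks):
    """Toy model: each tick, a CPU either services one pending
    completion (interrupt work wins) or submits one new I/O.

    completion_cpu_of(cpu) names the CPU that will handle the
    completion of an I/O submitted by `cpu` (hypothetical routing hook).
    Returns (max_backlog, irq_ticks_per_cpu).
    """
    pending = [0] * n_cpus      # completions queued on each CPU
    irq_ticks = [0] * n_cpus    # ticks each CPU spent in completion work
    max_backlog = 0

    for _ in range(ticks):
        for cpu in range(n_cpus):
            if pending[cpu] > 0:
                # Busy completing: this CPU cannot submit this tick.
                pending[cpu] -= 1
                irq_ticks[cpu] += 1
            else:
                # Free to submit: the completion lands on the routed CPU.
                pending[completion_cpu_of(cpu)] += 1
        max_backlog = max(max_backlog, max(pending))

    return max_backlog, irq_ticks


# Each CPU completes its own submissions.
local = simulate(4, lambda cpu: cpu, 100)

# All completions are routed to CPU 0, as when vectors are scarce.
shared = simulate(4, lambda cpu: 0, 100)

print("local :", local)
print("shared:", shared)
```

In this model the local routing never has more than one completion
outstanding per CPU, because a CPU that is busy completing cannot
submit. With everything routed to CPU 0, the other CPUs keep submitting
at full rate, the backlog on CPU 0 grows every tick, and CPU 0 spends
all but its first tick in completion work.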

---
Robert Elliott, HPE Persistent Memory
