Hi Jens, Keith, Christoph, Sagi, Michael,

I have decided to drive this series forward on behalf of Daniel Wagner,
the original author. This iteration addresses the outstanding
architectural and concurrency concerns raised during the previous review
cycle, and the series has been rebased on v7.0-rc5-509-g545475aebc2a.

Building upon prior iterations, this series introduces critical
architectural refinements to the mapping and affinity spreading
algorithms to guarantee thread safety and resilience against concurrent
CPU-hotplug operations.

Previously, the block layer relied on a shared global static mask (i.e.,
blk_hk_online_mask), which proved vulnerable to race conditions during
rapid hotplug events. This vulnerability was recently highlighted by the
kernel test robot, which encountered a NULL pointer dereference during
rcutorture (cpuhotplug) stress testing due to concurrent mask
modification.

To resolve this, the architecture has been fundamentally hardened. The
global static state has been eradicated. Instead, the IRQ affinity core
now employs a newly introduced irq_spread_hk_filter(), which safely
intersects the natively calculated affinity mask with the
HK_TYPE_IO_QUEUE mask. Crucially, this is achieved using a local,
hotplug-safe snapshot via data_race(cpu_online_mask). This approach
circumvents the hotplug lock deadlocks previously identified by Thomas
Gleixner, whilst explicitly avoiding CONFIG_CPUMASK_OFFSTACK stack bloat
hazards on high-core-count systems. A robust fallback mechanism
guarantees that should an interrupt vector be assigned exclusively to
isolated cores, it is safely re-routed to the system's online
housekeeping CPUs.

Please let me know your thoughts.

Changes in v9:
  - Added "Reviewed-by:" tags
  - Introduced irq_spread_hk_filter() to safely restrict managed IRQ
    affinity to housekeeping CPUs (Thomas Gleixner)
  - Removed the unsafe global static variable blk_hk_online_mask from
    blk-mq-cpumap.c and blk-mq.c. blk_mq_online_queue_affinity() now
    returns a stable pointer, delegating safe intersection to the
    callers to prevent concurrent modification races (Thomas Gleixner,
    Hannes Reinecke)
  - Resolved BUG: kernel NULL pointer dereference in
    __blk_mq_all_tag_iter reported by the kernel test robot during
    cpuhotplug rcutorture stress testing
  - Linked to v8: https://lore.kernel.org/lkml/[email protected]/

Changes in v8:
  - Added commit 524f5eea4bbe ("lib/group_cpus: remove !SMP code")
  - Merged the new mapping logic directly into the existing function to
    avoid special casing
  - Refined the group_mask_cpus_evenly() implementation with the
    following updates:
      - Corrected the function name typo (changed
        group_masks_cpus_evenly to group_mask_cpus_evenly)
      - Updated the documentation comment to accurately reflect the
        function's behavior
      - Renamed the cpu_mask argument to mask for consistency
  - Added a new patch for aacraid to include the missing number of
    queues calculation
  - Restricted updates to only affect SCSI drivers that support
    PCI_IRQ_AFFINITY and do not utilize nvme-fabrics
  - Removed the __free cleanup attribute usage for cpumask_var_t
    allocations due to compatibility issues
  - Updated the documentation to explicitly highlight the limitations
    surrounding CPU offlining
  - Collected accumulated Reviewed-by and Acked-by tags
  - Linked to v7: https://patch.msgid.link/[email protected]

Changes in v7:
  - Sent out the first part of the series independently:
    https://lore.kernel.org/all/[email protected]/
  - Added comprehensive kernel command-line documentation
  - Added validation logic to ensure the resulting CPU-to-queue mapping
    is fully operational
  - Rewrote the isolcpus mapping code to properly account for active
    hardware contexts (hctx)
  - Introduced blk_mq_map_hk_irq_queues, which utilizes the mask
    retrieved from irq_get_affinity()
  - Refactored blk_mq_map_hk_queues to require the caller to explicitly
    test for HK_TYPE_MANAGED_IRQ
  - Linked to v6: https://patch.msgid.link/[email protected]

Changes in v6:
  - Reintroduced the io_queue type for the isolcpus kernel parameter
  - Prevented the offlining of a housekeeping CPU if an isolated CPU is
    still present, upgrading this behavior from a simple warning to a
    hard restriction
  - Linked to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
  - Rebased the series onto the latest for-6.14/block branch
  - Updated the documentation regarding the managed_irq parameters
  - Reworded the commit message for "blk-mq: issue warning when
    offlining hctx with online isolcpus" for better clarity
  - Split the input and output parameters in the patch "lib/group_cpus:
    let group_cpu_evenly return number of groups"
  - Dropped the patch "sched/isolation: document HK_TYPE housekeeping
    option"
  - Linked to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
  - Added the patch "blk-mq: issue warning when offlining hctx with
    online isolcpus"
  - Fixed the check in group_cpus_evenly(); the condition now properly
    uses housekeeping_enabled() instead of cpumask_weight(), as the
    latter always returns a valid mask
  - Dropped the Fixes: tag from "lib/group_cpus.c: honor housekeeping
    config when grouping CPUs"
  - Fixed an overlong line warning in the patch "scsi: use block layer
    helpers to calculate num of queues"
  - Dropped the patch "sched/isolation: Add io_queue housekeeping
    option" in favor of simply documenting the housekeeping hk_type enum
  - Added the patch "lib/group_cpus: let group_cpu_evenly return number
    of groups"
  - Collected accumulated Reviewed-by and Acked-by tags
  - Split the patchset by moving foundational changes into a separate
    preparation series:
    https://lore.kernel.org/linux-nvme/20241202-refactor-blk-affinity-helpers-v6-0-27211e9c2...@kernel.org/
  - Linked to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
  - Integrated patches from Ming Lei
    (https://lore.kernel.org/all/[email protected]/):
    "virtio: add APIs for retrieving vq affinity" and
    "blk-mq: introduce blk_mq_dev_map_queues"
  - Replaced all instances of blk_mq_pci_map_queues and
    blk_mq_virtio_map_queues with the new unified blk_mq_dev_map_queues
  - Updated and expanded the helper functions used for calculating the
    number of queues
  - Added the CPU-to-hctx mapping function specifically to support the
    isolcpus=io_queue parameter
  - Documented the hk_type enum and the newly introduced
    isolcpus=io_queue parameter
  - Added the patch "scsi: pm8001: do not overwrite PCI queue mapping"
  - Linked to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
  - Updated the feature documentation for clarity and completeness
  - Split the blk/nvme-pci patch into smaller, logical commits
  - Dropped the HK_TYPE_IO_QUEUE macro in favor of reusing
    HK_TYPE_MANAGED_IRQ
  - Linked to v1: https://lore.kernel.org/r/[email protected]

Aaron Tomlin (1):
  genirq/affinity: Restrict managed IRQ affinity to housekeeping CPUs

Daniel Wagner (12):
  scsi: aacraid: use block layer helpers to calculate num of queues
  lib/group_cpus: remove dead !SMP code
  lib/group_cpus: Add group_mask_cpus_evenly()
  genirq/affinity: Add cpumask to struct irq_affinity
  blk-mq: add blk_mq_{online|possible}_queue_affinity
  nvme-pci: use block layer helpers to constrain queue affinity
  scsi: Use block layer helpers to constrain queue affinity
  virtio: blk/scsi: use block layer helpers to constrain queue affinity
  isolation: Introduce io_queue isolcpus type
  blk-mq: use hk cpus only when isolcpus=io_queue is enabled
  blk-mq: prevent offlining hk CPUs with associated online isolated CPUs
  docs: add io_queue flag to isolcpus

 .../admin-guide/kernel-parameters.txt     |  22 +-
 block/blk-mq-cpumap.c                     | 201 ++++++++++++++++--
 block/blk-mq.c                            |  42 ++++
 drivers/block/virtio_blk.c                |   4 +-
 drivers/nvme/host/pci.c                   |   1 +
 drivers/scsi/aacraid/comminit.c           |   3 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    |   1 +
 drivers/scsi/megaraid/megaraid_sas_base.c |   5 +-
 drivers/scsi/mpi3mr/mpi3mr_fw.c           |   6 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c       |   5 +-
 drivers/scsi/pm8001/pm8001_init.c         |   1 +
 drivers/scsi/virtio_scsi.c                |   5 +-
 include/linux/blk-mq.h                    |   2 +
 include/linux/group_cpus.h                |   3 +
 include/linux/interrupt.h                 |  16 +-
 include/linux/sched/isolation.h           |   1 +
 kernel/irq/affinity.c                     |  38 +++-
 kernel/sched/isolation.c                  |   7 +
 lib/group_cpus.c                          |  65 ++++--
 19 files changed, 379 insertions(+), 49 deletions(-)

base-commit: 545475aebc2a2e8df14fadc911a7a2d03ddd6a1f
-- 
2.51.0
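
For readers who want a quick mental model of the filter-plus-fallback
behaviour described for irq_spread_hk_filter() in the cover letter, here
is a rough userspace sketch. It is not the code from the patch: the name
spread_hk_filter_model(), the 64-bit mask type and the exact order of
the checks are assumptions made purely for illustration, and the real
implementation operates on struct cpumask inside kernel/irq/affinity.c.
In particular, what happens when no housekeeping CPU is online at all is
a guess here (the mask is left untouched).

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/*
 * Illustrative stand-in for the filter described in the cover letter.
 *
 * vec_mask:        affinity mask computed for one interrupt vector
 * hk_mask:         housekeeping CPUs (HK_TYPE_IO_QUEUE in the series)
 * online_snapshot: local copy of the online CPUs, taken once by the
 *                  caller so later hotplug events cannot change it
 *
 * Assumed behaviour, not taken from the patch itself:
 *  1. keep only the housekeeping CPUs of the vector's mask;
 *  2. if that leaves nothing (vector sat only on isolated CPUs),
 *     fall back to the online housekeeping CPUs;
 *  3. if there are no online housekeeping CPUs either, return the
 *     original mask unchanged.
 */
static cpumask_model_t spread_hk_filter_model(cpumask_model_t vec_mask,
					      cpumask_model_t hk_mask,
					      cpumask_model_t online_snapshot)
{
	cpumask_model_t filtered = vec_mask & hk_mask;
	cpumask_model_t hk_online = hk_mask & online_snapshot;

	if (filtered)
		return filtered;
	if (hk_online)
		return hk_online;
	return vec_mask;
}
```

With CPUs 0-1 as housekeeping and CPUs 2-3 isolated, a vector spread
over CPUs 2-3 (mask 0xC) is re-routed to the online housekeeping CPUs,
while a vector spread over CPUs 1-2 (mask 0x6) is simply narrowed to
CPU 1.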

