> -----Original Message----- > From: Intel-wired-lan <[email protected]> On Behalf Of > Corinna Vinschen > Sent: Friday, April 17, 2026 12:28 AM > To: [email protected]; [email protected] > Cc: Vinschen, Corinna <[email protected]> > Subject: [Intel-wired-lan] [PATCH net v2] ixgbe: only access vfinfo and > mv_list under RCU lock > > Commit 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero") added a spinlock to the adapter info. The reason at the time was an observed crash when ixgbe_disable_sriov() freed the adapter->vfinfo array while the interrupt driven function ixgbe_msg_task() was handling VF messages. > > Recent stability testing turned up another crash, which is very easily reproducible: > > while true > do > for numvfs in 5 0 > do > echo $numvfs > /sys/class/net/eth0/device/sriov_numvfs > done > done > > This crashed almost always within the first two hundred runs with a NULL pointer deref while running the ixgbe_service_task() workqueue: > > [ 5052.036491] BUG: kernel NULL pointer dereference, address: 0000000000000258 > [ 5052.043454] #PF: supervisor read access in kernel mode > [ 5052.048594] #PF: error_code(0x0000) - not-present page > [ 5052.053734] PGD 0 P4D 0 > [ 5052.056272] Oops: Oops: 0000 #1 SMP NOPTI > [ 5052.060459] CPU: 2 UID: 0 PID: 132253 Comm: kworker/u96:0 Kdump: loaded > Not tainted 6.12.0-180.el10.x86_64 #1 PREEMPT(voluntary) > [ 5052.072100] Hardware name: Dell Inc. PowerEdge R740/0DY2X0, BIOS 2.12.2 > 07/09/2021 > [ 5052.079664] Workqueue: ixgbe ixgbe_service_task [ixgbe] > [ 5052.084907] RIP: 0010:ixgbe_update_stats+0x8b1/0xb40 [ixgbe] > [ 5052.090585] Code: 21 56 50 49 8b b6 18 26 00 00 4c 01 fe 48 09 46 50 42 8d > 34 a5 00 83 00 00 e8 cb 7a ff ff 49 8b b6 18 26 00 00 89 c0 4c 01 fe <48> 3b > 86 88 00 00 00 73 18 48 b9 00 00 00 00 01 00 00 00 48 01 4e > [ 5052.109331] RSP: 0018:ffffd5f1e8a6bd88 EFLAGS: 00010202 > [ 5052.114558] RAX: 0000000000000000 RBX: ffff8f49b22b14a0 RCX: > 000000000000023c > [ 5052.121689] RDX: ffffffff00000000 RSI: 00000000000001d0 RDI: > ffff8f49b22b14a0 > [ 5052.128823] RBP: 000000000000109c R08: 0000000000000000 R09: > 0000000000000000 > [ 5052.135955] R10: 0000000000000000 R11: 0000000000000000 R12: > 0000000000000002 > [ 5052.143086] R13: 0000000000008410 R14: ffff8f49b22b01a0 R15: > 00000000000001d0 > [ 5052.150221] FS: 0000000000000000(0000) GS:ffff8f58bfc80000(0000) > knlGS:0000000000000000 > [ 5052.158307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 5052.164054] CR2: 0000000000000258 CR3: 0000000bf2624006 CR4: > 00000000007726f0 > [ 5052.171187] PKRU: 55555554 > [ 5052.173898] Call Trace: > [ 5052.176351] <TASK> > [ 5052.178457] ? show_trace_log_lvl+0x1b0/0x2f0 > [ 5052.182816] ? show_trace_log_lvl+0x1b0/0x2f0 > [ 5052.187177] ? ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe] > [ 5052.192591] ? __die_body.cold+0x8/0x12 > [ 5052.196433] ? page_fault_oops+0x148/0x160 > [ 5052.200532] ? exc_page_fault+0x7f/0x150 > [ 5052.204458] ? asm_exc_page_fault+0x26/0x30 > [ 5052.208643] ? ixgbe_update_stats+0x8b1/0xb40 [ixgbe] > [ 5052.213714] ? ixgbe_update_stats+0x8a5/0xb40 [ixgbe] > [ 5052.218784] ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe] > [ 5052.224026] ixgbe_service_task+0x15a/0x3f0 [ixgbe] > [ 5052.228916] process_one_work+0x177/0x330 > [ 5052.232928] worker_thread+0x256/0x3a0 > [ 5052.236681] ? __pfx_worker_thread+0x10/0x10 > [ 5052.240952] kthread+0xfa/0x240 > [ 5052.244099] ? __pfx_kthread+0x10/0x10 > [ 5052.247852] ret_from_fork+0x34/0x50 > [ 5052.251429] ? __pfx_kthread+0x10/0x10 > [ 5052.255185] ret_from_fork_asm+0x1a/0x30 > [ 5052.259112] </TASK> > > The first simple patch, just adding spinlocking to ixgbe_update_stats() while reading from adapter->vfinfo, did not fix the problem, it just moved it elsewhere: I could now reproduce the same kind of crash in ixgbe_restore_vf_multicasts(). > > But adding more spinlocking doesn't really cut it. One reason is that ixgbe_restore_vf_multicasts() is called from within ixgbe_msg_task() with active spinlock, as well as from outside without locking. > > Additionally, given that ixgbe_disable_sriov() is the only call changing adapter->vfinfo, and given ixgbe_disable_sriov() is called very seldom compared to other actions in the driver, just adding more spinlocks would unnecessarily occupy the driver with spinning when multiple functions accessing adapter->vfinfo are running in parallel. > > So this patch drops the spinlock in favor of RCU and uses it throughout the driver. > > While changing this, it seems prudent to do the same for the adapter->mv_list array, which is allocated and freed at the same time as adapter->vfinfo, albeit there was no crash observed. > > Fixes: 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting > sriov_numvfs to zero") > Signed-off-by: Corinna Vinschen <[email protected]> > --- > v2: always return 0 from ixgbe_ndo_get_vf_stats so as not to break > 'ip link show dev' > > Interdiff against v1: > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > index 6ee8c2a140c2..e0a986f1c96a 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > @@ -9797,7 +9797,7 @@ static int ixgbe_ndo_get_vf_stats(struct net_device > *netdev, int vf, > } > rcu_read_unlock(); > > - return vfinfo ? 0 : -EINVAL; > + return 0; > } > > #ifdef CONFIG_IXGBE_DCB > > drivers/net/ethernet/intel/ixgbe/ixgbe.h | 7 +- > .../net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c | 36 +- > .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 44 +- > .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 17 +- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 227 +++++--- > .../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 547 ++++++++++++------ > 6 files changed, 592 insertions(+), 286 deletions(-)
Tested-by: Alexander Nowlin <[email protected]>
