> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of Jesse
> Brandeburg
> Sent: Wednesday, March 6, 2024 4:32 AM
> To: [email protected]; [email protected]
> Cc: [email protected]; Kitszel, Przemyslaw
> <[email protected]>; Eric Dumazet <[email protected]>; Nguyen,
> Anthony L <[email protected]>; [email protected]; Michal Swiatkowski
> <[email protected]>; Jakub Kicinski <[email protected]>; Paolo
> Abeni <[email protected]>; David S. Miller <[email protected]>; Elliott,
> Rob <[email protected]>
> Subject: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix memory corruption bug
> with suspend and rebuild
>
> The ice driver would previously panic after suspend. This is caused from the
> driver *only* calling the ice_vsi_free_q_vectors() function by itself, when
> it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref
> during reload") the driver has zeroed out num_q_vectors, and only restored it
> in ice_vsi_cfg_def().
>
> This further causes the ice_rebuild() function to allocate a zero length
> buffer, after which num_q_vectors is updated, and then the new value of
> num_q_vectors is used to index into the zero length buffer, which corrupts
> memory.
>
> The fix entails making sure all the code referencing num_q_vectors only does
> so after it has been reset via ice_vsi_cfg_def().
>
> I didn't perform a full bisect, but I was able to test against 6.1.77 kernel
> and that ice driver works fine for suspend/resume with no panic, so sometime
> since then, this problem was introduced.
>
> Also clean up an un-needed init of a local variable in the function being
> modified.
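
For anyone following the ordering problem described above, here is a
minimal userspace sketch in C. It is not the driver code: every demo_*
name is a hypothetical stand-in, and the sketch only models the order of
operations (size the buffer before or after the vector count is restored),
assuming the count is zero coming out of suspend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VSI; NOT the real struct ice_vsi. */
struct demo_vsi {
	unsigned int num_q_vectors;	/* assumed to be 0 after suspend */
};

/* Hypothetical per-vector coalesce settings. */
struct demo_coalesce {
	unsigned short itr_tx;
	unsigned short itr_rx;
};

/* Models the role of ice_vsi_cfg_def(): restores the vector count. */
void demo_cfg_def(struct demo_vsi *vsi, unsigned int nvec)
{
	assert(vsi != NULL);
	vsi->num_q_vectors = nvec;
}

/*
 * Buggy ordering: the buffer capacity is taken from num_q_vectors
 * BEFORE the count is restored, so it is sized for zero entries.
 * Returns that capacity; a caller that later indexes up to the
 * restored num_q_vectors would run past the end of the buffer.
 */
size_t demo_capacity_buggy(struct demo_vsi *vsi, unsigned int nvec)
{
	size_t cap = vsi->num_q_vectors;	/* still 0 here */

	demo_cfg_def(vsi, nvec);		/* count restored too late */
	return cap;
}

/*
 * Fixed ordering: restore the count first, then size the buffer from
 * it. The capacity is reported through *cap; the caller frees the
 * returned buffer.
 */
struct demo_coalesce *demo_rebuild_fixed(struct demo_vsi *vsi,
					 unsigned int nvec, size_t *cap)
{
	demo_cfg_def(vsi, nvec);
	*cap = vsi->num_q_vectors;
	return calloc(*cap, sizeof(struct demo_coalesce));
}
```

In the buggy variant the capacity and the later loop bound disagree (0
versus the restored count); in the fixed variant both come from the same
restored value, which is the property the commit message asks for.
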
>
> PANIC from 6.8.0-rc1:
>
> [1026674.915596] PM: suspend exit
> [1026675.664697] ice 0000:17:00.1: PTP reset successful
> [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
> [1026675.667660] ice 0000:b1:00.0: PTP reset successful
> [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
> [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
> [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [1026677.192753] ice 0000:17:00.0: PTP reset successful
> [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
> [1026677.197928] #PF: supervisor read access in kernel mode
> [1026677.197933] #PF: error_code(0x0000) - not-present page
> [1026677.197937] PGD 1557a7067 P4D 0
> [1026677.212133] ice 0000:b1:00.1: PTP reset successful
> [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
> [1026677.212575]
> [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
> [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
> [1026677.269367] Workqueue: ice ice_service_task [ice]
> [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
> [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
> [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
> [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
> [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
> [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
> [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
> [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
> [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
> [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1026677.381952] PKRU: 55555554
> [1026677.385116] Call Trace:
> [1026677.388023] <TASK>
> [1026677.390589] ? __die+0x20/0x70
> [1026677.394105] ? page_fault_oops+0x82/0x160
> [1026677.398576] ? do_user_addr_fault+0x65/0x6a0
> [1026677.403307] ? exc_page_fault+0x6a/0x150
> [1026677.407694] ? asm_exc_page_fault+0x22/0x30
> [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
> [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
> [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> [1026677.429147] ice_rebuild+0x18b/0x520 [ice]
> [1026677.433746] ? delay_tsc+0x8f/0xc0
> [1026677.437630] ice_do_reset+0xa3/0x190 [ice]
> [1026677.442231] ice_service_task+0x26/0x440 [ice]
> [1026677.447180] process_one_work+0x174/0x340
> [1026677.451669] worker_thread+0x27e/0x390
> [1026677.455890] ? __pfx_worker_thread+0x10/0x10
> [1026677.460627] kthread+0xee/0x120
> [1026677.464235] ? __pfx_kthread+0x10/0x10
> [1026677.468445] ret_from_fork+0x2d/0x50
> [1026677.472476] ? __pfx_kthread+0x10/0x10
> [1026677.476671] ret_from_fork_asm+0x1b/0x30
> [1026677.481050] </TASK>
>
> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
> Reported-by: Robert Elliott <[email protected]>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
> v2: fix uninitialized coalesce pointer on the exit path by moving the kfree
> to the later goto (simon), reword commit message (paul)
> ---
> drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>

Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
