Dear Marcin,
Thank you for the patch, and the reproducer and detailed commit message. Am 09.10.24 um 14:49 schrieb Marcin Szycik:
Unloading the ice driver while switchdev port representors are added to a bridge can lead to kernel panic. Reproducer: modprobe ice devlink dev eswitch set $PF1_PCI mode switchdev ip link add $BR type bridge ip link set $BR up echo 2 > /sys/class/net/$PF1/device/sriov_numvfs sleep 2 ip link set $PF1 master $BR ip link set $VF1_PR master $BR ip link set $VF2_PR master $BR ip link set $PF1 up ip link set $VF1_PR up ip link set $VF2_PR up ip link set $VF1 up rmmod irdma ice
For people hitting the issue, an excerpt from the panic would also be nice, so it can be found more easily.
When unloading the driver, ice_eswitch_detach() is eventually called as part of VF freeing. First, it removes a port representor from xarray, then unregister_netdev() is called (via repr->ops.rem()), finally representor is deallocated. The problem comes from the bridge doing its own deinit at the same time. unregister_netdev() triggers a notifier chain, resulting in ice_eswitch_br_port_deinit() being called. It should set repr->br_port = NULL, but this does not happen since repr has already been removed from xarray and is not found. Regardless, it finishes up deallocating br_port. At this point, repr is still not freed and an fdb event can happen, in which ice_eswitch_br_fdb_event_work() takes repr->br_port and tries to use it, which causes a panic (use after free). Note that this only happens with 2 or more port representors added to the bridge, since with only one representor port, the bridge deinit is slightly different (ice_eswitch_br_port_deinit() is called via ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()). A workaround is available: brctl setageing $BR 0, which stops the bridge from adding fdb entries altogether. Change the order of operations in ice_eswitch_detach(): move the call to unregister_netdev() before removing repr from xarray. This way repr->br_port will be correctly set to NULL in ice_eswitch_br_port_deinit(), preventing a panic. Fixes: fff292b47ac1 ("ice: add VF representors one by one") Reviewed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> --- drivers/net/ethernet/intel/ice/ice_eswitch.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c index c0b3e70a7ea3..fb527434b58b 100644 --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c @@ -552,13 +552,14 @@ int ice_eswitch_attach_sf(struct ice_pf *pf, struct ice_dynamic_port *sf) static void ice_eswitch_detach(struct ice_pf *pf, struct ice_repr *repr) { ice_eswitch_stop_reprs(pf); + repr->ops.rem(repr); + xa_erase(&pf->eswitch.reprs, repr->id);if (xa_empty(&pf->eswitch.reprs))ice_eswitch_disable_switchdev(pf);ice_eswitch_release_repr(pf, repr);- repr->ops.rem(repr); ice_repr_destroy(repr);if (xa_empty(&pf->eswitch.reprs)) {
Reviewed-by: Paul Menzel <[email protected]> Kind regards, Paul
