Hi,

I'm seeing the same issue at two of our customers who are using the latest i40e driver (2.10.19.30). The suggested fix from Billy works for me. Could you please help review and upstream it?

Thanks,
Gerald

=================================================================

I am getting a 'general protection fault: 0000 [#1] SMP PTI' in the i40e driver. I am using the Intel SR-IOV CNI (https://github.com/intel/sriov-cni/ and https://github.com/intel/sriov-network-device-plugin) to create and delete pods with SR-IOV VFs attached to containers. I found that if I add a VLAN to the VF (via the CNI), I get the crash on 'kubectl delete pod', but if I add a VLAN and QoS to the VF (via the CNI), 'kubectl delete pod' doesn't crash. I haven't been able to reproduce it with 'ip link set <iface> vf <vfid> vlan <vlanid>' commands.

Details:

Running Fedora 29, kernel 5.2.11-100.fc29.x86_64

$ ethtool -i eno1
driver: i40e
version: 2.8.20-k
firmware-version: 6.80 0x80003d71 18.8.9
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It crashed on i40e 2.8.20-k, so I downloaded and built 2.9.21, which also crashes. The details below are from 2.9.21.

[Sep18 10:35] general protection fault: 0000 [#1] SMP PTI
[ +0.000030] CPU: 35 PID: 2783 Comm: sriov Tainted: G OE 5.2.11-100.fc29.x86_64 #1
[ +0.000026] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1 12/04/2018
[ +0.000032] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000021] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34 24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.000047] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000016] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX: 0000000000000000
[ +0.000019] RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bcb83f30370
[ +0.000020] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09: 0000000000000023
[ +0.000020] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000020] R13: 0000000000000000 R14: ffff9bcb9639f338 R15: 207904daf17a68cd
[ +0.000019] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000) knlGS:0000000000000000
[ +0.000022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000017] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4: 00000000003606e0
[ +0.000019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.000020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.000020] Call Trace:
[ +0.000018] ? i40e_ndo_set_vf_port_vlan+0x1c4/0x2c0 [i40e]
[ +0.000022] ? do_setlink+0x577/0xe90
[ +0.000018] ? security_sock_rcv_skb+0x2a/0x40
[ +0.000015] ? sk_filter_trim_cap+0x4f/0x210
[ +0.000015] ? netlink_attachskb+0x1bc/0x1d0
[ +0.000014] ? rtnl_setlink+0xdd/0x140
[ +0.000015] ? security_capset+0x50/0x60
[ +0.000013] ? rtnetlink_rcv_msg+0x2b1/0x360
[ +0.000014] ? rtnl_calcit.isra.32+0x110/0x110
[ +0.000014] ? netlink_rcv_skb+0x49/0x110
[ +0.000013] ? netlink_unicast+0x191/0x220
[ +0.000013] ? netlink_sendmsg+0x204/0x3d0
[ +0.000015] ? sock_sendmsg+0x4c/0x50
[ +0.000013] ? __sys_sendto+0xee/0x160
[ +0.000013] ? __sys_bind+0x79/0xf0
[ +0.000033] ? __sys_socket+0x93/0xe0
[ +0.000015] ? __x64_sys_sendto+0x24/0x30
[ +0.000017] ? do_syscall_64+0x5f/0x1a0
[ +0.000015] ? page_fault+0x8/0x30
[ +0.000013] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000017] Modules linked in: i40iw i40e(OE) veth vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_nat xt_comment xt_mark nf_conntrack_netlink xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE iavf tun bridge stp llc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vfio_pci vfio_virqfd vfio_iommu_type1 vfio overlay ip_set uio nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi sunrpc ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support dcdbas intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore
[ +0.000033] intel_rapl_perf joydev mxm_wmi lpc_ich ipmi_ssif ib_uverbs ib_core mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter pcc_cpufreq xfs libcrc32c mgag200 drm_kms_helper ttm virtio_net net_failover drm crc32c_intel igb failover megaraid_sas dca i2c_algo_bit wmi [last unloaded: i40e]
[ +0.005208] ---[ end trace deea73cdd7c0f936 ]---
[ +0.012672] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000660] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34 24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.001287] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000778] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX: 0000000000000000
[ +0.000688] RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bcb83f30370
[ +0.000657] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09: 0000000000000023
[ +0.000676] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000695] R13: 0000000000000000 R14: ffff9bcb9639f338 R15: 207904daf17a68cd
[ +0.000669] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000) knlGS:0000000000000000
[ +0.000658] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000669] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4: 00000000003606e0
[ +0.000665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.000674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.000670] Kernel panic - not syncing: Fatal exception

Using:
  objdump -S i40e_virtchnl_pf.o > i40e_virtchnl_pf.txt

The crash is located in i40e_config_vf_promiscuous_mode() in i40e_virtchnl_pf.c:

static i40e_status i40e_config_vf_promiscuous_mode(struct i40e_vf *vf, u16 vsi_id, bool allmulti, bool alluni)
{
  :
  if (vf->port_vlan_id) {
  :
  } else if (i40e_getnum_vf_vsi_vlan_filters(vsi)) {
     1b5: 85 c9                   test %ecx,%ecx
     1b7: 74 77                   je 230 <i40e_config_vf_promiscuous_mode+0x1e0>
    aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1b9: 44 0f b6 64 24 08       movzbl 0x8(%rsp),%r12d
    i40e_status aq_ret = I40E_SUCCESS;
     1bf: 45 31 d2                xor %r10d,%r10d
    hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
     1c2: 4d 8b 3e                mov (%r14),%r15
     1c5: 4d 85 ff                test %r15,%r15
     1c8: 74 57                   je 221 <i40e_config_vf_promiscuous_mode+0x1d1>
    if (f->vlan < 0 || f->vlan > I40E_MAX_VLANID)
 --> 1ca: 41 0f b7 4f 16          movzwl 0x16(%r15),%ecx
     1cf: 66 81 f9 ff 0f          cmp $0xfff,%cx
     1d4: 77 43                   ja 219 <i40e_config_vf_promiscuous_mode+0x1c9>
    aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1d6: 0f b7 b3 ea 0c 00 00    movzwl 0xcea(%rbx),%esi
     1dd: 45 31 c0                xor %r8d,%r8d
     1e0: 44 89 e2                mov %r12d,%edx
     1e3: 48 89 ef                mov %rbp,%rdi
     1e6: e8 00 00 00 00          callq 1eb <i40e_config_vf_promiscuous_mode+0x19b>
    if (aq_ret) {
     1eb: 85 c0                   test %eax,%eax
     1ed: 0f 85 87 00 00 00       jne 27a <i40e_config_vf_promiscuous_mode+0x22a>

FYI - I put in a lot of debug prints to try to narrow it down, and if I print all the bkts
before hash_for_each(), the problem goes away. So it looks like a race condition, or multiple threads accessing the same data. I removed the debug prints, added spin_lock_bh(&vsi->mac_filter_hash_lock) and spin_unlock_bh(&vsi->mac_filter_hash_lock) calls, and changed hash_for_each() to hash_for_each_safe(), and the crash also went away.

Let me know what additional data you need.

Thanks,
Billy McFall
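
For anyone reviewing the workaround, here is a rough user-space sketch of the same idea: readers and writers of the filter table take one lock, and the walk reads the next pointer before visiting each entry. This is not the driver code. The struct and function names (filter_table, table_add, table_del, table_count_valid_vlans) are made up for illustration, and a pthread mutex stands in for spin_lock_bh() on mac_filter_hash_lock. It only models the locking pattern and is not a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS   8     /* hypothetical table size, for illustration only */
#define MAX_VLANID 4095  /* same value as the 0xfff compare in the listing */

struct filter {
	struct filter *next;
	int16_t vlan;
};

struct filter_table {
	pthread_mutex_t lock;            /* stands in for mac_filter_hash_lock */
	struct filter *bucket[NBUCKETS]; /* stands in for mac_filter_hash */
};

static void table_init(struct filter_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	for (int i = 0; i < NBUCKETS; i++)
		t->bucket[i] = NULL;
}

static unsigned int bucket_of(int16_t vlan)
{
	return (unsigned int)(uint16_t)vlan % NBUCKETS;
}

/* Writers only touch the table while holding the lock. */
static int table_add(struct filter_table *t, int16_t vlan)
{
	struct filter *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->vlan = vlan;
	pthread_mutex_lock(&t->lock);
	f->next = t->bucket[bucket_of(vlan)];
	t->bucket[bucket_of(vlan)] = f;
	pthread_mutex_unlock(&t->lock);
	return 0;
}

/* Unlink and free every entry with this VLAN; returns how many were removed. */
static int table_del(struct filter_table *t, int16_t vlan)
{
	int removed = 0;

	pthread_mutex_lock(&t->lock);
	struct filter **pp = &t->bucket[bucket_of(vlan)];
	while (*pp) {
		struct filter *f = *pp;

		if (f->vlan == vlan) {
			*pp = f->next;
			free(f);
			removed++;
		} else {
			pp = &f->next;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return removed;
}

/*
 * Reader: hold the lock for the whole walk so no writer can free an entry
 * underneath us, and fetch the next pointer before visiting the entry
 * (the "safe" iteration style).
 */
static int table_count_valid_vlans(struct filter_table *t)
{
	int count = 0;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < NBUCKETS; i++) {
		struct filter *f = t->bucket[i];

		while (f) {
			struct filter *next = f->next;

			if (f->vlan >= 0 && f->vlan <= MAX_VLANID)
				count++;
			f = next;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return count;
}
```

In this model the lock is what prevents a reader from following a pointer to an entry that a writer has already freed. The saved next pointer only matters if the loop body itself removes the current entry.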