I am getting a 'general protection fault: 0000 [#1] SMP PTI' in the i40e driver. I am using the Intel SR-IOV CNI (https://github.com/intel/sriov-cni/) and the SR-IOV Network Device Plugin (https://github.com/intel/sriov-network-device-plugin) to create and delete pods with SR-IOV VFs attached to containers. I found that if I add a VLAN to the VF (via the CNI), I get the crash on 'kubectl delete pod', but if I add both a VLAN and QoS to the VF (via the CNI), 'kubectl delete pod' doesn't crash. I haven't been able to reproduce it with 'ip link set <iface> vf <vfid> vlan <vlanid>' commands.
Details:
Running Fedora 29, kernel 5.2.11-100.fc29.x86_64

$ ethtool -i eno1
driver: i40e
version: 2.8.20-k
firmware-version: 6.80 0x80003d71 18.8.9
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It crashed on i40e 2.8.20-k, so I downloaded and built 2.9.21, which also crashes. The details below are from 2.9.21.

[Sep18 10:35] general protection fault: 0000 [#1] SMP PTI
[ +0.000030] CPU: 35 PID: 2783 Comm: sriov Tainted: G OE 5.2.11-100.fc29.x86_64 #1
[ +0.000026] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1 12/04/2018
[ +0.000032] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000021] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34 24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.000047] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000016] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX: 0000000000000000
[ +0.000019] RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bcb83f30370
[ +0.000020] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09: 0000000000000023
[ +0.000020] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000020] R13: 0000000000000000 R14: ffff9bcb9639f338 R15: 207904daf17a68cd
[ +0.000019] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000) knlGS:0000000000000000
[ +0.000022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000017] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4: 00000000003606e0
[ +0.000019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.000020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.000020] Call Trace:
[ +0.000018] ? i40e_ndo_set_vf_port_vlan+0x1c4/0x2c0 [i40e]
[ +0.000022] ? do_setlink+0x577/0xe90
[ +0.000018] ? security_sock_rcv_skb+0x2a/0x40
[ +0.000015] ? sk_filter_trim_cap+0x4f/0x210
[ +0.000015] ? netlink_attachskb+0x1bc/0x1d0
[ +0.000014] ? rtnl_setlink+0xdd/0x140
[ +0.000015] ? security_capset+0x50/0x60
[ +0.000013] ? rtnetlink_rcv_msg+0x2b1/0x360
[ +0.000014] ? rtnl_calcit.isra.32+0x110/0x110
[ +0.000014] ? netlink_rcv_skb+0x49/0x110
[ +0.000013] ? netlink_unicast+0x191/0x220
[ +0.000013] ? netlink_sendmsg+0x204/0x3d0
[ +0.000015] ? sock_sendmsg+0x4c/0x50
[ +0.000013] ? __sys_sendto+0xee/0x160
[ +0.000013] ? __sys_bind+0x79/0xf0
[ +0.000033] ? __sys_socket+0x93/0xe0
[ +0.000015] ? __x64_sys_sendto+0x24/0x30
[ +0.000017] ? do_syscall_64+0x5f/0x1a0
[ +0.000015] ? page_fault+0x8/0x30
[ +0.000013] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000017] Modules linked in: i40iw i40e(OE) veth vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_nat xt_comment xt_mark nf_conntrack_netlink xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE iavf tun bridge stp llc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vfio_pci vfio_virqfd vfio_iommu_type1 vfio overlay ip_set uio nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi sunrpc ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support dcdbas intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore
[ +0.000033] intel_rapl_perf joydev mxm_wmi lpc_ich ipmi_ssif ib_uverbs ib_core mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter pcc_cpufreq xfs libcrc32c mgag200 drm_kms_helper ttm virtio_net net_failover drm crc32c_intel igb failover megaraid_sas dca i2c_algo_bit wmi [last unloaded: i40e]
[ +0.005208] ---[ end trace deea73cdd7c0f936 ]---
[ +0.012672] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000660] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34 24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.001287] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000778] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX: 0000000000000000
[ +0.000688] RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bcb83f30370
[ +0.000657] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09: 0000000000000023
[ +0.000676] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000695] R13: 0000000000000000 R14: ffff9bcb9639f338 R15: 207904daf17a68cd
[ +0.000669] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000) knlGS:0000000000000000
[ +0.000658] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000669] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4: 00000000003606e0
[ +0.000665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.000674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.000670] Kernel panic - not syncing: Fatal exception

Using:
  objdump -S i40e_virtchnl_pf.o > i40e_virtchnl_pf.txt
the crash is located in i40e_config_vf_promiscuous_mode() in i40e_virtchnl_pf.c:

static i40e_status
i40e_config_vf_promiscuous_mode(struct i40e_vf *vf, u16 vsi_id,
                                bool allmulti, bool alluni)
{
:
        if (vf->port_vlan_id) {
:
        } else if (i40e_getnum_vf_vsi_vlan_filters(vsi)) {
     1b5:   85 c9                   test   %ecx,%ecx
     1b7:   74 77                   je     230 <i40e_config_vf_promiscuous_mode+0x1e0>
                aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1b9:   44 0f b6 64 24 08       movzbl 0x8(%rsp),%r12d
        i40e_status aq_ret = I40E_SUCCESS;
     1bf:   45 31 d2                xor    %r10d,%r10d
                hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
     1c2:   4d 8b 3e                mov    (%r14),%r15
     1c5:   4d 85 ff                test   %r15,%r15
     1c8:   74 57                   je     221 <i40e_config_vf_promiscuous_mode+0x1d1>
                        if (f->vlan < 0 || f->vlan > I40E_MAX_VLANID)
-->  1ca:   41 0f b7 4f 16          movzwl 0x16(%r15),%ecx
     1cf:   66 81 f9 ff 0f          cmp    $0xfff,%cx
     1d4:   77 43                   ja     219 <i40e_config_vf_promiscuous_mode+0x1c9>
                        aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1d6:   0f b7 b3 ea 0c 00 00    movzwl 0xcea(%rbx),%esi
     1dd:   45 31 c0                xor    %r8d,%r8d
     1e0:   44 89 e2                mov    %r12d,%edx
     1e3:   48 89 ef                mov    %rbp,%rdi
     1e6:   e8 00 00 00 00          callq  1eb <i40e_config_vf_promiscuous_mode+0x19b>
                        if (aq_ret) {
     1eb:   85 c0                   test   %eax,%eax
     1ed:   0f 85 87 00 00 00       jne    27a <i40e_config_vf_promiscuous_mode+0x22a>

The faulting instruction is the f->vlan load at 1ca (movzwl 0x16(%r15),%ecx). R15 holds 207904daf17a68cd, which is not a canonical x86-64 address, so the filter pointer read from the hash bucket is already bogus (hence a general protection fault rather than a page fault).

FYI: I added a lot of debug prints to narrow this down, and if I print all the buckets before hash_for_each(), the problem goes away. So it looks like a race condition, with multiple threads accessing the same data. I then removed the debug prints, wrapped the loop in spin_lock_bh(&vsi->mac_filter_hash_lock) / spin_unlock_bh(&vsi->mac_filter_hash_lock), and changed hash_for_each() to hash_for_each_safe(); with that change the crash also went away.

Let me know what additional data you need.

Thanks,
Billy McFall
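The workaround above (holding mac_filter_hash_lock around the filter walk and switching to the safe iterator) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not i40e code and not a proposed patch: struct vsi_model, struct mac_filter, filter_add(), count_valid_vlan_filters(), del_vlan_filters() and the other names are hypothetical, a C11 atomic_flag busy-wait lock stands in for spin_lock_bh() (no bottom-half masking), and a singly linked bucket chain stands in for the kernel hlist. It shows the two properties the change relies on: the reader and the deleter serialize on the same lock, and the deleting loop caches f->next before it may free f.

```c
/* Userspace model only (not i40e code; all names hypothetical). */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NUM_BUCKETS 8     /* hypothetical table size */
#define MAX_VLANID  0xfff /* same bound as "cmp $0xfff,%cx" in the objdump */

struct mac_filter {
	short vlan;              /* models f->vlan; negative means "no VLAN" */
	struct mac_filter *next; /* bucket chain, stand-in for hlist_node */
};

struct vsi_model {
	atomic_flag lock;        /* stand-in for vsi->mac_filter_hash_lock */
	struct mac_filter *bucket[NUM_BUCKETS];
};

/* Busy-wait lock: spin_lock()-like, no bottom-half masking. */
static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}
static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void vsi_init(struct vsi_model *vsi)
{
	assert(vsi != NULL);
	atomic_flag_clear(&vsi->lock);
	for (int b = 0; b < NUM_BUCKETS; b++)
		vsi->bucket[b] = NULL;
}

/* Writer: push onto the head of the bucket picked from the VLAN ID. */
static int filter_add(struct vsi_model *vsi, short vlan)
{
	unsigned int b = (unsigned short)vlan % NUM_BUCKETS;
	struct mac_filter *f = malloc(sizeof(*f));
	if (!f)
		return -1;
	f->vlan = vlan;
	model_lock(&vsi->lock);
	f->next = vsi->bucket[b];
	vsi->bucket[b] = f;
	model_unlock(&vsi->lock);
	return 0;
}

/* Reader: like the promiscuous-mode loop, acts only on valid VLAN IDs.
 * A plain walk is enough here because the lock keeps writers out. */
static int count_valid_vlan_filters(struct vsi_model *vsi)
{
	int n = 0;
	model_lock(&vsi->lock);
	for (int b = 0; b < NUM_BUCKETS; b++)
		for (struct mac_filter *f = vsi->bucket[b]; f; f = f->next)
			if (f->vlan >= 0 && f->vlan <= MAX_VLANID)
				n++;
	model_unlock(&vsi->lock);
	return n;
}

/* Deleter: "safe" walk, so tmp = f->next is read before free(f). */
static int del_vlan_filters(struct vsi_model *vsi, short vlan)
{
	int removed = 0;
	model_lock(&vsi->lock);
	for (int b = 0; b < NUM_BUCKETS; b++) {
		struct mac_filter **link = &vsi->bucket[b];
		struct mac_filter *f, *tmp;
		for (f = *link; f; f = tmp) {
			tmp = f->next;
			if (f->vlan != vlan) {
				link = &f->next;
				continue;
			}
			*link = tmp; /* unlink first, then free */
			free(f);
			removed++;
		}
	}
	model_unlock(&vsi->lock);
	return removed;
}

/* Teardown: caller guarantees no concurrent users, so no lock is taken. */
static void vsi_destroy(struct vsi_model *vsi)
{
	for (int b = 0; b < NUM_BUCKETS; b++)
		while (vsi->bucket[b]) {
			struct mac_filter *f = vsi->bucket[b];
			vsi->bucket[b] = f->next;
			free(f);
		}
}
```

In this model it is the shared lock that keeps the reader from ever dereferencing a freed or half-unlinked node; the cached next pointer only matters in a loop whose body frees the current entry, as del_vlan_filters() does. That the real race is between the promiscuous-mode walk and a concurrent filter delete is an assumption of the model, not something the oops above proves.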