I am getting a 'general protection fault: 0000 [#1] SMP PTI' in the i40e
driver. I am using the Intel SR-IOV CNI (https://github.com/intel/sriov-cni/
and https://github.com/intel/sriov-network-device-plugin) to create and
delete pods with SR-IOV VFs attached to containers. I found that if I add a
VLAN to the VF (via the CNI), I get the crash on 'kubectl delete pod',
but if I add a VLAN and QoS to the VF (via the CNI), 'kubectl delete
pod' doesn't crash. I haven't been able to reproduce it with 'ip link set
<iface> vf <vfid> vlan <vlanid>' commands.

Details:
Running Fedora 29, kernel 5.2.11-100.fc29.x86_64

$ ethtool -i eno1
driver: i40e
version: 2.8.20-k
firmware-version: 6.80 0x80003d71 18.8.9
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It crashed on i40e 2.8.20-k, so I downloaded and built 2.9.21, which also
crashes. The details below are from 2.9.21.

[Sep18 10:35] general protection fault: 0000 [#1] SMP PTI
[  +0.000030] CPU: 35 PID: 2783 Comm: sriov Tainted: G           OE
5.2.11-100.fc29.x86_64 #1
[  +0.000026] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1
12/04/2018
[  +0.000032] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[  +0.000021] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[  +0.000047] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[  +0.000016] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[  +0.000019] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[  +0.000020] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[  +0.000020] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  +0.000020] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[  +0.000019] FS:  00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[  +0.000022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000017] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[  +0.000019] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000020] Call Trace:
[  +0.000018]  ? i40e_ndo_set_vf_port_vlan+0x1c4/0x2c0 [i40e]
[  +0.000022]  ? do_setlink+0x577/0xe90
[  +0.000018]  ? security_sock_rcv_skb+0x2a/0x40
[  +0.000015]  ? sk_filter_trim_cap+0x4f/0x210
[  +0.000015]  ? netlink_attachskb+0x1bc/0x1d0
[  +0.000014]  ? rtnl_setlink+0xdd/0x140
[  +0.000015]  ? security_capset+0x50/0x60
[  +0.000013]  ? rtnetlink_rcv_msg+0x2b1/0x360
[  +0.000014]  ? rtnl_calcit.isra.32+0x110/0x110
[  +0.000014]  ? netlink_rcv_skb+0x49/0x110
[  +0.000013]  ? netlink_unicast+0x191/0x220
[  +0.000013]  ? netlink_sendmsg+0x204/0x3d0
[  +0.000015]  ? sock_sendmsg+0x4c/0x50
[  +0.000013]  ? __sys_sendto+0xee/0x160
[  +0.000013]  ? __sys_bind+0x79/0xf0
[  +0.000033]  ? __sys_socket+0x93/0xe0
[  +0.000015]  ? __x64_sys_sendto+0x24/0x30
[  +0.000017]  ? do_syscall_64+0x5f/0x1a0
[  +0.000015]  ? page_fault+0x8/0x30
[  +0.000013]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000017] Modules linked in: i40iw i40e(OE) veth vxlan ip6_udp_tunnel
udp_tunnel xt_statistic xt_nat xt_comment xt_mark nf_conntrack_netlink
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE iavf tun bridge stp llc
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vfio_pci
vfio_virqfd vfio_iommu_type1 vfio overlay ip_set uio nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
sunrpc ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support dcdbas intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate
intel_uncore
[  +0.000033]  intel_rapl_perf joydev mxm_wmi lpc_ich ipmi_ssif ib_uverbs
ib_core mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
pcc_cpufreq xfs libcrc32c mgag200 drm_kms_helper ttm virtio_net
net_failover drm crc32c_intel igb failover megaraid_sas dca i2c_algo_bit
wmi [last unloaded: i40e]
[  +0.005208] ---[ end trace deea73cdd7c0f936 ]---
[  +0.012672] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[  +0.000660] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[  +0.001287] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[  +0.000778] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[  +0.000688] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[  +0.000657] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[  +0.000676] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  +0.000695] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[  +0.000669] FS:  00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[  +0.000658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000669] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[  +0.000665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000670] Kernel panic - not syncing: Fatal exception

Using:
  objdump -S i40e_virtchnl_pf.o > i40e_virtchnl_pf.txt

Crash is located in i40e_config_vf_promiscuous_mode() in i40e_virtchnl_pf.c:

 static i40e_status i40e_config_vf_promiscuous_mode(struct i40e_vf *vf,
  u16 vsi_id,
  bool allmulti,
  bool alluni)
{
:
if (vf->port_vlan_id) {
:
} else if (i40e_getnum_vf_vsi_vlan_filters(vsi)) {
     1b5: 85 c9                 test   %ecx,%ecx
     1b7: 74 77                 je     230
<i40e_config_vf_promiscuous_mode+0x1e0>
aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1b9: 44 0f b6 64 24 08     movzbl 0x8(%rsp),%r12d
i40e_status aq_ret = I40E_SUCCESS;
     1bf: 45 31 d2             xor    %r10d,%r10d
hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
     1c2: 4d 8b 3e             mov    (%r14),%r15
     1c5: 4d 85 ff             test   %r15,%r15
     1c8: 74 57                 je     221
<i40e_config_vf_promiscuous_mode+0x1d1>
if (f->vlan < 0 || f->vlan > I40E_MAX_VLANID)
-->  1ca: 41 0f b7 4f 16       movzwl 0x16(%r15),%ecx
     1cf: 66 81 f9 ff 0f       cmp    $0xfff,%cx
     1d4: 77 43                 ja     219
<i40e_config_vf_promiscuous_mode+0x1c9>
aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1d6: 0f b7 b3 ea 0c 00 00 movzwl 0xcea(%rbx),%esi
     1dd: 45 31 c0             xor    %r8d,%r8d
     1e0: 44 89 e2             mov    %r12d,%edx
     1e3: 48 89 ef             mov    %rbp,%rdi
     1e6: e8 00 00 00 00       callq  1eb
<i40e_config_vf_promiscuous_mode+0x19b>
if (aq_ret) {
     1eb: 85 c0                 test   %eax,%eax
     1ed: 0f 85 87 00 00 00     jne    27a
<i40e_config_vf_promiscuous_mode+0x22a>
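
A note on reading the fault: the marked instruction loads f->vlan from
0x16(%r15), and the register dump shows R15 = 207904daf17a68cd. That value is
not a canonical x86-64 address, which would explain a general protection
fault rather than a page fault. The sketch below is a minimal userspace check,
assuming 4-level paging (48-bit virtual addresses; the LA57 bit in the CR4
value above is clear). The helper name is_canonical48() is hypothetical and is
not part of the driver or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 4-level paging, i.e. 48-bit virtual addresses.
 * An address is canonical when bits 63..47 are all zeros or all ones.
 */
bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;    /* the 17 bits 63..47 */

    return upper == 0 || upper == 0x1ffff;
}
```

With the values from the dump, is_canonical48(0x207904daf17a68cdULL) is false
for R15, while is_canonical48(0xffff9bcb9639f338ULL) is true for R14 (the
bucket head that R15 was loaded from). So the bucket itself looks like a
valid kernel pointer, but the entry pointer read out of it does not.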

FYI - I added a lot of debug prints to try to narrow this down, and if I print
all the bkts before hash_for_each(), the problem goes away. So it looks like a
race condition, with multiple threads accessing the same data. I removed the
debug prints, added spin_lock_bh(&vsi->mac_filter_hash_lock)
and spin_unlock_bh(&vsi->mac_filter_hash_lock) calls, and changed the
hash_for_each() to a hash_for_each_safe(), and the crash also went away.
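
To illustrate the pattern, here is a minimal userspace sketch, not the driver
code. It assumes a pthread mutex standing in for mac_filter_hash_lock and a
hand-rolled chained table standing in for mac_filter_hash. All names
(filter_table, table_add, table_del_vlan, table_count_valid_vlans) are
hypothetical. The reader holds the lock for the whole walk and saves the next
pointer before looking at an entry, which is the idea behind
hash_for_each_safe().

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS   8       /* hypothetical table size */
#define MAX_VLANID 4095    /* 12-bit VLAN ID range, as I40E_MAX_VLANID */

struct filter {            /* stand-in for one MAC/VLAN filter entry */
    struct filter *next;
    int16_t vlan;
};

struct filter_table {
    pthread_mutex_t lock;  /* plays the role of mac_filter_hash_lock */
    struct filter *bucket[NBUCKETS];
};

void table_init(struct filter_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (int i = 0; i < NBUCKETS; i++)
        t->bucket[i] = NULL;
}

/* Insert a filter at the head of its bucket; returns 0 on success. */
int table_add(struct filter_table *t, int16_t vlan)
{
    struct filter *f = malloc(sizeof(*f));

    if (!f)
        return -1;
    f->vlan = vlan;
    pthread_mutex_lock(&t->lock);
    unsigned int b = (uint16_t)vlan % NBUCKETS;
    f->next = t->bucket[b];
    t->bucket[b] = f;
    pthread_mutex_unlock(&t->lock);
    return 0;
}

/* Unlink and free every filter with this VLAN; returns how many. */
int table_del_vlan(struct filter_table *t, int16_t vlan)
{
    int removed = 0;

    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < NBUCKETS; i++) {
        struct filter **pp = &t->bucket[i];

        while (*pp) {
            struct filter *f = *pp;

            if (f->vlan == vlan) {
                *pp = f->next;
                free(f);
                removed++;
            } else {
                pp = &f->next;
            }
        }
    }
    pthread_mutex_unlock(&t->lock);
    return removed;
}

/* Walk all buckets under the lock and count entries with a valid VLAN. */
int table_count_valid_vlans(struct filter_table *t)
{
    int n = 0;

    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < NBUCKETS; i++) {
        struct filter *next;

        for (struct filter *f = t->bucket[i]; f; f = next) {
            next = f->next;    /* saved before f is examined */
            if (f->vlan < 0 || f->vlan > MAX_VLANID)
                continue;      /* same range check as in the driver loop */
            n++;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

Without the lock, a concurrent table_del_vlan() could free an entry while
the reader is still following its next pointer, which is the kind of
corruption the crash above would be consistent with.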

Let me know what additional data you need.

Thanks,
Billy McFall

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
