Hi,

I'm seeing the same issue at two of our customers using the latest i40e
driver (2.10.19.30).
The fix Billy suggested works for me. Could you please review it and
upstream it?

Thanks,
Gerald

=================================================================

I am getting a 'general protection fault: 0000 [#1] SMP PTI' in the i40e
driver. I am using the Intel SR-IOV CNI (https://github.com/intel/sriov-cni/
and https://github.com/intel/sriov-network-device-plugin) to create and
delete pods with SR-IOV VFs attached to containers. If I add a VLAN to the
VF (via the CNI), 'kubectl delete pod' crashes, but if I add both a VLAN
and QoS to the VF (via the CNI), 'kubectl delete pod' doesn't crash. I
haven't been able to reproduce the crash with 'ip link set
<iface> vf <vfid> vlan <vlanid>' commands.

Details:
Running Fedora 29, kernel 5.2.11-100.fc29.x86_64

$ ethtool -i eno1
driver: i40e
version: 2.8.20-k
firmware-version: 6.80 0x80003d71 18.8.9
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

The crash first occurred on i40e 2.8.20-k, so I downloaded and built
2.9.21, which also crashes. The details below are from 2.9.21.

[Sep18 10:35] general protection fault: 0000 [#1] SMP PTI
[  +0.000030] CPU: 35 PID: 2783 Comm: sriov Tainted: G           OE
5.2.11-100.fc29.x86_64 #1
[  +0.000026] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1
12/04/2018
[  +0.000032] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[  +0.000021] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[  +0.000047] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[  +0.000016] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[  +0.000019] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[  +0.000020] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[  +0.000020] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  +0.000020] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[  +0.000019] FS:  00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[  +0.000022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000017] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[  +0.000019] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000020] Call Trace:
[  +0.000018]  ? i40e_ndo_set_vf_port_vlan+0x1c4/0x2c0 [i40e]
[  +0.000022]  ? do_setlink+0x577/0xe90
[  +0.000018]  ? security_sock_rcv_skb+0x2a/0x40
[  +0.000015]  ? sk_filter_trim_cap+0x4f/0x210
[  +0.000015]  ? netlink_attachskb+0x1bc/0x1d0
[  +0.000014]  ? rtnl_setlink+0xdd/0x140
[  +0.000015]  ? security_capset+0x50/0x60
[  +0.000013]  ? rtnetlink_rcv_msg+0x2b1/0x360
[  +0.000014]  ? rtnl_calcit.isra.32+0x110/0x110
[  +0.000014]  ? netlink_rcv_skb+0x49/0x110
[  +0.000013]  ? netlink_unicast+0x191/0x220
[  +0.000013]  ? netlink_sendmsg+0x204/0x3d0
[  +0.000015]  ? sock_sendmsg+0x4c/0x50
[  +0.000013]  ? __sys_sendto+0xee/0x160
[  +0.000013]  ? __sys_bind+0x79/0xf0
[  +0.000033]  ? __sys_socket+0x93/0xe0
[  +0.000015]  ? __x64_sys_sendto+0x24/0x30
[  +0.000017]  ? do_syscall_64+0x5f/0x1a0
[  +0.000015]  ? page_fault+0x8/0x30
[  +0.000013]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000017] Modules linked in: i40iw i40e(OE) veth vxlan ip6_udp_tunnel
udp_tunnel xt_statistic xt_nat xt_comment xt_mark nf_conntrack_netlink
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE iavf tun bridge stp llc
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vfio_pci
vfio_virqfd vfio_iommu_type1 vfio overlay ip_set uio nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
sunrpc ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support dcdbas intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate
intel_uncore
[  +0.000033]  intel_rapl_perf joydev mxm_wmi lpc_ich ipmi_ssif ib_uverbs
ib_core mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
pcc_cpufreq xfs libcrc32c mgag200 drm_kms_helper ttm virtio_net
net_failover drm crc32c_intel igb failover megaraid_sas dca i2c_algo_bit
wmi [last unloaded: i40e]
[  +0.005208] ---[ end trace deea73cdd7c0f936 ]---
[  +0.012672] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[  +0.000660] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[  +0.001287] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[  +0.000778] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[  +0.000688] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[  +0.000657] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[  +0.000676] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  +0.000695] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[  +0.000669] FS:  00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[  +0.000658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000669] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[  +0.000665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  +0.000674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  +0.000670] Kernel panic - not syncing: Fatal exception

Using:
  objdump -S i40e_virtchnl_pf.o > i40e_virtchnl_pf.txt

Crash is located in i40e_config_vf_promiscuous_mode() in i40e_virtchnl_pf.c:

static i40e_status i40e_config_vf_promiscuous_mode(struct i40e_vf *vf,
  u16 vsi_id,
  bool allmulti,
  bool alluni)
{
:
if (vf->port_vlan_id) {
:
} else if (i40e_getnum_vf_vsi_vlan_filters(vsi)) {
     1b5: 85 c9                 test   %ecx,%ecx
     1b7: 74 77                 je     230 <i40e_config_vf_promiscuous_mode+0x1e0>
aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1b9: 44 0f b6 64 24 08     movzbl 0x8(%rsp),%r12d
i40e_status aq_ret = I40E_SUCCESS;
     1bf: 45 31 d2              xor    %r10d,%r10d
hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
     1c2: 4d 8b 3e              mov    (%r14),%r15
     1c5: 4d 85 ff              test   %r15,%r15
     1c8: 74 57                 je     221 <i40e_config_vf_promiscuous_mode+0x1d1>
if (f->vlan < 0 || f->vlan > I40E_MAX_VLANID)
-->  1ca: 41 0f b7 4f 16        movzwl 0x16(%r15),%ecx
     1cf: 66 81 f9 ff 0f        cmp    $0xfff,%cx
     1d4: 77 43                 ja     219 <i40e_config_vf_promiscuous_mode+0x1c9>
aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1d6: 0f b7 b3 ea 0c 00 00  movzwl 0xcea(%rbx),%esi
     1dd: 45 31 c0              xor    %r8d,%r8d
     1e0: 44 89 e2              mov    %r12d,%edx
     1e3: 48 89 ef              mov    %rbp,%rdi
     1e6: e8 00 00 00 00        callq  1eb <i40e_config_vf_promiscuous_mode+0x19b>
if (aq_ret) {
     1eb: 85 c0                 test   %eax,%eax
     1ed: 0f 85 87 00 00 00     jne    27a <i40e_config_vf_promiscuous_mode+0x22a>

FYI - I added a lot of debug prints to try to narrow this down, and if I
print all the buckets before hash_for_each(), the problem goes away. So it
looks like a race condition, with multiple threads accessing the same
data. I removed the debug prints, added
spin_lock_bh(&vsi->mac_filter_hash_lock) and
spin_unlock_bh(&vsi->mac_filter_hash_lock) calls, and changed
hash_for_each() to hash_for_each_safe(), and the crash also went away.

Let me know what additional data you need.

Thanks,
Billy McFall

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet
