Hi,

I have a Supermicro X10 with two I350s that has crashed twice in the last 3 
weeks under v4.9.39. It had no crashes before v4.9.39:

$ /sbin/lspci | grep -i ethernet
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)

We also have some Supermicro X9 machines, each with a single I350 I believe, 
that have not crashed:
$ /sbin/lspci | grep -i ethernet
06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)
06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection 
(rev 01)

The release notes at https://downloadmirror.intel.com/22919/eng/README.txt 
say "Do Not Use LRO When Routing Packets."

We are bridging traffic, not routing, and the crashes are in the GRO code.

Is it possible that GRO is now broken for bridged traffic in the igb driver? 
If I disable GRO, can I have some confidence that it will fix the issue?
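
For the GRO test itself, this is a minimal sketch of how it could be switched 
off and checked with ethtool. It assumes ethtool is installed, the commands 
run as root, and eth0 is one of the bridged igb ports. The helper names 
gro_state and disable_gro are made up for illustration. The setting does not 
survive a reboot or a driver reload.

```shell
#!/bin/sh
# Print the generic-receive-offload state ("on" or "off") found in
# `ethtool -k` output supplied on stdin.
gro_state() {
    awk -F': ' '$1 == "generic-receive-offload" { split($2, f, " "); print f[1] }'
}

# Turn GRO off on one interface, then print the state ethtool reports.
# Needs root and the ethtool utility; $1 is the interface name.
disable_gro() {
    ethtool -K "$1" gro off &&
        ethtool -k "$1" | gro_state
}

# Example (not run automatically):
#   disable_gro eth0
```

If the second command still reports "on", the change did not take effect on 
that port.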

Here are my offload settings (output of ethtool -k eth0):
Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
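
Only offloads that are currently on and not marked [fixed] can be switched 
off, so a small filter over the listing above narrows down what could be 
toggled while testing. A sketch only; changeable_on is a made-up helper name, 
and it expects `ethtool -k` output on stdin.

```shell
#!/bin/sh
# List the features that are "on" and not marked [fixed], one per line.
# Lines such as "highdma: on [fixed]" are skipped because the value field
# is not exactly "on".
changeable_on() {
    awk -F': ' '$2 == "on" { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Example (not run automatically):
#   ethtool -k eth0 | changeable_on
```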

First crash:

[4083386.299221] ------------[ cut here ]------------
[4083386.299358] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:1473 
inet_gro_complete+0xbb/0xd0
[4083386.299520] Modules linked in: sb_edac edac_core 8021q mrp garp
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq
libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes ipmi_si
ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core
ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
[4083386.300888] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.39 #1
[4083386.301002] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 
09/16/2016
[4083386.301109]  ffff880306603d90 ffffffff813f5935 0000000000000000 
0000000000000000
[4083386.301221]  ffff880306603dd0 ffffffff810a7e01 000005c18174578a 
ffff8802f94a9a00
[4083386.301333]  ffff8802f0824450 0000000000000000 0000000000000040 
0000000000000040
[4083386.301445] Call Trace:
[4083386.301483]  <IRQ> [4083386.301519]   dump_stack+0x63/0x8e
[4083386.301596]   __warn+0xd1/0xf0
[4083386.301665]   warn_slowpath_null+0x1d/0x20
[4083386.301747]   inet_gro_complete+0xbb/0xd0
[4083386.301830]   napi_gro_complete+0x73/0xa0
[4083386.301911]   napi_gro_flush+0x5f/0x80
[4083386.301988]   napi_complete_done+0x6a/0xb0
[4083386.302075]   igb_poll+0x38d/0x720 [igb]
[4083386.302156]   ? igb_msix_ring+0x2e/0x40 [igb]
[4083386.302255]   ? __handle_irq_event_percpu+0x4b/0x1a0
[4083386.302349]   net_rx_action+0x158/0x360
[4083386.302430]   __do_softirq+0xd1/0x283
[4083386.302507]   irq_exit+0xe9/0x100
[4083386.302580]   xen_evtchn_do_upcall+0x35/0x50
[4083386.302665]   xen_do_hypervisor_callback+0x1e/0x40
[4083386.302754]  <EOI> [4083386.302787]   ? xen_hypercall_sched_op+0xa/0x20
[4083386.302876]   ? xen_hypercall_sched_op+0xa/0x20
[4083386.302965]   ? xen_safe_halt+0x10/0x20
[4083386.303043]   ? default_idle+0x1e/0xd0
[4083386.303122]   ? arch_cpu_idle+0xf/0x20
[4083386.303200]   ? default_idle_call+0x2c/0x40
[4083386.303284]   ? cpu_startup_entry+0x1ac/0x240
[4083386.303370]   ? rest_init+0x77/0x80
[4083386.303462]   ? start_kernel+0x4a7/0x4b4
[4083386.303568]   ? set_init_arg+0x55/0x55
[4083386.303670]   ? x86_64_start_reservations+0x24/0x26
[4083386.303776]   ? xen_start_kernel+0x555/0x561
[4083386.303873] ---[ end trace 8294f59ced689507 ]---
[4083386.303958] general protection fault: 0000 [#1] SMP
[4083386.304041] Modules linked in: sb_edac edac_core 8021q mrp garp
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq
libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes ipmi_si
ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core
ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
[4083386.305179] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.9.39 
#1
[4083386.305307] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 
09/16/2016
[4083386.305414] task: ffffffff81e0e540 task.stack: ffffffff81e00000
[4083386.305498] RIP: e030:   skb_release_data+0x73/0xf0
[4083386.305617] RSP: e02b:ffff880306603d90  EFLAGS: 00010206
[4083386.305692] RAX: 0000000000000030 RBX: f5b36db76bd162c7 RCX: 
ffffffff81e60048
[4083386.305790] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
ffff8802f94a9a00
[4083386.305887] RBP: ffff880306603db0 R08: 0000000000004277 R09: 
0000000000000000
[4083386.305985] R10: 0000000000000005 R11: 0000000000000002 R12: 
0000000000000000
[4083386.306083] R13: ffff8802f94a9a00 R14: ffff88032f527740 R15: 
0000000000000040
[4083386.306186] FS:  0000000000000000(0000) GS:ffff880306600000(0000) 
knlGS:0000000000000000
[4083386.306296] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[4083386.306407] CR2: 0000000001692ed8 CR3: 000000022b3c9000 CR4: 
0000000000042660
[4083386.306505] Stack:
[4083386.306537]  ffff8802f94a9a00 ffff8802f94a9a00 ffffffff8175ac3e 
0000000000000040
[4083386.306649]  ffff880306603dc8 ffffffff81745764 ffff8802f94a9a00 
ffff880306603df0
[4083386.306762]  ffffffff817457c2 ffff8802f94a9a00 ffff8802f0824450 
0000000000000000
[4083386.306874] Call Trace:
[4083386.306911]  <IRQ> [4083386.306944]   ? napi_gro_complete+0x5e/0xa0
[4083386.307038]   skb_release_all+0x24/0x30
[4083386.307133]   kfree_skb+0x32/0x90
[4083386.307206]   napi_gro_complete+0x5e/0xa0
[4083386.307287]   napi_gro_flush+0x5f/0x80
[4083386.307365]   napi_complete_done+0x6a/0xb0
[4083386.307449]   igb_poll+0x38d/0x720 [igb]
[4083386.307530]   ? igb_msix_ring+0x2e/0x40 [igb]
[4083386.307617]   ? __handle_irq_event_percpu+0x4b/0x1a0
[4083386.307720]   net_rx_action+0x158/0x360
[4083386.307800]   __do_softirq+0xd1/0x283
[4083386.307877]   irq_exit+0xe9/0x100
[4083386.307949]   xen_evtchn_do_upcall+0x35/0x50
[4083386.308034]   xen_do_hypervisor_callback+0x1e/0x40
[4083386.308124]  <EOI> [4083386.308156]   ? xen_hypercall_sched_op+0xa/0x20
[4083386.308246]   ? xen_hypercall_sched_op+0xa/0x20
[4083386.308334]   ? xen_safe_halt+0x10/0x20
[4083386.308413]   ? default_idle+0x1e/0xd0
[4083386.308491]   ? arch_cpu_idle+0xf/0x20
[4083386.308568]   ? default_idle_call+0x2c/0x40
[4083386.308651]   ? cpu_startup_entry+0x1ac/0x240
[4083386.308737]   ? rest_init+0x77/0x80
[4083386.308811]   ? start_kernel+0x4a7/0x4b4
[4083386.308890]   ? set_init_arg+0x55/0x55
[4083386.308968]   ? x86_64_start_reservations+0x24/0x26
[4083386.309060]   ? xen_start_kernel+0x555/0x561
[4083386.309144] Code: f0 41 0f c1 46 20 39 c2 74 09 5b 41 5c 41 5d 41 5e 5d c3 
45 31 e4 41 80 3e 00 74 39 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 1c
06 <48> 8b 43 20 a8 01 75 6f f0 ff 4b 1c 74 55 48 8b 03 48 c1 e8 33
[4083386.309571] RIP   skb_release_data+0x73/0xf0
[4083386.309658]  RSP <ffff880306603d90>
[4083386.313000] ---[ end trace 8294f59ced689508 ]---
[4083386.389667] Kernel panic - not syncing: Fatal exception in interrupt
[4083386.389791] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Second crash:

[1838269.012349] general protection fault: 0000 [#1] SMP
[1838269.012452] Modules linked in: ebtable_nat sb_edac edac_core 8021q mrp
garp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq
libcrc32c joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei fjes ipmi_si
ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core
ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
[1838269.013521] CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.9.39 #1
[1838269.013637] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 
09/16/2016
[1838269.013743] task: ffff88030008c4c0 task.stack: ffffc90041978000
[1838269.013826] RIP: e030:   memcpy_erms+0x6/0x10
[1838269.013952] RSP: e02b:ffffc9004197bac0  EFLAGS: 00010202
[1838269.014026] RAX: ffff88032fcafe16 RBX: 0000000000000004 RCX: 
0000000000000004
[1838269.014124] RDX: 0000000000000004 RSI: 62a16ddedc6dbcb3 RDI: 
ffff88032fcafe16
[1838269.014222] RBP: ffffc9004197bb20 R08: 0000000000000004 R09: 
0000000000000004
[1838269.014320] R10: ffff88026ae89500 R11: 0000000044639632 R12: 
0000000000000048
[1838269.014417] R13: 0000000000000000 R14: 0000000044639632 R15: 
0000000000000048
[1838269.014519] FS:  0000000000000000(0000) GS:ffff880306640000(0000) 
knlGS:ffff880306640000
[1838269.014629] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[1838269.014709] CR2: ffffffffff600400 CR3: 0000000051939000 CR4: 
0000000000042660
[1838269.014808] Stack:
[1838269.014840]  ffffffff81744c17 ffff88026ae89500 0000000044639632 
ffff88030008c4c0
[1838269.014952]  ffffffff00000004 0000000000000004 ffff88032fcafe16 
ffff88026ae89500
[1838269.015064]  0000000000000004 0000000000000004 000000000000004c 
0000000000000028
[1838269.015176] Call Trace:
[1838269.015217]   ? skb_copy_bits+0x137/0x2c0
[1838269.015299]   __pskb_pull_tail+0x7f/0x3b0
[1838269.015382]   tcp_gro_receive+0x2c5/0x300
[1838269.015465]   tcp6_gro_receive+0x13a/0x1a0
[1838269.015547]   ipv6_gro_receive+0x1c6/0x380
[1838269.015630]   dev_gro_receive+0x269/0x3b0
[1838269.015712]   napi_gro_receive+0x38/0xf0
[1838269.015796]   igb_clean_rx_irq+0x38e/0x690 [igb]
[1838269.015886]   igb_poll+0x362/0x720 [igb]
[1838269.015968]   ? dequeue_entity+0x26e/0xa90
[1838269.016051]   ? xen_mc_flush+0x17b/0x1b0
[1838269.016131]   net_rx_action+0x158/0x360
[1838269.016212]   __do_softirq+0xd1/0x283
[1838269.016290]   ? sort_range+0x30/0x30
[1838269.016366]   run_ksoftirqd+0x29/0x50
[1838269.016443]   smpboot_thread_fn+0x110/0x160
[1838269.016525]   kthread+0xd7/0xf0
[1838269.016595]   ? kthread_park+0x60/0x60
[1838269.016673]   ret_from_fork+0x25/0x30
[1838269.016758] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 
03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89
d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[1838269.017183] RIP   memcpy_erms+0x6/0x10
[1838269.017264]  RSP <ffffc9004197bac0>
[1838269.020618] ---[ end trace 3506ce1d7200529a ]---
[1838269.079891] Kernel panic - not syncing: Fatal exception in interrupt
[1838269.080014] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
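
To map the faulting offsets back to source lines, the RIP entries can be 
pulled out of the logs and handed to a symbolizer. A minimal sketch, assuming 
the console output was saved to a file (crash.log is a hypothetical name) and 
that a vmlinux with debug info for this exact 4.9.39 build is available. 
oops_rip_symbols is a made-up helper name.

```shell
#!/bin/sh
# Print the symbol+offset/size of each "RIP: <selector>: <symbol>" line in an
# oops log read from stdin, e.g. "skb_release_data+0x73/0xf0".
# The trailing "RIP   <symbol>" summary line has no colon and is not matched,
# so each oops yields one line of output.
oops_rip_symbols() {
    sed -n 's/.*RIP: [^ ]* *\([A-Za-z0-9_.]*+0x[0-9a-f]*\/0x[0-9a-f]*\).*/\1/p'
}

# Example (not run automatically):
#   oops_rip_symbols < crash.log
#
# Assumption: with the matching kernel source tree and vmlinux at hand, the
# whole log could instead be piped through the tree's
# scripts/decode_stacktrace.sh to annotate every frame.
```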

Thanks, Sarah

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel