Hi Sarah,

I am adding the netdev mailing list as I am not certain this is an
i350-specific issue. The traces themselves aren't anything I recognize
as an existing issue. From what I can tell it looks like you are
running Xen, so would I be correct in assuming you are bridging
between VMs? If so, are you using any sort of tunnels on your network,
and what type? This information would be useful as we may be looking
at a bug in a tunnel offload for GRO.

On Fri, Nov 17, 2017 at 3:28 PM, Sarah Newman <sarah.new...@computer.org> wrote:
> Hi,
>
> I have an X10 supermicro with two I350's that has crashed twice now under 
> v4.9.39 within the last 3 weeks, with no crashes before v4.9.39:

What was the last kernel you tested before v4.9.39? Just wondering as
it will help to rule out certain patches as possibly being the issue.

> $ /sbin/lspci | grep -i ethernet
> 02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>
> And some X9 supermicro's that have not crashed, with a single I350 I believe:
> $ /sbin/lspci | grep -i ethernet
> 06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>
> I see in the release notes
> https://downloadmirror.intel.com/22919/eng/README.txt "Do Not Use LRO When
> Routing Packets."
>
> We are bridging traffic, not routing, and the crashes are in the GRO code.
>
> Is it possible there are problems with GRO for bridging in the igb driver 
> now? If I disable GRO can I have some confidence it will fix the issue?

As far as LRO not being used when routing, just so you know, LRO and
GRO are two very different things. One of the issues with LRO is that
it wasn't reversible in some cases, and so could lead to packets
being changed if they were rerouted. With GRO that shouldn't be the
case, as we should be able to get back out the original packets that
were merged into a frame. So there shouldn't be any issues using GRO
with bridging or routing.

GRO isn't in the driver. It is in the network stack of the kernel
itself. The only responsibility of igb is to provide the frames in the
correct format so that they can be assembled by GRO if it is enabled.
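
For reference, the GRO and LRO state can be read per interface with
ethtool, and GRO can be switched off at runtime if you want to see
whether the crashes follow it. A minimal sketch, assuming the interface
is eth0 as in your settings below; offload_state is just a hypothetical
helper name for filtering the `ethtool -k` output, not part of ethtool:

```shell
# Print the state of one offload feature, reading `ethtool -k` output on stdin.
# offload_state is a hypothetical helper name used only for this sketch.
offload_state() {
    # Split each line on ": " and print the value for the exact feature name.
    awk -F': *' -v name="$1" '$1 == name { print $2 }'
}

# On a live system (eth0 is assumed from the quoted settings):
#   ethtool -k eth0 | offload_state generic-receive-offload
#   ethtool -k eth0 | offload_state large-receive-offload
# GRO can then be switched off for a test run (needs root):
#   ethtool -K eth0 gro off
```

A change made with `ethtool -K` applies at runtime only, so it does not
persist across a reboot unless your distribution's network configuration
reapplies it.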

> Here are my offload settings:
> Features for eth0:
> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: off [fixed]
>         tx-checksum-ip-generic: on
>         tx-checksum-ipv6: off [fixed]
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: on
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp-mangleid-segmentation: off
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: on
> tx-sctp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> hw-tc-offload: off [fixed]
>
> First crash:
>
> [4083386.299221] ------------[ cut here ]------------
> [4083386.299358] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:1473 
> inet_gro_complete+0xbb/0xd0
> [4083386.299520] Modules linked in: sb_edac edac_core 8021q mrp garp
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
> ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
> xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
> ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
> br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
> async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10
> raid6_pq libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes
> ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en
> mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class
> wmi ast ttm
> [4083386.300888] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.39 #1
> [4083386.301002] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 
> 2.0a 09/16/2016
> [4083386.301109]  ffff880306603d90 ffffffff813f5935 0000000000000000 
> 0000000000000000
> [4083386.301221]  ffff880306603dd0 ffffffff810a7e01 000005c18174578a 
> ffff8802f94a9a00
> [4083386.301333]  ffff8802f0824450 0000000000000000 0000000000000040 
> 0000000000000040
> [4083386.301445] Call Trace:
> [4083386.301483]  <IRQ> [4083386.301519]   dump_stack+0x63/0x8e
> [4083386.301596]   __warn+0xd1/0xf0
> [4083386.301665]   warn_slowpath_null+0x1d/0x20
> [4083386.301747]   inet_gro_complete+0xbb/0xd0
> [4083386.301830]   napi_gro_complete+0x73/0xa0
> [4083386.301911]   napi_gro_flush+0x5f/0x80
> [4083386.301988]   napi_complete_done+0x6a/0xb0
> [4083386.302075]   igb_poll+0x38d/0x720 [igb]
> [4083386.302156]   ? igb_msix_ring+0x2e/0x40 [igb]
> [4083386.302255]   ? __handle_irq_event_percpu+0x4b/0x1a0
> [4083386.302349]   net_rx_action+0x158/0x360
> [4083386.302430]   __do_softirq+0xd1/0x283
> [4083386.302507]   irq_exit+0xe9/0x100
> [4083386.302580]   xen_evtchn_do_upcall+0x35/0x50
> [4083386.302665]   xen_do_hypervisor_callback+0x1e/0x40
> [4083386.302754]  <EOI> [4083386.302787]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.302876]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.302965]   ? xen_safe_halt+0x10/0x20
> [4083386.303043]   ? default_idle+0x1e/0xd0
> [4083386.303122]   ? arch_cpu_idle+0xf/0x20
> [4083386.303200]   ? default_idle_call+0x2c/0x40
> [4083386.303284]   ? cpu_startup_entry+0x1ac/0x240
> [4083386.303370]   ? rest_init+0x77/0x80
> [4083386.303462]   ? start_kernel+0x4a7/0x4b4
> [4083386.303568]   ? set_init_arg+0x55/0x55
> [4083386.303670]   ? x86_64_start_reservations+0x24/0x26
> [4083386.303776]   ? xen_start_kernel+0x555/0x561
> [4083386.303873] ---[ end trace 8294f59ced689507 ]---
> [4083386.303958] general protection fault: 0000 [#1] SMP
> [4083386.304041] Modules linked in: sb_edac edac_core 8021q mrp garp
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
> ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
> xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
> ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
> br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
> async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10
> raid6_pq libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes
> ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en
> mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class
> wmi ast ttm
> [4083386.305179] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       
> 4.9.39 #1
> [4083386.305307] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 
> 2.0a 09/16/2016
> [4083386.305414] task: ffffffff81e0e540 task.stack: ffffffff81e00000
> [4083386.305498] RIP: e030:   skb_release_data+0x73/0xf0
> [4083386.305617] RSP: e02b:ffff880306603d90  EFLAGS: 00010206
> [4083386.305692] RAX: 0000000000000030 RBX: f5b36db76bd162c7 RCX: 
> ffffffff81e60048
> [4083386.305790] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> ffff8802f94a9a00
> [4083386.305887] RBP: ffff880306603db0 R08: 0000000000004277 R09: 
> 0000000000000000
> [4083386.305985] R10: 0000000000000005 R11: 0000000000000002 R12: 
> 0000000000000000
> [4083386.306083] R13: ffff8802f94a9a00 R14: ffff88032f527740 R15: 
> 0000000000000040
> [4083386.306186] FS:  0000000000000000(0000) GS:ffff880306600000(0000) 
> knlGS:0000000000000000
> [4083386.306296] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [4083386.306407] CR2: 0000000001692ed8 CR3: 000000022b3c9000 CR4: 
> 0000000000042660
> [4083386.306505] Stack:
> [4083386.306537]  ffff8802f94a9a00 ffff8802f94a9a00 ffffffff8175ac3e 
> 0000000000000040
> [4083386.306649]  ffff880306603dc8 ffffffff81745764 ffff8802f94a9a00 
> ffff880306603df0
> [4083386.306762]  ffffffff817457c2 ffff8802f94a9a00 ffff8802f0824450 
> 0000000000000000
> [4083386.306874] Call Trace:
> [4083386.306911]  <IRQ> [4083386.306944]   ? napi_gro_complete+0x5e/0xa0
> [4083386.307038]   skb_release_all+0x24/0x30
> [4083386.307133]   kfree_skb+0x32/0x90
> [4083386.307206]   napi_gro_complete+0x5e/0xa0
> [4083386.307287]   napi_gro_flush+0x5f/0x80
> [4083386.307365]   napi_complete_done+0x6a/0xb0
> [4083386.307449]   igb_poll+0x38d/0x720 [igb]
> [4083386.307530]   ? igb_msix_ring+0x2e/0x40 [igb]
> [4083386.307617]   ? __handle_irq_event_percpu+0x4b/0x1a0
> [4083386.307720]   net_rx_action+0x158/0x360
> [4083386.307800]   __do_softirq+0xd1/0x283
> [4083386.307877]   irq_exit+0xe9/0x100
> [4083386.307949]   xen_evtchn_do_upcall+0x35/0x50
> [4083386.308034]   xen_do_hypervisor_callback+0x1e/0x40
> [4083386.308124]  <EOI> [4083386.308156]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.308246]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.308334]   ? xen_safe_halt+0x10/0x20
> [4083386.308413]   ? default_idle+0x1e/0xd0
> [4083386.308491]   ? arch_cpu_idle+0xf/0x20
> [4083386.308568]   ? default_idle_call+0x2c/0x40
> [4083386.308651]   ? cpu_startup_entry+0x1ac/0x240
> [4083386.308737]   ? rest_init+0x77/0x80
> [4083386.308811]   ? start_kernel+0x4a7/0x4b4
> [4083386.308890]   ? set_init_arg+0x55/0x55
> [4083386.308968]   ? x86_64_start_reservations+0x24/0x26
> [4083386.309060]   ? xen_start_kernel+0x555/0x561
> [4083386.309144] Code: f0 41 0f c1 46 20 39 c2 74 09 5b 41 5c 41 5d 41 5e 5d 
> c3 45 31 e4 41 80 3e 00 74 39 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 1c
> 06 <48> 8b 43 20 a8 01 75 6f f0 ff 4b 1c 74 55 48 8b 03 48 c1 e8 33
> [4083386.309571] RIP   skb_release_data+0x73/0xf0
> [4083386.309658]  RSP <ffff880306603d90>
> [4083386.313000] ---[ end trace 8294f59ced689508 ]---
> [4083386.389667] Kernel panic - not syncing: Fatal exception in interrupt
> [4083386.389791] Kernel Offset: disabled
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>
> Second crash:
>
> [1838269.012349] general protection fault: 0000 [#1] SMP
> [1838269.012452] Modules linked in: ebtable_nat sb_edac edac_core 8021q mrp
> garp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
> ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev
> xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2
> ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw
> br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456
> async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10
> raid6_pq libcrc32c joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei fjes
> ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en
> mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class
> wmi ast ttm
> [1838269.013521] CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.9.39 #1
> [1838269.013637] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 
> 2.0a 09/16/2016
> [1838269.013743] task: ffff88030008c4c0 task.stack: ffffc90041978000
> [1838269.013826] RIP: e030:   memcpy_erms+0x6/0x10
> [1838269.013952] RSP: e02b:ffffc9004197bac0  EFLAGS: 00010202
> [1838269.014026] RAX: ffff88032fcafe16 RBX: 0000000000000004 RCX: 
> 0000000000000004
> [1838269.014124] RDX: 0000000000000004 RSI: 62a16ddedc6dbcb3 RDI: 
> ffff88032fcafe16
> [1838269.014222] RBP: ffffc9004197bb20 R08: 0000000000000004 R09: 
> 0000000000000004
> [1838269.014320] R10: ffff88026ae89500 R11: 0000000044639632 R12: 
> 0000000000000048
> [1838269.014417] R13: 0000000000000000 R14: 0000000044639632 R15: 
> 0000000000000048
> [1838269.014519] FS:  0000000000000000(0000) GS:ffff880306640000(0000) 
> knlGS:ffff880306640000
> [1838269.014629] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1838269.014709] CR2: ffffffffff600400 CR3: 0000000051939000 CR4: 
> 0000000000042660
> [1838269.014808] Stack:
> [1838269.014840]  ffffffff81744c17 ffff88026ae89500 0000000044639632 
> ffff88030008c4c0
> [1838269.014952]  ffffffff00000004 0000000000000004 ffff88032fcafe16 
> ffff88026ae89500
> [1838269.015064]  0000000000000004 0000000000000004 000000000000004c 
> 0000000000000028
> [1838269.015176] Call Trace:
> [1838269.015217]   ? skb_copy_bits+0x137/0x2c0
> [1838269.015299]   __pskb_pull_tail+0x7f/0x3b0
> [1838269.015382]   tcp_gro_receive+0x2c5/0x300
> [1838269.015465]   tcp6_gro_receive+0x13a/0x1a0
> [1838269.015547]   ipv6_gro_receive+0x1c6/0x380
> [1838269.015630]   dev_gro_receive+0x269/0x3b0
> [1838269.015712]   napi_gro_receive+0x38/0xf0
> [1838269.015796]   igb_clean_rx_irq+0x38e/0x690 [igb]
> [1838269.015886]   igb_poll+0x362/0x720 [igb]
> [1838269.015968]   ? dequeue_entity+0x26e/0xa90
> [1838269.016051]   ? xen_mc_flush+0x17b/0x1b0
> [1838269.016131]   net_rx_action+0x158/0x360
> [1838269.016212]   __do_softirq+0xd1/0x283
> [1838269.016290]   ? sort_range+0x30/0x30
> [1838269.016366]   run_ksoftirqd+0x29/0x50
> [1838269.016443]   smpboot_thread_fn+0x110/0x160
> [1838269.016525]   kthread+0xd7/0xf0
> [1838269.016595]   ? kthread_park+0x60/0x60
> [1838269.016673]   ret_from_fork+0x25/0x30
> [1838269.016758] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 
> e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89
> d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [1838269.017183] RIP   memcpy_erms+0x6/0x10
> [1838269.017264]  RSP <ffffc9004197bac0>
> [1838269.020618] ---[ end trace 3506ce1d7200529a ]---
> [1838269.079891] Kernel panic - not syncing: Fatal exception in interrupt
> [1838269.080014] Kernel Offset: disabled
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>
> Thanks, Sarah
>
> _______________________________________________
> E1000-devel mailing list
> e1000-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
