Package: src:linux
Version: 4.9.88-1

Hi,

I'm observing the attached errors on machines that are Xen dom0 and are running the latest Debian Stretch 4.9 kernel as dom0 kernel. The errors have happened a few times in the last few weeks. They started after upgrading the machines from Jessie with the 3.16 kernel to Stretch with the 4.9 kernel.

For networking between domUs and the outside world, we use openvswitch. After such an error happens:

* The number of "flows" in the kernel quickly rises to the limit, 10000, as seen in the output of ovs-dpctl show.
* Network traffic that should flow through the openvswitch bridge starts disappearing in a seemingly random way.
* The memory usage of the userspace ovs-vswitchd starts growing quickly.
* Many of the ovs commands, such as those to add or remove an interface or bridge, hang.

After a restart of the openvswitch-switch service, and after fixing up a bunch of configuration of connected interfaces, functionality is restored.

While most of the symptoms seem related to userspace openvswitch processes, the cause of it all seems to be in the kernel, triggered while the userspace ovs-vswitchd process is receiving a network packet?

Sadly I do not know how to reproduce this, except for just waiting until it happens again. Please advise what else I could do to help resolve this issue.

Thanks,
Regards,
-- 
Hans van Kranenburg
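The first symptom in the list above (the flow count climbing to the limit) could be watched for between incidents. Below is a minimal sketch, assuming that the output of ovs-dpctl show contains a line of the form "flows: N". The sample text is made up for illustration and was not captured from the affected machines; the function names and the 90% threshold are hypothetical.

```python
import re

# Limit mentioned in the report; adjust if the datapath is configured differently.
FLOW_LIMIT = 10000

# Illustrative text only, shaped like `ovs-dpctl show` output (assumption).
SAMPLE = """system@ovs-system:
\tlookups: hit:100 missed:5 lost:0
\tflows: 9990
\tport 0: ovs-system (internal)
"""


def flow_count(text):
    """Return the number after 'flows:' in the given text, or None if absent."""
    match = re.search(r"^\s*flows:\s*(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def near_limit(text, limit=FLOW_LIMIT, ratio=0.9):
    """True when the flow count has reached `ratio` of the limit."""
    count = flow_count(text)
    return count is not None and count >= limit * ratio


if __name__ == "__main__":
    print(flow_count(SAMPLE), near_limit(SAMPLE))
```

On a real dom0 the text would come from running the command (for example through subprocess.run) instead of from SAMPLE, and a cron job could log a warning when near_limit returns True.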
May 4 08:23:03 altair kernel: [83978.662075] BUG: unable to handle kernel paging request at 000000030000001f
May 4 08:23:03 altair kernel: [83978.665887] IP: [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 4 08:23:03 altair kernel: [83978.669837] PGD 0
May 4 08:23:03 altair kernel: [83978.669882]
May 4 08:23:03 altair kernel: [83978.673589] Oops: 0000 [#1] SMP
May 4 08:23:03 altair kernel: [83978.677281] Modules linked in: cls_u32 sch_ingress act_mirred sch_fq_codel ifb xt_mark sch_htb xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time binfmt_misc xen_gntdev xen_evtchn openvswitch nf_nat_ipv6 libcrc32c xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt intel_powerclamp crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr serio_raw joydev evdev amdkfd radeon ttm drm_kms_helper drm i2c_algo_bit lpc_ich mfd_core i7core_edac hpilo
May 4 08:23:03 altair kernel: [83978.701936] sg ipmi_si hpwdt edac_core ipmi_msghandler acpi_power_meter button shpchp dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ehci_pci uhci_hcd ehci_hcd usbcore usb_common hpsa scsi_transport_sas bnx2 mlx4_core devlink scsi_mod thermal
May 4 08:23:03 altair kernel: [83978.724406] CPU: 1 PID: 1486 Comm: revalidator7 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1
May 4 08:23:03 altair kernel: [83978.729139] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May 4 08:23:03 altair kernel: [83978.733958] task: ffff880119e1ee80 task.stack: ffffc90042764000
May 4 08:23:03 altair kernel: [83978.738724] RIP: e030:[<ffffffff814f5c7d>] [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 4 08:23:03 altair kernel: [83978.743560] RSP: e02b:ffffc90042767c78 EFLAGS: 00010206
May 4 08:23:03 altair kernel: [83978.748352] RAX: 0000000000000050 RBX: 00000002ffffffff RCX: ffffffff81ce0f40
May 4 08:23:03 altair kernel: [83978.753116] RDX: ffffffffffffffff RSI: ffff8800cc998900 RDI: ffff8800cc998900
May 4 08:23:03 altair kernel: [83978.757867] RBP: ffff8800cc998900 R08: ffff880123c00000 R09: ffff88011f220000
May 4 08:23:03 altair kernel: [83978.762598] R10: ffff8800cc998900 R11: ffff880119e10280 R12: 0000000000000002
May 4 08:23:03 altair kernel: [83978.767321] R13: ffff88011f227ec0 R14: ffff88011dea2800 R15: 0000000000000000
May 4 08:23:03 altair kernel: [83978.772000] FS: 00007fc1656cc700(0000) GS:ffff880128240000(0000) knlGS:0000000000000000
May 4 08:23:03 altair kernel: [83978.776671] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 4 08:23:03 altair kernel: [83978.781355] CR2: 000000030000001f CR3: 00000001212b1000 CR4: 0000000000002660
May 4 08:23:03 altair kernel: [83978.786135] Stack:
May 4 08:23:03 altair kernel: [83978.790841] ffff880120a28000 ffff8800cc998900 ffffc90042767ec0 0000000000007ea4
May 4 08:23:03 altair kernel: [83978.795898] ffffffff814f6267 ffff880120a28000 ffff8800cc998900 ffffffff814fcc91
May 4 08:23:03 altair kernel: [83978.800806] ffff880120a28000 ffffffff8153f2df ffffc90000000000 ffff8800cc998900
May 4 08:23:03 altair kernel: [83978.805723] Call Trace:
May 4 08:23:03 altair kernel: [83978.810654] [<ffffffff814f6267>] ? consume_skb+0x27/0x80
May 4 08:23:03 altair kernel: [83978.815626] [<ffffffff814fcc91>] ? skb_free_datagram+0x11/0x40
May 4 08:23:03 altair kernel: [83978.820545] [<ffffffff8153f2df>] ? netlink_recvmsg+0x19f/0x440
May 4 08:23:03 altair kernel: [83978.825426] [<ffffffff814ed4ca>] ? ___sys_recvmsg+0xda/0x1f0
May 4 08:23:03 altair kernel: [83978.830273] [<ffffffff812237fb>] ? file_update_time+0xcb/0x110
May 4 08:23:03 altair kernel: [83978.835058] [<ffffffff8120fbeb>] ? pipe_write+0x29b/0x3e0
May 4 08:23:03 altair kernel: [83978.839800] [<ffffffff812066b0>] ? new_sync_write+0xe0/0x130
May 4 08:23:03 altair kernel: [83978.844502] [<ffffffff814edf4e>] ? __sys_recvmsg+0x4e/0x90
May 4 08:23:03 altair kernel: [83978.849161] [<ffffffff81003b7d>] ? do_syscall_64+0x8d/0xf0
May 4 08:23:03 altair kernel: [83978.853779] [<ffffffff8161244e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
May 4 08:23:03 altair kernel: [83978.858397] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48
May 4 08:23:03 altair kernel: [83978.868227] RIP [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 4 08:23:03 altair kernel: [83978.873017] RSP <ffffc90042767c78>
May 4 08:23:03 altair kernel: [83978.877746

May 4 22:00:22 sirius kernel: [1999361.378086] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
May 4 22:00:22 sirius kernel: [1999361.381804] IP: [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May 4 22:00:22 sirius kernel: [1999361.385492] PGD 0
May 4 22:00:22 sirius kernel: [1999361.385535]
May 4 22:00:22 sirius kernel: [1999361.389145] Oops: 0000 [#1] SMP
May 4 22:00:22 sirius kernel: [1999361.392725] Modules linked in: xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time binfmt_misc openvswitch nf_nat_ipv6 libcrc32c xen_gntdev xen_evtchn xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt intel_powerclamp crct10dif_pclmul crc32_pclmul amdkfd iTCO_wdt evdev joydev iTCO_vendor_support ghash_clmulni_intel radeon ttm serio_raw pcspkr drm_kms_helper drm i2c_algo_bit sg i7core_edac lpc_ich ipmi_si acpi_power_meter hpilo hpwdt mfd_core edac_core ipmi_msghandler button
May 4 22:00:22 sirius kernel: [1999361.416634] shpchp dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ehci_pci uhci_hcd ehci_hcd usbcore usb_common mlx4_core hpsa scsi_transport_sas bnx2 devlink scsi_mod thermal
May 4 22:00:22 sirius kernel: [1999361.438322] CPU: 2 PID: 1400 Comm: revalidator9 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
May 4 22:00:22 sirius kernel: [1999361.442773] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May 4 22:00:22 sirius kernel: [1999361.447219] task: ffff880111c58540 task.stack: ffffc90041bcc000
May 4 22:00:22 sirius kernel: [1999361.451796] RIP: e030:[<ffffffff814f4c7d>] [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May 4 22:00:22 sirius kernel: [1999361.456294] RSP: e02b:ffffc90041bcfc78 EFLAGS: 00010206
May 4 22:00:22 sirius kernel: [1999361.460758] RAX: 0000000000000030 RBX: 00000000000001c0 RCX: ffffffff81ce0e00
May 4 22:00:22 sirius kernel: [1999361.465261] RDX: 0000000000008100 RSI: ffff880118a94f00 RDI: ffff880118a94f00
May 4 22:00:22 sirius kernel: [1999361.469724] RBP: ffff880118a94f00 R08: ffff88011bc00000 R09: ffff8800b0218000
May 4 22:00:22 sirius kernel: [1999361.474230] R10: ffff880118a94f00 R11: ffff880111c50240 R12: 0000000000000000
May 4 22:00:22 sirius kernel: [1999361.478710] R13: ffff8800b021fec0 R14: ffff8800b8356a40 R15: 0000000000000000
May 4 22:00:22 sirius kernel: [1999361.483220] FS: 00007faa54946700(0000) GS:ffff880120280000(0000) knlGS:0000000000000000
May 4 22:00:22 sirius kernel: [1999361.487736] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 4 22:00:22 sirius kernel: [1999361.492181] CR2: 00000000000001e0 CR3: 0000000119ce9000 CR4: 0000000000002660
May 4 22:00:22 sirius kernel: [1999361.496661] Stack:
May 4 22:00:22 sirius kernel: [1999361.501036] ffff8801190a2800 ffff880118a94f00 ffffc90041bcfec0 0000000000007eac
May 4 22:00:22 sirius kernel: [1999361.505470] ffffffff814f5267 ffff8801190a2800 ffff880118a94f00 ffffffff814fbc91
May 4 22:00:22 sirius kernel: [1999361.509844] ffff8801190a2800 ffffffff8153e2bf ffffc90000000000 ffff880118a94f00
May 4 22:00:22 sirius kernel: [1999361.514213] Call Trace:
May 4 22:00:22 sirius kernel: [1999361.518499] [<ffffffff814f5267>] ? consume_skb+0x27/0x80
May 4 22:00:22 sirius kernel: [1999361.522818] [<ffffffff814fbc91>] ? skb_free_datagram+0x11/0x40
May 4 22:00:22 sirius kernel: [1999361.527109] [<ffffffff8153e2bf>] ? netlink_recvmsg+0x19f/0x440
May 4 22:00:22 sirius kernel: [1999361.531314] [<ffffffff814ec4ca>] ? ___sys_recvmsg+0xda/0x1f0
May 4 22:00:22 sirius kernel: [1999361.535488] [<ffffffff812221ab>] ? file_update_time+0xcb/0x110
May 4 22:00:22 sirius kernel: [1999361.539626] [<ffffffff8120e5cb>] ? pipe_write+0x29b/0x3e0
May 4 22:00:22 sirius kernel: [1999361.543790] [<ffffffff812050a0>] ? new_sync_write+0xe0/0x130
May 4 22:00:22 sirius kernel: [1999361.547989] [<ffffffff814ecf4e>] ? __sys_recvmsg+0x4e/0x90
May 4 22:00:22 sirius kernel: [1999361.552218] [<ffffffff81003b7f>] ? do_syscall_64+0x8f/0xf0
May 4 22:00:22 sirius kernel: [1999361.556467] [<ffffffff816113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
May 4 22:00:22 sirius kernel: [1999361.560791] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48
May 4 22:00:22 sirius kernel: [1999361.570202] RIP [<ffffffff814f4c7d>] skb_release_data+0x8d/0x110
May 4 22:00:22 sirius kernel: [1999361.575033] RSP <ffffc90041bcfc78>
May 4 22:00:22 sirius kernel: [1999361.579731] CR2: 00000000000001e0
May 4 22:00:22 sirius kernel: [1999361.599233] ---[ end trace de6345fc470c5362 ]---

May 18 13:49:26 omega kernel: [1213243.942643] general protection fault: 0000 [#1] SMP
May 18 13:49:26 omega kernel: [1213243.946704] Modules linked in: xt_physdev br_netfilter bridge stp llc xen_netback xen_blkback algif_skcipher af_alg dm_service_time xen_gntdev openvswitch xen_evtchn nf_nat_ipv6 libcrc32c xenfs xen_privcmd ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw dm_crypt amdkfd radeon intel_powerclamp crct10dif_pclmul iTCO_wdt crc32_pclmul iTCO_vendor_support ttm ghash_clmulni_intel hpwdt pcspkr drm_kms_helper drm serio_raw evdev i2c_algo_bit joydev sg hpilo lpc_ich mfd_core i7core_edac ipmi_si edac_core ipmi_msghandler acpi_power_meter shpchp button dm_multipath
May 18 13:49:26 omega kernel: [1213243.973478] dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq mlx4_en ptp pps_core hid_generic usbhid hid sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse uhci_hcd ehci_pci ehci_hcd usbcore usb_common hpsa bnx2 mlx4_core scsi_transport_sas devlink scsi_mod thermal
May 18 13:49:26 omega kernel: [1213243.997290] CPU: 2 PID: 1582 Comm: revalidator9 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.88-1
May 18 13:49:26 omega kernel: [1213244.002200] Hardware name: HP ProLiant DL360 G7, BIOS P68 08/16/2015
May 18 13:49:26 omega kernel: [1213244.007157] task: ffff8801186caf00 task.stack: ffffc90041b8c000
May 18 13:49:26 omega kernel: [1213244.012040] RIP: e030:[<ffffffff814f5c7d>] [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 18 13:49:26 omega kernel: [1213244.016957] RSP: e02b:ffffc90041b8fc78 EFLAGS: 00010206
May 18 13:49:26 omega kernel: [1213244.021783] RAX: 0000000000000030 RBX: 290008a753b675a9 RCX: ffffffff81ce0f40
May 18 13:49:26 omega kernel: [1213244.026673] RDX: 0000000000008100 RSI: ffff88011a2c6200 RDI: ffff88011a2c6200
May 18 13:49:26 omega kernel: [1213244.031596] RBP: ffff88011a2c6200 R08: ffff88011bc00000 R09: ffff88011aa70000
May 18 13:49:26 omega kernel: [1213244.036422] R10: ffff88011a2c6200 R11: ffff8801186c0200 R12: 0000000000000000
May 18 13:49:26 omega kernel: [1213244.041267] R13: ffff88011aa77ec0 R14: ffff8801199da7c0 R15: 0000000000000000
May 18 13:49:26 omega kernel: [1213244.046055] FS: 00007fe5f35e2700(0000) GS:ffff880120280000(0000) knlGS:0000000000000000
May 18 13:49:26 omega kernel: [1213244.050785] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 18 13:49:26 omega kernel: [1213244.055488] CR2: 00007fe579f4f059 CR3: 0000000117428000 CR4: 0000000000002660
May 18 13:49:26 omega kernel: [1213244.060232] Stack:
May 18 13:49:26 omega kernel: [1213244.064896] ffff880117588800 ffff88011a2c6200 ffffc90041b8fec0 0000000000007e94
May 18 13:49:26 omega kernel: [1213244.069725] ffffffff814f6267 ffff880117588800 ffff88011a2c6200 ffffffff814fcc91
May 18 13:49:26 omega kernel: [1213244.074552] ffff880117588800 ffffffff8153f2df ffffc90000000000 ffff88011a2c6200
May 18 13:49:26 omega kernel: [1213244.079377] Call Trace:
May 18 13:49:26 omega kernel: [1213244.084123] [<ffffffff814f6267>] ? consume_skb+0x27/0x80
May 18 13:49:26 omega kernel: [1213244.089047] [<ffffffff814fcc91>] ? skb_free_datagram+0x11/0x40
May 18 13:49:26 omega kernel: [1213244.093728] [<ffffffff8153f2df>] ? netlink_recvmsg+0x19f/0x440
May 18 13:49:26 omega kernel: [1213244.098359] [<ffffffff814ed4ca>] ? ___sys_recvmsg+0xda/0x1f0
May 18 13:49:26 omega kernel: [1213244.102962] [<ffffffff812237fb>] ? file_update_time+0xcb/0x110
May 18 13:49:26 omega kernel: [1213244.107530] [<ffffffff8120fbeb>] ? pipe_write+0x29b/0x3e0
May 18 13:49:26 omega kernel: [1213244.112074] [<ffffffff812066b0>] ? new_sync_write+0xe0/0x130
May 18 13:49:26 omega kernel: [1213244.116625] [<ffffffff814edf4e>] ? __sys_recvmsg+0x4e/0x90
May 18 13:49:26 omega kernel: [1213244.121183] [<ffffffff81003b7d>] ? do_syscall_64+0x8d/0xf0
May 18 13:49:26 omega kernel: [1213244.125715] [<ffffffff8161244e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
May 18 13:49:26 omega kernel: [1213244.130196] Code: 03 48 c1 e8 37 83 e0 07 83 f8 04 74 49 41 0f b6 45 00 41 83 c4 01 44 39 e0 7e 51 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 5c 05 00 <48> 8b 43 20 48 8d 50 ff a8 01 48 0f 45 da f0 ff 4b 1c 75 bf 48
May 18 13:49:26 omega kernel: [1213244.139830] RIP [<ffffffff814f5c7d>] skb_release_data+0x8d/0x110
May 18 13:49:26 omega kernel: [1213244.144491] RSP <ffffc90041b8fc78>
May 18 13:49:26 omega kernel: [1213244.164037] ---[ end trace c53e06696e145c33 ]---
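
One detail shared by the three traces can be checked with simple arithmetic. The code bytes at the faulting position, <48> 8b 43 20, decode as mov 0x20(%rbx),%rax, so the faulting access should be at RBX + 0x20. The sketch below uses register values copied from the traces and checks that this matches CR2 on altair and sirius, and that on omega the address is non-canonical, which would fit the general protection fault reported there instead of a page fault. The canonical-address test assumes 48-bit virtual addresses (4-level paging), and the helper names are illustrative. What the bad RBX values mean for the root cause is not something this check can establish.

```python
# RBX and CR2 as printed in the three oops reports.
REPORTS = {
    "altair": {"rbx": 0x00000002FFFFFFFF, "cr2": 0x000000030000001F},
    "sirius": {"rbx": 0x00000000000001C0, "cr2": 0x00000000000001E0},
    # On a general protection fault CR2 is not updated, so the value
    # printed for omega is left over from an earlier page fault.
    "omega": {"rbx": 0x290008A753B675A9, "cr2": 0x00007FE579F4F059},
}

DISPLACEMENT = 0x20  # from "48 8b 43 20": mov 0x20(%rbx),%rax
MASK_64 = 0xFFFFFFFFFFFFFFFF


def is_canonical(addr):
    """Assumes 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def classify(rbx, cr2):
    """Describe what a load from rbx + 0x20 would be expected to do."""
    target = (rbx + DISPLACEMENT) & MASK_64
    if not is_canonical(target):
        return "non-canonical address, general protection fault expected"
    if target == cr2:
        return "page fault at rbx+0x20, matches CR2"
    return "rbx+0x20 does not match CR2"


if __name__ == "__main__":
    for host, regs in REPORTS.items():
        print(host, hex(regs["rbx"] + DISPLACEMENT), classify(**regs))
```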

