Source: linux
Version: 4.9.168-1
Severity: important
X-Debbugs-Cc: [email protected], [email protected]
User: [email protected]
Usertags: needed-by-DSA-Team
Hi,
ever since the 9.9 point release, conova-node01.debian.org and
conova-node02.debian.org have been unstable. They run for an hour or
three, and then things go bad: the kernel oopses and the network goes
down, as in the example below. Rebooting back to 4.9.144-3.1 makes them
stable again.
Latest example:
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: PingAck did not arrive in time.
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure )
pdsk( UpToDate -> DUnknown )
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: new current UUID
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: ack_receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Terminating drbd_a_resource
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Connection closed
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: conn( NetworkFailure -> Unconnected )
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: receiver terminated
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Restarting receiver thread
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: receiver (re)started
May 22 04:17:37 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: conn( Unconnected -> WFConnection )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Handshake successful: Agreed network protocol version 101
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC
WRITE_SAME.
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Peer authenticated using 16 bytes HMAC
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: conn( WFConnection -> WFReportParams )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: drbd
resource3: Starting ack_recv thread (from drbd_r_resource [8449])
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: drbd_sync_handshake:
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: self
3EA2D1FA6B3ACD47:0BEBDA613EA56FD7:D5BF70E0AA6560C5:D5BE70E0AA6560C5 bits:4
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: peer
0BEBDA613EA56FD6:0000000000000000:D5BF70E0AA6560C4:D5BE70E0AA6560C5 bits:0
flags:0
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: uuid_compare()=1 by rule 70
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
pdsk( DUnknown -> Consistent )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1), total
28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 28(1),
total 28; compression: 100.0%
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: helper command: /bin/true before-resync-source minor-3
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: helper command: /bin/true before-resync-source minor-3 exit code 0
(0x0)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: Began resync as SyncSource (will sync 16 KB [4 bits set]).
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: updated sync UUID
3EA2D1FA6B3ACD47:0BECDA613EA56FD7:0BEBDA613EA56FD7:D5BF70E0AA6560C5
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: Resync done (total 1 sec; paused 0 sec; 16 K/sec)
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: updated UUIDs
3EA2D1FA6B3ACD47:0000000000000000:0BECDA613EA56FD7:0BEBDA613EA56FD7
May 22 04:17:38 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
block drbd3: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
May 22 04:17:48 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi:
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi:
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI set_time
May 22 04:18:54 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: efi:
[Firmware Bug]: IRQ flags corrupted (0x00000140=>0x00000100) by EFI get_time
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
Internal error: Oops - bad mode: 0 [#1] SMP
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
Modules linked in: openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat binfmt_misc
nls_ascii nls_cp437 vfat fat dm_mod ip6t_REJECT nf_reject_ipv6
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT
nf_reject_ipv4 xt_NFLOG nfnetlink_log nfnetlink xt_tcpudp nf_conntrack_ipv4
nf_defrag_ipv4 xt_hashlimit xt_multiport xt_conntrack nf_conntrack
iptable_filter ast ttm drm_kms_helper xgene_hwmon efi_pstore drm
i2c_algo_bit xgene_edac edac_core xgene_dma joydev evdev chaoskey
mailbox_xgene_slimpro sg xgene_rng rng_core efivars tun drbd lru_cache efivarfs
ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
crc32c_generic libcrc32c raid0 multipath linear raid1 hid_generic md_mod usbhid
hid sd_mod
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
i2c_xgene_slimpro ahci_xgene libahci_platform libahci xhci_plat_hcd xgene_enet
xhci_hcd libata phy_xgene marvell usbcore scsi_mod mdio_xgene of_mdio fixed_phy
libphy usb_common gpio_xgene_sb
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: CPU:
0 PID: 1410 Comm: ovsdb-server Tainted: G W I 4.9.0-9-arm64 #1
Debian 4.9.168-1+deb9u2
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
Hardware name: GIGABYTE R120-P31/MP30-AR1, BIOS D7b 08/26/2016
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
task: ffff807ff9d54380 task.stack: ffff807f95c94000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: PC
is at 0xffffa10dbf00
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: LR
is at 0xffffa13d221c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: pc :
[<0000ffffa10dbf00>] lr : [<0000ffffa13d221c>] pstate: a0000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: sp :
0000fffff72e8970
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x29:
0000fffff72e8970 x28: 0000000000000000
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x27:
0000aaaafa714d90 x26: 0000aaaafa7354c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x25:
0000aaaafa6eaed0 x24: 0000000000000018
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x23:
0000aaaafa72c660 x22: 0000aaaafa711b80
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x21:
0000000000000004 x20: 000000000000000c
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x19:
0000aaaafa702b90 x18: 00000000002597a9
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x17:
0000ffffa10dbec0 x16: 0000ffffa14837a0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x15:
ffffffffffffffff x14: 0000000000000010
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x13:
33613a63353a3834 x12: 3a66373a63613a36
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x11:
0101010101010101 x10: 0000000066666666
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x9 :
7f7f7f7f7f7f7f7f x8 : 0101010101010101
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x7 :
7f7fffffff7f7f7f x6 : feffa9a9f970ff72
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x5 :
8080000000008000 x4 : 0080000000008080
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x3 :
0000aaaafa720073 x2 : 726f7272655f7874
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: x1 :
0000aaaafa711c20 x0 : 0000000000000008
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
Process ovsdb-server (pid: 1410, stack limit = 0xffff807f95c94020)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: ---[
end trace 1fdaa7d4350a5508 ]---
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Bad
mode in FIQ handler detected on CPU0, code 0x56000000 -- SVC (AArch64)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
INFO: rcu_bh detected stalls on CPUs/tasks:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
0-...: (1 GPs behind) idle=1fd/140000000000000/0 softirq=736283/736285 fqs=2434
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
(detected by 2, t=5255 jiffies, g=15038, c=15037, q=8)
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Task
dump for CPU 0:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
ovsdb-server R running task 0 1410 1409 0x0000000a
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel: Call
trace:
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff000008086190>] __switch_to+0x90/0xd8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff00000808b804>] bad_mode+0x6c/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<0000000021dc9afc>] 0x21dc9afc
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<0000000021db79b8>] 0x21db79b8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff000008610748>] virt_efi_set_variable.part.6+0x68/0xb0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff000008610898>] virt_efi_set_variable+0x78/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff00000860f020>] efivar_entry_set_safe+0xc8/0x200
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff0000010574b8>] efi_pstore_write+0x158/0x1b0 [efi_pstore]
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff00000830cdbc>] pstore_dump+0x17c/0x388
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff000008132a54>] kmsg_dump+0xac/0xd0
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff0000080cf5cc>] oops_exit+0x2c/0x38
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff00000808b0a4>] die+0xdc/0x1c8
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<ffff00000808b818>] bad_mode+0x80/0x90
May 22 04:23:51 conova-node01/conova-node01/::ffff:217.196.149.227 kernel:
[<0000ffffa13d221c>] 0xffffa13d221c
I don't know whether the drbd messages are related to the Oops; I guess
they may not be, as I see similar messages before things break. In any
case, after that point the network is down. The network driver is
xgene-enet.
/etc/network/interfaces:
# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    pre-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
    pre-up ip link set dev $IFACE up
    post-down ip link set dev $IFACE down

# The primary network interface
allow-hotplug br-inet
iface br-inet inet static
    address 217.196.149.227/28
    gateway 217.196.149.238
iface br-inet inet6 static
    address 2a02:16a8:dc41:100::227/64
    gateway 2a02:16a8:dc41:100::def

auto eth1
iface eth1 inet static
    address 172.29.186.11/24

auto eth2
iface eth2 inet static
    address 172.29.184.11/24
Bridge config:
# ovs-vsctl show
91934a25-b86f-4d3a-a598-19f915404192
    Bridge br-inet
        Port "tap0"
            Interface "tap0"
        Port "eth0"
            Interface "eth0"
        Port br-inet
            Interface br-inet
                type: internal
        Port "tap2"
            Interface "tap2"
                error: "could not open network device tap2 (No such device)"
        Port "tap1"
            Interface "tap1"
    ovs_version: "2.6.2"
(The tap interfaces are for qemu VMs.)
Cheers,
Julien