Hi all @misc,

first things first: sorry for the long description.

After upgrading from 6.3-stable to 6.4-stable (and later also current) in our integration stage, I ran into a strange problem. I run OpenBSD in a hub-and-spoke VPN architecture across roughly 14 distributed datacenters. 6.3-stable is running fine and stable, as expected. (All versions, 6.3-stable, 6.4-stable and current, run as resflash images.)

All locations, including the integration stage mentioned above, run the same setup. Each location has two OpenBSD servers/gateways that run:

- ospf over gre over ipsec
-- local to each other and to our two main datacenters (hub)
- two bridge interfaces inside one server
-- one for tagged frames, one for untagged
-- both bridge interfaces are connected with a pair interface
-- first server is configured as primary within ospf, stp and carp
- layer-2 redundancy is done by stp on the OpenBSD side and mstp (instance 0) on the network-gear side
- layer-3 redundancy is done by ospf and carp
- pf is enabled

The problem can be described as follows:
After an initial boot, everything works fine for about 4 hours. After those 4 hours, it is no longer possible to log in to the backup/secondary OpenBSD server via ssh or even via serial console, but it still seems to forward traffic correctly. The ospf adjacencies are still up and running, as are the ipsec security associations and so on. The monitoring metrics don't show a measurable increase in any of the data.

I've already exchanged the hardware, since that was my first guess: the first server/gateway runs without any problems with the same 6.4-stable and config version. Unfortunately this didn't help.

When I left a serial console login open, I was still able to execute some commands in the failure state, and a top that I had started beforehand kept running. But as soon as I enter e.g. ifconfig, or try a tab completion, the serial console freezes as well.
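Since commands started before the hang kept running on the serial console, one way to narrow down the moment of failure might be a small snapshot loop started right after boot. The following is only a rough sketch under assumptions: the `snapshot` helper, the 60-second interval and the log path are hypothetical and have not been run on the affected machines. Only commands that still worked in the failure state (vmstat, iostat, df) are used in the example call.

```shell
#!/bin/sh
# Hypothetical helper: run each given command line and print a UTC
# timestamp header before its output, so the last complete snapshot
# before a freeze can be identified afterwards.
snapshot() {
    for cmd in "$@"; do
        printf '=== %s  %s ===\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$cmd"
        # Run through sh -c so that arguments such as "vmstat 1 3" work.
        # A failing command is reported but must not stop the loop.
        sh -c "$cmd" 2>&1 || printf '(exit status %s)\n' "$?"
    done
}

# Example call (interval and log path are placeholders, adjust as needed):
#   while :; do
#       snapshot 'vmstat 1 3' 'iostat 1 3' 'df -h'
#       sleep 60
#   done >> /tmp/hang-snapshots.log
```

If one of the listed commands blocks the way ifconfig does, the loop stalls at that point, and the last header in the log shows which command was running at the time.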

The problem does not occur if I:

- shut down bridge0 (for tagged frames) or
- shut down bridge1 (for untagged frames) or
- shut down pair0 or pair1 (interconnection between the bridges)

Below are the commands I was able to execute in the failure state before tab completion or ifconfig froze the console:

---cut---
# df -i
Filesystem  512-blocks      Used     Avail Capacity iused   ifree  %iused  Mounted on
/dev/sd0e      3473724   1127852   2172188    34%   14494  219360     6%   /
mfs:64049        63326        12     60148     0%       7    8183     0%   /tmp
mfs:51486        11391        63     10759     1%    1231    1839    40%   /dev
mfs:86629        63326      8552     51608    14%     365    7825     4%   /etc
mfs:35143       253790     11512    229590     5%     236   32530     1%   /var
mfs:6765        253790     76506    164596    32%      45   32721     0%   /usr/lib
mfs:9627        253790      6132    234970     3%      66   32700     0%   /usr/libexec
#
# vmstat 1 10
 procs    memory       page                    disks    traps          cpu
 r   s   avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 0  64  104M   7474M   19   0   0   0   0   0   1   0   73    68  168  0  0 100
 0  64  104M   7474M   20   0   0   0   0   0   0   0   66    60  128  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   48    45   92  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   73    44  146  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   65    47  132  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   37    49   82  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  107  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   51    44  106  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  104  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   53    47  118  0  0 100
#
# iostat 1 10
      tty              sd0               sd1             cpu
 tin tout  KB/t  t/s  MB/s   KB/t  t/s  MB/s  us ni sy sp in id
   0    2 28.82    0  0.01   0.50    0  0.00   0  0  0  0  0100
   0  193  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
#
# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0e      1.7G    551M    1.0G    34%    /
mfs:69819     30.9M    9.0K   29.4M     0%    /tmp
mfs:9236       5.6M   31.5K    5.3M     1%    /dev
mfs:34613     30.9M    4.2M   25.2M    14%    /etc
mfs:46616      124M    5.5M    112M     5%    /var
mfs:42566      124M   37.4M   80.4M    32%    /usr/lib
mfs:90305      124M    3.0M    115M     3%    /usr/libexec
#
#ifconfig
^C
---cut---

In dmesg I see a few entries like this on both machines:

---cut---
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
---cut---

---cut---
$ dmesg
OpenBSD 6.4-current (GENERIC.MP) #0: Thu Dec 27 23:54:55 MST 2018
    [email protected]:/sys/arch/amd64/compile/GENERIC.MP
real mem = 8476962816 (8084MB)
avail mem = 8210743296 (7830MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (77 entries)
bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
bios0: INTEL Corporation DENLOW_WS
acpi0 at bios0: rev 2
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC FPDT SSDT MCFG HPET SSDT SSDT ASF! DMAR EINJ ERST HEST BERT
acpi0: wakeup devices PXSX(S0) RP01(S0) PXSX(S0) RP02(S0) PXSX(S0) RP03(S0) PXSX(S0) RP04(S0) PXSX(S0) RP05(S0) PXSX(S0) RP06(S0) PXSX(S0) RP07(S0) PXSX(S0) GLAN(S0) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.36 MHz, 06-3c-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (RP01)
acpiprt2 at acpi0: bus 5 (RP02)
acpiprt3 at acpi0: bus 6 (RP03)
acpiprt4 at acpi0: bus 7 (RP04)
acpiprt5 at acpi0: bus 8 (RP05)
acpiprt6 at acpi0: bus 9 (RP06)
acpiprt7 at acpi0: bus 10 (RP07)
acpiprt8 at acpi0: bus 1 (PEG0)
acpiprt9 at acpi0: bus 2 (PEG1)
acpiprt10 at acpi0: bus 3 (PEG2)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpipwrres0 at acpi0: FN00, resource for FAN0
acpipwrres1 at acpi0: FN01, resource for FAN1
acpipwrres2 at acpi0: FN02, resource for FAN2
acpipwrres3 at acpi0: FN03, resource for FAN3
acpipwrres4 at acpi0: FN04, resource for FAN4
acpitz0 at acpi0: critical temperature is 105 degC
acpitz1 at acpi0: critical temperature is 105 degC
acpipci0 at acpi0 PCI0: 0x00000010 0x00000011 0x00000000
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
ppb0 at pci0 dev 1 function 0 "Intel Core 4G PCIE" rev 0x06: msi
pci1 at ppb0 bus 1
ppb1 at pci0 dev 1 function 1 "Intel Core 4G PCIE" rev 0x06: msi
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a4
em1 at pci2 dev 0 function 1 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a5
em2 at pci2 dev 0 function 2 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a6
em3 at pci2 dev 0 function 3 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a7
ppb2 at pci0 dev 1 function 2 "Intel Core 4G PCIE" rev 0x06: msi
pci3 at ppb2 bus 3
em4 at pci3 dev 0 function 0 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a8
em5 at pci3 dev 0 function 1 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a9
em6 at pci3 dev 0 function 2 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:aa
em7 at pci3 dev 0 function 3 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:ab
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1024x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
puc0 at pci0 dev 22 function 3 "Intel 8 Series KT" rev 0x04: ports: 16 com
com4 at puc0 port 0 apic 8 int 19: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
em8 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x05: msi, address 00:90:0b:4f:5d:e3
ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x05: apic 8 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb3 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd5: msi
pci4 at ppb3 bus 4
em9 at pci4 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:dc
ppb4 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd5: msi
pci5 at ppb4 bus 5
em10 at pci5 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:dd
ppb5 at pci0 dev 28 function 2 "Intel 8 Series PCIE" rev 0xd5: msi
pci6 at ppb5 bus 6
em11 at pci6 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:de
ppb6 at pci0 dev 28 function 3 "Intel 8 Series PCIE" rev 0xd5: msi
pci7 at ppb6 bus 7
em12 at pci7 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:df
ppb7 at pci0 dev 28 function 4 "Intel 8 Series PCIE" rev 0xd5: msi
pci8 at ppb7 bus 8
em13 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e0
ppb8 at pci0 dev 28 function 5 "Intel 8 Series PCIE" rev 0xd5: msi
pci9 at ppb8 bus 9
em14 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e1
ppb9 at pci0 dev 28 function 6 "Intel 8 Series PCIE" rev 0xd5: msi
pci10 at ppb9 bus 10
em15 at pci10 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e2
ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x05: apic 8 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel C226 LPC" rev 0x05
ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x05: msi, AHCI 1.3
ahci0: port 0: 1.5Gb/s
ahci0: port 1: 3.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, TS8GCF220I, 2015> SCSI3 0/direct fixed t10.ATA_TS8GCF220I_D13852A12012C6000070
sd0: 7775MB, 512 bytes/sector, 15924384 sectors
sd1 at scsibus1 targ 1 lun 0: <ATA, WDC WD5000LUCT-6, 01.0> SCSI3 0/direct fixed naa.50014ee65c627813
sd1: 476940MB, 512 bytes/sector, 976773168 sectors
ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x05: apic 8 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
wbsio0 at isa0 port 0x2e/2: NCT6776F rev 0x33
lm1 at wbsio0 port 0xa30/8: NCT6776F
vmm0 at mainbus0: VMX/EPT (using slow L1TF mitigation)
uhub2 at uhub0 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0e (9f97b8d42ceedbf4.e) swap on sd0b dump on sd0b
carp0: state transition: BACKUP -> MASTER
carp0: state transition: MASTER -> BACKUP
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
$
---cut---

And the corresponding interface configuration:

---cut---
$ cat /etc/hostname.bridge0
description L2-Trunk-Ports-with-RSTP-and-VLAN42-parent-IF
add em9
add em10
add em15
add vether0
add vether1
add pair0
stp em9
stp em10
stp em15
stp vether0
stp vether1
stp pair0
spanpriority 16384
maxaddr 500
timeout 5
up
$ cat /etc/hostname.bridge1
description L2-Access-Ports-in-VLAN42
add em12
add em13
add em14
add vlan4242
stp em12
stp em13
stp em14
stp vlan4242
spanpriority 16384
maxaddr 500
timeout 5
up
$ cat /etc/hostname.pair0
up
$ cat /etc/hostname.pair1
up patch pair0
$ cat /etc/hostname.vlan42
vlan 42 vlandev vether0 172.16.0.2/22 up
$ cat /etc/hostname.vether0
up
$ cat /etc/hostname.vlan4242
vlan 42 vlandev pair1 up
$
---cut---

Any ideas how to fix, mitigate or debug this problem further?

Best regards,
Marco

