Hi all @misc,

First things first: sorry for the long description.

After upgrading from 6.3-stable to 6.4-stable (and later also to -current)
in our integration stage, I ran into a strange problem.

I run OpenBSD in a hub-and-spoke VPN architecture across roughly 14
distributed datacenters.

6.3-stable is running fine and stable as expected.

(All versions, i.e. 6.3-stable, 6.4-stable and -current, run as
resflash images.)



All locations - including the mentioned integration stage - are running
with the same setup.

Each location has two OpenBSD servers/gateways that run:


- ospf over gre over ipsec

-- local to each other and to our two main datacenters (hub)


- two bridge-interfaces inside one server

-- one for tagged frames, one for untagged

-- both bridge-interfaces are connected with a pair-interface

-- the first server is configured as primary within ospf, stp and carp


- layer-2 redundancy is done by stp on the openbsd-side and mstp
(instance 0) on the network-gear-side


- layer-3 redundancy is done by ospf and carp


- pf is enabled



The problem can be described as follows:

After an initial boot, everything works fine for roughly 4 hours.

After those 4 hours, it is no longer possible to log in to the
backup/secondary OpenBSD server via ssh or even via serial console, but it
still seems to forward traffic correctly. The ospf adjacencies, the ipsec
security associations and so on also stay up and running.

Monitoring metrics don't show a measurable increase in any value.

I've already replaced the hardware, since a hardware fault was my first
guess: the first server/gateway runs without any problems on the same
6.4-stable version and configuration. Unfortunately, this didn't help.

When I left a serial console login open, I was still able to execute some
commands, and a top that I had started before the failure kept running in
the failure state. But as soon as I entered e.g. ifconfig or tried a
tab-completion, the serial console froze as well.
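
Since an already-open session keeps working until a command such as
ifconfig is entered, one way to collect more data could be to log
snapshots periodically from that session before the hang sets in. The
following is only a sketch under assumptions: the function name
`snapshot` and the log path are made up for illustration, and because
/var is an mfs here, the log would need to live on persistent storage to
survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): append a
# timestamped snapshot of one command's output to a log file.
# The default log path is an assumption; pick a persistent location.
LOG="${LOG:-/var/log/hang-snapshots.log}"

snapshot() {
    # "$@" is the command to run; its output (stdout and stderr) and
    # its exit status are appended to $LOG.
    {
        printf '=== %s: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$*"
        "$@" 2>&1
        printf '=== exit status: %s\n' "$?"
    } >> "$LOG"
}

# Example loop, commented out so the file can be sourced safely.
# Note that if one of the commands itself blocks, the loop stalls too.
# while :; do
#     snapshot ps -axo pid,stat,wchan,command
#     snapshot vmstat -m
#     sleep 60
# done
```

The wait channel column of ps would show what each blocked process is
sleeping on, which may help narrow down where ifconfig gets stuck.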


The problem does not occur if I:


- shut down bridge0 (for tagged frames)

or

- shut down bridge1 (for untagged frames)

or

- shut down pair0 or pair1 (interconnection between the bridges)
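
As a sketch of the workaround above, assuming "shut down" means taking
the interface down with ifconfig: the wrapper name `isolate` and the
DRYRUN switch are hypothetical and only there so the command can be
reviewed before it is run on a gateway.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround: take one of the four
# interfaces down so the two bridges are no longer interconnected.
# DRYRUN=1 (the default in this sketch) only prints the command.
DRYRUN="${DRYRUN:-1}"

isolate() {
    ifname="$1"
    # Only accept the interfaces named in the workaround list.
    case "$ifname" in
        bridge0|bridge1|pair0|pair1) ;;
        *) echo "unexpected interface: $ifname" >&2; return 1 ;;
    esac
    if [ "$DRYRUN" = 1 ]; then
        echo "ifconfig $ifname down"
    else
        ifconfig "$ifname" down
    fi
}
```

With the default DRYRUN=1, `isolate pair0` prints `ifconfig pair0 down`
instead of executing it.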



Below is the output of the commands I was able to run before ifconfig
(or, in other cases, tab-completion) froze the console:

---cut---

# df -i
Filesystem  512-blocks      Used     Avail Capacity iused   ifree  %iused  Mounted on
/dev/sd0e      3473724   1127852   2172188    34%   14494  219360     6%   /
mfs:64049        63326        12     60148     0%       7    8183     0%   /tmp
mfs:51486        11391        63     10759     1%    1231    1839    40%   /dev
mfs:86629        63326      8552     51608    14%     365    7825     4%   /etc
mfs:35143       253790     11512    229590     5%     236   32530     1%   /var
mfs:6765        253790     76506    164596    32%      45   32721     0%   /usr/lib
mfs:9627        253790      6132    234970     3%      66   32700     0%   /usr/libexec

#

# vmstat 1 10
 procs    memory       page                    disks    traps          cpu
 r   s   avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 0  64  104M   7474M   19   0   0   0   0   0   1   0   73    68  168  0  0 100
 0  64  104M   7474M   20   0   0   0   0   0   0   0   66    60  128  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   48    45   92  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   73    44  146  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   65    47  132  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   37    49   82  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  107  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   51    44  106  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  104  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   53    47  118  0  0 100
#
# iostat 1 10
      tty              sd0               sd1                cpu
 tin tout  KB/t  t/s  MB/s   KB/t  t/s  MB/s  us ni sy sp in id
   0    2 28.82    0  0.01   0.50    0  0.00   0  0  0  0  0100
   0  193  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0100
#

# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0e      1.7G    551M    1.0G    34%    /
mfs:69819     30.9M    9.0K   29.4M     0%    /tmp
mfs:9236       5.6M   31.5K    5.3M     1%    /dev
mfs:34613     30.9M    4.2M   25.2M    14%    /etc
mfs:46616      124M    5.5M    112M     5%    /var
mfs:42566      124M   37.4M   80.4M    32%    /usr/lib
mfs:90305      124M    3.0M    115M     3%    /usr/libexec
#
# ifconfig

^C

---cut---


In dmesg I see a few entries like this on both machines:

---cut---

splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
---cut---


---cut---

$ dmesg
OpenBSD 6.4-current (GENERIC.MP) #0: Thu Dec 27 23:54:55 MST 2018
    [email protected]:/sys/arch/amd64/compile/GENERIC.MP
real mem = 8476962816 (8084MB)
avail mem = 8210743296 (7830MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (77 entries)
bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
bios0: INTEL Corporation DENLOW_WS
acpi0 at bios0: rev 2
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC FPDT SSDT MCFG HPET SSDT SSDT ASF! DMAR EINJ ERST HEST BERT
acpi0: wakeup devices PXSX(S0) RP01(S0) PXSX(S0) RP02(S0) PXSX(S0) RP03(S0) PXSX(S0) RP04(S0) PXSX(S0) RP05(S0) PXSX(S0) RP06(S0) PXSX(S0) RP07(S0) PXSX(S0) GLAN(S0) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.36 MHz, 06-3c-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz, 2900.00 MHz, 06-3c-03
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (RP01)
acpiprt2 at acpi0: bus 5 (RP02)
acpiprt3 at acpi0: bus 6 (RP03)
acpiprt4 at acpi0: bus 7 (RP04)
acpiprt5 at acpi0: bus 8 (RP05)
acpiprt6 at acpi0: bus 9 (RP06)
acpiprt7 at acpi0: bus 10 (RP07)
acpiprt8 at acpi0: bus 1 (PEG0)
acpiprt9 at acpi0: bus 2 (PEG1)
acpiprt10 at acpi0: bus 3 (PEG2)
acpiec0 at acpi0: not present
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpipwrres0 at acpi0: FN00, resource for FAN0
acpipwrres1 at acpi0: FN01, resource for FAN1
acpipwrres2 at acpi0: FN02, resource for FAN2
acpipwrres3 at acpi0: FN03, resource for FAN3
acpipwrres4 at acpi0: FN04, resource for FAN4
acpitz0 at acpi0: critical temperature is 105 degC
acpitz1 at acpi0: critical temperature is 105 degC
acpipci0 at acpi0 PCI0: 0x00000010 0x00000011 0x00000000
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
"PNP0C0B" at acpi0 not configured
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: DD1F
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 4G Host" rev 0x06
ppb0 at pci0 dev 1 function 0 "Intel Core 4G PCIE" rev 0x06: msi
pci1 at ppb0 bus 1
ppb1 at pci0 dev 1 function 1 "Intel Core 4G PCIE" rev 0x06: msi
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a4
em1 at pci2 dev 0 function 1 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a5
em2 at pci2 dev 0 function 2 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a6
em3 at pci2 dev 0 function 3 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a7
ppb2 at pci0 dev 1 function 2 "Intel Core 4G PCIE" rev 0x06: msi
pci3 at ppb2 bus 3
em4 at pci3 dev 0 function 0 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a8
em5 at pci3 dev 0 function 1 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:a9
em6 at pci3 dev 0 function 2 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:aa
em7 at pci3 dev 0 function 3 "Intel I350 Fiber" rev 0x01: msi, address 00:90:0b:52:6e:ab
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4600" rev 0x06
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1024x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"Intel 8 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
puc0 at pci0 dev 22 function 3 "Intel 8 Series KT" rev 0x04: ports: 16 com
com4 at puc0 port 0 apic 8 int 19: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
em8 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x05: msi, address 00:90:0b:4f:5d:e3
ehci0 at pci0 dev 26 function 0 "Intel 8 Series USB" rev 0x05: apic 8 int 16
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb3 at pci0 dev 28 function 0 "Intel 8 Series PCIE" rev 0xd5: msi
pci4 at ppb3 bus 4
em9 at pci4 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:dc
ppb4 at pci0 dev 28 function 1 "Intel 8 Series PCIE" rev 0xd5: msi
pci5 at ppb4 bus 5
em10 at pci5 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:dd
ppb5 at pci0 dev 28 function 2 "Intel 8 Series PCIE" rev 0xd5: msi
pci6 at ppb5 bus 6
em11 at pci6 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:de
ppb6 at pci0 dev 28 function 3 "Intel 8 Series PCIE" rev 0xd5: msi
pci7 at ppb6 bus 7
em12 at pci7 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:df
ppb7 at pci0 dev 28 function 4 "Intel 8 Series PCIE" rev 0xd5: msi
pci8 at ppb7 bus 8
em13 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e0
ppb8 at pci0 dev 28 function 5 "Intel 8 Series PCIE" rev 0xd5: msi
pci9 at ppb8 bus 9
em14 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e1
ppb9 at pci0 dev 28 function 6 "Intel 8 Series PCIE" rev 0xd5: msi
pci10 at ppb9 bus 10
em15 at pci10 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:90:0b:4f:5d:e2
ehci1 at pci0 dev 29 function 0 "Intel 8 Series USB" rev 0x05: apic 8 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel C226 LPC" rev 0x05
ahci0 at pci0 dev 31 function 2 "Intel 8 Series AHCI" rev 0x05: msi, AHCI 1.3
ahci0: port 0: 1.5Gb/s
ahci0: port 1: 3.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, TS8GCF220I, 2015> SCSI3 0/direct fixed t10.ATA_TS8GCF220I_D13852A12012C6000070
sd0: 7775MB, 512 bytes/sector, 15924384 sectors
sd1 at scsibus1 targ 1 lun 0: <ATA, WDC WD5000LUCT-6, 01.0> SCSI3 0/direct fixed naa.50014ee65c627813
sd1: 476940MB, 512 bytes/sector, 976773168 sectors
ichiic0 at pci0 dev 31 function 3 "Intel 8 Series SMBus" rev 0x05: apic 8 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
wbsio0 at isa0 port 0x2e/2: NCT6776F rev 0x33
lm1 at wbsio0 port 0xa30/8: NCT6776F
vmm0 at mainbus0: VMX/EPT (using slow L1TF mitigation)
uhub2 at uhub0 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
uhub3 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0e (9f97b8d42ceedbf4.e) swap on sd0b dump on sd0b
carp0: state transition: BACKUP -> MASTER
carp0: state transition: MASTER -> BACKUP
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
splassert: bstp_notify_rtage: want 2 have 0
$
---cut---


And here is the corresponding interface configuration:

---cut---

$ cat /etc/hostname.bridge0
description L2-Trunk-Ports-with-RSTP-and-VLAN42-parent-IF
add em9
add em10
add em15
add vether0
add vether1
add pair0
stp em9
stp em10
stp em15
stp vether0
stp vether1
stp pair0
spanpriority 16384
maxaddr 500
timeout 5
up

$ cat /etc/hostname.bridge1
description L2-Access-Ports-in-VLAN42
add em12
add em13
add em14
add vlan4242
stp em12
stp em13
stp em14
stp vlan4242
spanpriority 16384
maxaddr 500
timeout 5
up


$ cat /etc/hostname.pair0
up

$ cat /etc/hostname.pair1
up
patch pair0

$ cat /etc/hostname.vlan42
vlan 42 vlandev vether0 172.16.0.2/22 up

$ cat /etc/hostname.vether0
up

$ cat /etc/hostname.vlan4242
vlan 42 vlandev pair1 up
$
---cut---



Any ideas on how to fix, mitigate, or further debug this problem?


Best regards,

Marco



