OK, this smells like a buggy OSPF implementation on the vendor side. Upgrading the firmware on both Cisco Nexus 3000-series switches to NX-OS version 7.0(3)I4(4) fixed my problem with OSPF stuck in EXCHG/EXSTA.

The setup involving the Dell switch shows the following when ospfd on the OpenBSD side is run with "-dvvv":

spf_calc: area 0.0.0.0 calculated
nbr_fsm: event HELLO_RECEIVED resulted in action START_INACTIVITY_TIMER and changing state for neighbor ID 10.4.255.26 from DOWN to INIT
nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID 10.4.255.26 from INIT to 2-WAY
if_fsm: event NEIGHBORCHANGE resulted in action NOTHING and changing state for interface trunk1 from WAIT to WAIT
nbr_fsm: event HELLO_RECEIVED resulted in action START_INACTIVITY_TIMER and changing state for neighbor ID 10.4.255.29 from DOWN to INIT
nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID 10.4.255.29 from INIT to 2-WAY
if_fsm: event NEIGHBORCHANGE resulted in action NOTHING and changing state for interface trunk1 from WAIT to WAIT
recv_db_description: neighbor ID 10.4.255.29: packet ignored in state 2-WAY
if_act_elect: interface trunk1 old dr none new dr 10.4.255.29, old bdr none new bdr 10.4.255.26
nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for neighbor ID 10.4.255.29 from 2-WAY to EXSTA
nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for neighbor ID 10.4.255.26 from 2-WAY to EXSTA
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface trunk1
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface trunk1
if_fsm: event BACKUPSEEN resulted in action ELECT and changing state for interface trunk1 from WAIT to OTHER
nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID 10.4.255.29 from EXSTA to SNAP
nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID 10.4.255.29 from SNAP to EXCHG
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags
nbr_fsm: event SEQ_NUM_MISMATCH resulted in action RESET_DD and changing state for neighbor ID 10.4.255.29 from EXCHG to EXSTA
nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state for neighbor ID 10.4.255.29 from EXSTA to SNAP
nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing state for neighbor ID 10.4.255.29 from SNAP to EXCHG
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags

E.g.:
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags

> On 14 Feb 2017 at 11:56, Maxim Bourmistrov <m...@alumni.chalmers.se> wrote:
>
>
>> On 14 Feb 2017 at 11:33, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>>
>> I have no idea why you're getting this kind of error, but maybe you
>> can simplify your setup a bit more. Can you reproduce when using just
>> em1 (out of the trunk) instead of trunk1? Just bnx1?
>
> I'll try to modify this setup.
>
> Anyhow, I see almost exactly the same behavior with another setup
> involving Cisco Nexus 3000-series.
> The similarity between the two is that a trunk is used in both locations.
> However, a reboot does not solve the problem with the Nexus.
>
> [fw1]-[11:33:02]# ospfctl sh nei
> ID              Pri State        DeadTime Address         Iface     Uptime
> 10.6.255.1      1   2-WAY/OTHER  00:00:38 10.6.255.1      trunk1    -
> 10.6.255.28     1   2-WAY/OTHER  00:00:35 10.6.255.28     trunk1    -
> 10.6.255.2      1   2-WAY/OTHER  00:00:38 10.6.255.2      trunk1    -
> 10.6.255.30     1   EXSTA/DR     00:00:31 10.6.255.30     trunk1    -
> 10.6.255.29     1   FULL/BCKUP   00:00:35 10.6.255.29     trunk1    01:24:41
>
>
> [fw2]-[11:45:00]# ospfctl sh nei
> ID              Pri State        DeadTime Address         Iface     Uptime
> 10.6.255.1      1   2-WAY/OTHER  00:00:37 10.6.255.1      trunk1    -
> 10.6.255.27     1   2-WAY/OTHER  00:00:37 10.6.255.27     trunk1    -
> 10.6.255.2      1   2-WAY/OTHER  00:00:37 10.6.255.2      trunk1    -
> 10.6.255.30     1   EXCHG/DR     00:00:39 10.6.255.30     trunk1    -
> 10.6.255.29     1   EXSTA/BCKUP  00:00:31 10.6.255.29     trunk1    -
>
> fw1/fw2 - openbsd 6.0-stable
> fw1 local IP on trunk1 - 10.6.255.27
> fw2 local IP on trunk1 - 10.6.255.28
>
> 10.6.255.{1,2} - openbsd 5.9-stable VMs
>
> 10.6.255.{29,30} - two Nexus with vPC (MLAG)
>
> fw1/fw2 are connected to both switches and form a vPC (vPC on top of LACP. LACP is required for vPC to work).
> Both are identical hardware-wise as well as configuration-wise.
>
> trunk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         lladdr a0:36:9f:37:d3:60
>         description: VLAN990
>         index 8 priority 0 llprio 3
>         trunk: trunkproto lacp
>         trunk id: [(8000,a0:36:9f:37:d3:60,4045,0000,0000),
>                  (7F9B,00:23:04:ee:be:01,802E,0000,0000)]
>                 trunkport ix1 active,collecting,distributing
>                 trunkport ix2 active,collecting,distributing
>         groups: trunk
>         media: Ethernet autoselect
>         status: active
>         inet 10.6.255.28 netmask 0xffffffe0 broadcast 10.6.255.31
>
>
> Sometimes fw1/fw2 get connected to both switches. Sometimes not. Sometimes to only one, sometimes to none.
>
> [fw2]-[11:52:00]# tcpdump -n -i trunk1 proto ospf
> tcpdump: listening on trunk1, link-type EN10MB
> 11:52:09.039855 10.6.255.1 > 224.0.0.5: OSPFv2-hello 64: rtrid 10.6.255.1 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.039901 10.6.255.28 > 224.0.0.5: OSPFv2-hello 64: rtrid 10.6.255.28 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.039981 10.6.255.27 > 224.0.0.5: OSPFv2-hello 64: rtrid 10.6.255.27 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.040108 10.6.255.2 > 224.0.0.5: OSPFv2-hello 64: rtrid 10.6.255.2 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.463707 10.6.255.30 > 10.6.255.28: OSPFv2-dd 32: rtrid 10.6.255.30 backbone E I/M/MS mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:09.463798 10.6.255.28 > 10.6.255.30: OSPFv2-dd 32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 352265AF [tos 0xc0] [ttl 1]
> 11:52:09.955800 10.6.255.29 > 10.6.255.28: OSPFv2-dd 32: rtrid 10.6.255.29 backbone E I/M/MS mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:09.955838 10.6.255.28 > 10.6.255.29: OSPFv2-dd 32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 84F932C [tos 0xc0] [ttl 1]
> 11:52:10.832978 10.6.255.2 > 224.0.0.6: OSPFv2-ls_upd 64: rtrid 10.6.255.2 backbone [tos 0xc0] [ttl 1]
> 11:52:11.278971 10.6.255.1 > 224.0.0.6: OSPFv2-ls_upd 72: rtrid 10.6.255.1 backbone [tos 0xc0] [ttl 1]
> 11:52:11.560311 10.6.255.30 > 224.0.0.5: OSPFv2-hello 64: rtrid 10.6.255.30 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:11.596931 10.6.255.30 > 224.0.0.5: OSPFv2-ls_ack 64: rtrid 10.6.255.30 backbone [tos 0xc0] [ttl 1]
> 11:52:14.475690 10.6.255.28 > 10.6.255.30: OSPFv2-dd 32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 352265AF [tos 0xc0] [ttl 1]
> 11:52:14.730459 10.6.255.30 > 10.6.255.28: OSPFv2-dd 32: rtrid 10.6.255.30 backbone E I/M/MS mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:14.730613 10.6.255.28 > 10.6.255.30: OSPFv2-dd 132: rtrid 10.6.255.28 backbone E M mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:14.965713 10.6.255.28 > 10.6.255.29: OSPFv2-dd 32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 84F932C [tos 0xc0] [ttl 1]
> 11:52:15.019118 10.6.255.29 > 10.6.255.28: OSPFv2-dd 32: rtrid 10.6.255.29 backbone E I/M/MS mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:15.019252 10.6.255.28 > 10.6.255.29: OSPFv2-dd 132: rtrid 10.6.255.28 backbone E M mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:16.287215 10.6.255.1 > 224.0.0.6: OSPFv2-ls_upd 72: rtrid 10.6.255.1 backbone [tos 0xc0] [ttl 1]
> ^C
> 68 packets received by filter
> 0 packets dropped by kernel
>
> [fw2]-[11:53:09]# tail -10 /var/log/daemon
> Feb 14 11:52:20 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:20 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:31 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:31 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:42 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:43 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:53 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:54 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
>
>
> Any clues?
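
For anyone chasing the same loop, here is a rough sketch for tallying the "seq num mismatch" messages per neighbor from ospfd log lines like the ones quoted in this thread. The function name `count_mismatches` and the script name in the usage comment are made up for illustration, and the regex assumes the message text is exactly as it appears in these logs; other ospfd versions may word it differently.

```python
import re
import sys
from collections import Counter

# Assumption: the message format matches the ospfd lines quoted in this thread.
MISMATCH_RE = re.compile(
    r"recv_db_description: neighbor ID (\d+\.\d+\.\d+\.\d+): seq num mismatch"
)


def count_mismatches(lines):
    """Return a Counter mapping neighbor ID to the number of mismatch lines."""
    counts = Counter()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 count_mismatch.py < /var/log/daemon
    for neighbor, hits in count_mismatches(sys.stdin).most_common():
        print(f"{neighbor}\t{hits}")
```

A neighbor whose count keeps growing at roughly the DD retransmit interval is most likely the one stuck bouncing between EXSTA and EXCHG.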