I think I may have found a clue in the informational RFC for OOB Resync: When a DBD packet is received with the R-bit set and the sender is known to be OOB-incapable, the packet should be dropped and a SeqNumber-Mismatch event should be generated for the neighbor.
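
To check that I'm reading that rule correctly, here is a rough Python sketch of how I picture the receive-side check. The names (`Neighbor`, `DbdPacket`, `receive_dbd`, `oob_capable`) are my own and hypothetical, not taken from the RFC text or from any vendor's code. I'm also assuming that "known to be OOB-incapable" means the router never saw the LR-bit in that neighbor's hellos. It models only the one rule quoted above and nothing else about DBD processing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Neighbor:
    router_id: str
    # Assumption: set to True only if the neighbor has advertised the
    # LR-bit in its hello packets; otherwise it is treated as OOB-incapable.
    oob_capable: bool = False
    # Events generated for this neighbor, kept here only for illustration.
    events: List[str] = field(default_factory=list)


@dataclass
class DbdPacket:
    # True when the DBD packet arrives with the R-bit set.
    r_bit: bool


def receive_dbd(nbr: Neighbor, pkt: DbdPacket) -> bool:
    """Return True if the packet goes on to normal processing, False if dropped."""
    if pkt.r_bit and not nbr.oob_capable:
        # R-bit set, but the sender is believed to be OOB-incapable:
        # drop the packet and raise SeqNumberMismatch for the neighbor.
        nbr.events.append("SeqNumberMismatch")
        return False
    return True


# The firewall as my router may have seen it: no LR-bit in its hellos.
firewall = Neighbor(router_id="1.2.3.4", oob_capable=False)
accepted = receive_dbd(firewall, DbdPacket(r_bit=True))
print(accepted, firewall.events)
```

If that reading is right, every DBD the firewall sends with the R-bit set would be dropped and produce another mismatch event, which would look a lot like the repeated EXSTART to EXSTART log lines.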
My router must have received a DBD from the firewall with the R-bit set, which means the neighbor is participating in OOB resync; however, if the router did not previously recognize the firewall as being capable of OOB Resync, it will drop the packet and log a sequence number mismatch. That may explain part of what we were seeing.

Several questions now remain:

1. What triggered the OOB resync in the first place?
2. If the firewall isn't capable of doing OOB resync, why would it send DBD packets with the R-bit set? (Perhaps it is capable and just wasn't previously setting the LR-bit in hello messages.)

John

On Fri, Feb 8, 2013 at 9:28 PM, John Neiberger <[email protected]> wrote:

> This is a new one on me. We had a situation where OSPF between a router
> and a firewall seemed to go insane and it involves something I've never
> heard of before: Out of band Resync. Here are the logs from the beginning
> of the event:
>
> Feb 8 23:32:45.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from FULL to EXSTART, OOB-Resynchronization
> Feb 8 23:32:50.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXCHANGE, Negotiation Done
> Feb 8 23:34:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXCHANGE to DOWN, Neighbor Down: Too many retransmissions
> Feb 8 23:35:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from DOWN to DOWN, Neighbor Down: Ignore timer expired
> Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from DOWN to INIT, Received Hello
> Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from INIT to 2WAY, 2-Way Received
> Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from 2WAY to EXSTART, AdjOK?
> Feb 8 23:35:50.810 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXSTART, SeqNumberMismatch
> Feb 8 23:36:00.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXSTART, SeqNumberMismatch
> Feb 8 23:36:10.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXSTART, SeqNumberMismatch
> Feb 8 23:36:25.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXSTART, SeqNumberMismatch
> Feb 8 23:36:30.818 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7
> from EXSTART to EXSTART, SeqNumberMismatch
>
> Something happens to trigger an out-of-band resync and then the neighbor
> gets stuck in EXSTART because of a sequence number mismatch. I first
> thought we had an MTU mismatch, but the MTUs seem to check out. I read
> somewhere that sequence number mismatches can be caused by a software
> error. This just isn't something I've run into before.
>
> First, I don't know what OOB Resynchronization is or what all it entails,
> so I'm going to read some more about that to find out what triggers it and
> what it is supposed to be doing under the hood. Second, why would a peer
> that had been working just fine suddenly divebomb into the ground and then
> get stuck in exstart?
>
> We ultimately resolved the problem by clearing the OSPF process a couple
> of times. Eventually all seemed to clear up and things are working fine. I
> suspect a buggy OSPF implementation on the firewall but that's really just
> a guess. The router is running 12.2(33)SRE3 code, which I think has a
> pretty mature OSPF code.
>
> Any thoughts?
>
> Thanks,
> John
>
_______________________________________________
cisco-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
