This is a new one on me. We had a situation where OSPF between a router and a firewall seemed to go insane, and it involved something I've never heard of before: out-of-band resync. Here are the logs from the beginning of the event:

Feb 8 23:32:45.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from FULL to EXSTART, OOB-Resynchronization
Feb 8 23:32:50.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXCHANGE, Negotiation Done
Feb 8 23:34:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXCHANGE to DOWN, Neighbor Down: Too many retransmissions
Feb 8 23:35:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from DOWN to DOWN, Neighbor Down: Ignore timer expired
Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from DOWN to INIT, Received Hello
Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from INIT to 2WAY, 2-Way Received
Feb 8 23:35:50.790 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from 2WAY to EXSTART, AdjOK?
Feb 8 23:35:50.810 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb 8 23:36:00.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb 8 23:36:10.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb 8 23:36:25.814 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch
Feb 8 23:36:30.818 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXSTART, SeqNumberMismatch

Something happens to trigger an out-of-band resync and then the neighbor gets stuck in EXSTART because of a sequence number mismatch. I first thought we had an MTU mismatch, but the MTUs seem to check out. I read somewhere that sequence number mismatches can be caused by a software error. This just isn't something I've run into before.

First, I don't know what OOB Resynchronization is or what all it entails, so I'm going to read some more about that to find out what triggers it and what it is supposed to be doing under the hood.

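As an aside on the log excerpt above: the timing between transitions is easier to read once the lines are parsed. What follows is only a minimal Python sketch, not anything that ran on the router or the firewall. The regular expression is an assumption based solely on the lines quoted in this message, the names `parse_adjchg` and `transition_gaps` are hypothetical, and the year is a placeholder because the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Assumed pattern, derived only from the ADJCHG lines quoted in this post.
ADJCHG = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC: "
    r"%OSPF-5-ADJCHG: Process \d+, Nbr \S+ on \S+ "
    r"from (?P<old>\w+) to (?P<new>\w+), (?P<reason>.+)"
)


def parse_adjchg(line, year=2000):
    """Return (timestamp, old_state, new_state, reason), or None if no match.

    The log has no year, so `year` is a placeholder (hypothetical default).
    """
    m = ADJCHG.search(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
    return ts, m["old"], m["new"], m["reason"].strip()


def transition_gaps(lines):
    """Return (seconds_since_previous_event, old, new, reason) per transition."""
    events = [e for e in map(parse_adjchg, lines) if e is not None]
    rows = []
    for prev, cur in zip(events, events[1:]):
        gap = (cur[0] - prev[0]).total_seconds()
        rows.append((round(gap, 3), cur[1], cur[2], cur[3]))
    return rows


# First four lines of the excerpt, copied from the log above.
SAMPLE = [
    "Feb 8 23:32:45.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from FULL to EXSTART, OOB-Resynchronization",
    "Feb 8 23:32:50.777 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXSTART to EXCHANGE, Negotiation Done",
    "Feb 8 23:34:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from EXCHANGE to DOWN, Neighbor Down: Too many retransmissions",
    "Feb 8 23:35:49.830 UTC: %OSPF-5-ADJCHG: Process 100, Nbr 1.2.3.4 on Vlan7 from DOWN to DOWN, Neighbor Down: Ignore timer expired",
]

if __name__ == "__main__":
    for gap, old, new, reason in transition_gaps(SAMPLE):
        print(f"+{gap:>8.3f}s  {old} -> {new}  ({reason})")
```

Run against those four lines, it shows the neighbor reaching EXCHANGE 5 seconds after the resync started, sitting there for about 119 seconds before "Too many retransmissions", and then waiting exactly 60 seconds for the ignore timer to expire.
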
Second, why would a peer that had been working just fine suddenly dive-bomb into the ground and then get stuck in EXSTART?

We ultimately resolved the problem by clearing the OSPF process a couple of times. Eventually it all seemed to clear up, and things are working fine now. I suspect a buggy OSPF implementation on the firewall, but that's really just a guess. The router is running 12.2(33)SRE3, which I think has a pretty mature OSPF implementation.

Any thoughts?

Thanks,
John
_______________________________________________
cisco-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
