Hello.
I've got the following core dump from bird 1.4.5:
(gdb) bt
#0 0x000000000043ecfa in ospf_dbdes_send (n=0x8011018a0, next=1) at
../../../proto/ospf/dbdes.c:145
#1 0x000000000043f6b2 in ospf_dbdes_receive (ps_i=0x8010c9000,
ifa=0x8011131a0, n=0x8011018a0) at ../../../proto/ospf/dbdes.c:386
#2 0x0000000000438fdd in ospf_rx_hook (sk=0x80101b8c0, size=28) at
../../../proto/ospf/packet.c:485
#3 0x000000000045f972 in sk_read (s=0x80101b8c0) at io.c:1760
#4 0x000000000046034b in io_loop () at io.c:1975
#5 0x0000000000467da3 in main (argc=3, argv=0x7fffffffed30) at main.c:825
..
(gdb) p n->dbsi
$20 = {prev = 0x0, null = 0x0, next = 0x0, node = 0x0}
(gdb) p sn
$22 = (snode *) 0x0
Investigation has shown that there was major OSPF instability in that area
(~20 quagga boxes and a Juniper device) at that moment, with either a
re-election or a DR/BDR hang.
Unfortunately, I don't have many logs for that. We also had a (typical) issue
with this particular quagga peer just prior to the crash:
Feb 19 18:28:34 XXX ospf6d[8387]: SLOW THREAD: task ospf6_receive
(7f793a115810) ran for 5044ms (cpu time 5032ms)
My guess is that:
1) we started to send our DB to the peer and it stopped confirming DD packets
for a while
2) a flap happened, so part of (or most of) the LSADB got flushed
3) Quagga finally awoke from sleep and confirmed the last packet
4) we tried to get the next chunk of LSAs, but there were no more (unsent) LSAs
in the DB
5) this message appeared on the list
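
To make the suspected sequence concrete, here is a minimal standalone sketch in C. It does not use BIRD's real types or its slist iterator: `lsa_node`, `neighbor` and `fill_dbdes` are hypothetical names made up for illustration, and the `more` field only models the M (More) bit of our DD packets. The assumption is that after a flush the saved position can be NULL while M is still set, which is what the gdb output above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not BIRD's structures. */
struct lsa_node {
  struct lsa_node *next;
  int id;
};

struct neighbor {
  struct lsa_node *cursor;  /* next unsent LSA; NULL if the list was flushed */
  int more;                 /* models the M (More) bit we advertise */
};

/* Copy up to max LSA ids into out and return how many were copied.
 * If M is set but nothing is left to send, clear M instead of
 * dereferencing a NULL cursor. */
static int fill_dbdes(struct neighbor *n, int *out, int max)
{
  int count = 0;

  assert(n != NULL && out != NULL);

  if (n->more && n->cursor == NULL)
    n->more = 0;            /* nothing left to describe */

  while (n->more && count < max)
  {
    out[count++] = n->cursor->id;
    n->cursor = n->cursor->next;
    if (n->cursor == NULL)
      n->more = 0;          /* reached the end of the list */
  }

  return count;
}
```

Without the first check, a neighbor with `more` set and a NULL `cursor` would hit the dereference inside the loop, which is the toy equivalent of step 4.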
Something similar to the attached patch should fix this particular issue (at
least I hope so).

--- proto/ospf/dbdes.c 2015-02-20 14:59:33.000000000 +0300
+++ proto/ospf/dbdes.c.new 2015-02-20 14:59:28.000000000 +0300
@@ -112,7 +112,7 @@
if (next)
{
- snode *sn;
+ snode *sn = NULL;
struct ospf_lsa_header *lsa;
if (n->ldd_bsize != ifa->tx_length)
@@ -133,10 +133,8 @@
j = i = (ospf_pkt_maxsize(ifa) - sizeof(struct ospf_dbdes_packet)) / sizeof(struct ospf_lsa_header); /* Number of possible lsaheaders to send */
lsa = (n->ldd_buffer + sizeof(struct ospf_dbdes_packet));
- if (n->myimms.bit.m)
+ if (n->myimms.bit.m && (sn = s_get(&(n->dbsi))))
{
- sn = s_get(&(n->dbsi));
-
DBG("Number of LSA: %d\n", j);
for (; i > 0; i--)
{
@@ -170,6 +168,12 @@
s_put(&(n->dbsi), sn);
}
+ else if (n->myimms.bit.m)
+ {
+ DBG("Iterator position changed to the last item\n");
+ DBG("M bit unset.\n");
+ n->myimms.bit.m = 0; /* Unset more bit */
+ }
pkt->imms.byte = n->myimms.byte;