Hello.

I've got the following core dump from BIRD 1.4.5:

(gdb) bt
#0  0x000000000043ecfa in ospf_dbdes_send (n=0x8011018a0, next=1) at ../../../proto/ospf/dbdes.c:145
#1  0x000000000043f6b2 in ospf_dbdes_receive (ps_i=0x8010c9000, ifa=0x8011131a0, n=0x8011018a0) at ../../../proto/ospf/dbdes.c:386
#2  0x0000000000438fdd in ospf_rx_hook (sk=0x80101b8c0, size=28) at ../../../proto/ospf/packet.c:485
#3  0x000000000045f972 in sk_read (s=0x80101b8c0) at io.c:1760
#4  0x000000000046034b in io_loop () at io.c:1975
#5  0x0000000000467da3 in main (argc=3, argv=0x7fffffffed30) at main.c:825
..
(gdb) p n->dbsi
$20 = {prev = 0x0, null = 0x0, next = 0x0, node = 0x0}
(gdb) p sn
$22 = (snode *) 0x0

Investigation has shown that there was major OSPF instability in that area 
(~20 Quagga boxes and a Juniper device) at that moment, with either a 
re-election or a DR/BDR hang.
Unfortunately, I don't have many logs for that. We also had a (typical) issue 
with this particular Quagga peer just prior to the crash:
Feb 19 18:28:34 XXX ospf6d[8387]: SLOW THREAD: task ospf6_receive 
(7f793a115810) ran for 5044ms (cpu time 5032ms)

My guess is that:
1) We started to send our DB to the peer, and it stopped confirming DD packets 
for a while.
2) A flap happened, so part or most of the LSADB got flushed.
3) Quagga finally woke up and confirmed the last packet.
4) We tried to get the next chunk of LSAs, but there were no more (unsent) LSAs 
in the DB, so s_get() returned NULL and the NULL sn was dereferenced.
5) This message appeared on the list.
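
To make the suspected sequence easier to reason about, here is a minimal 
stand-alone sketch of the failure mode and the guard. It is only a model: 
toy_node, toy_iter, toy_neighbor, toy_get() and toy_fill_dbdes() are 
hypothetical stand-ins I made up for illustration, not BIRD's real snode, 
siterator or s_get(), and the real loop in dbdes.c does more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BIRD's list node and iterator; not the real types. */
struct toy_node {
  struct toy_node *next;
  int lsa_id;
};

struct toy_iter {
  struct toy_node *node;   /* NULL models the zeroed n->dbsi seen in the core */
};

struct toy_neighbor {
  struct toy_iter dbsi;    /* position in the database summary list */
  int more;                /* models n->myimms.bit.m */
};

/* Models an iterator "get": returns the current node, which may be NULL. */
static struct toy_node *toy_get(struct toy_iter *it)
{
  return it->node;
}

/* Copies up to max_headers LSA ids into out and returns how many were copied. */
static int toy_fill_dbdes(struct toy_neighbor *n, int *out, int max_headers)
{
  struct toy_node *sn = NULL;
  int count = 0;

  assert(max_headers >= 0);

  if (n->more && (sn = toy_get(&n->dbsi)))
  {
    /* Normal case: walk the list from the saved position. */
    while (sn && count < max_headers)
    {
      out[count++] = sn->lsa_id;
      sn = sn->next;
    }
    n->dbsi.node = sn;     /* remember where we stopped */
    if (!sn)
      n->more = 0;         /* reached the end: unset the more bit */
  }
  else if (n->more)
  {
    /* The list was emptied while we waited for the peer: there is nothing
       to dereference, so just unset the more bit and send no headers. */
    n->more = 0;
  }

  return count;
}
```

Calling toy_fill_dbdes() on a neighbor whose iterator node is NULL while the 
more bit is still set returns 0 and clears the bit. Without the NULL check in 
the first condition, the same call would dereference sn, which is what the 
backtrace above suggests happened.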

Something similar to the attached patch should fix this particular issue (at 
least I hope so).
--- proto/ospf/dbdes.c	2015-02-20 14:59:33.000000000 +0300
+++ proto/ospf/dbdes.c.new	2015-02-20 14:59:28.000000000 +0300
@@ -112,7 +112,7 @@
 
     if (next)
     {
-      snode *sn;
+      snode *sn = NULL;
       struct ospf_lsa_header *lsa;
 
       if (n->ldd_bsize != ifa->tx_length)
@@ -133,10 +133,8 @@
       j = i = (ospf_pkt_maxsize(ifa) - sizeof(struct ospf_dbdes_packet)) / sizeof(struct ospf_lsa_header);	/* Number of possible lsaheaders to send */
       lsa = (n->ldd_buffer + sizeof(struct ospf_dbdes_packet));
 
-      if (n->myimms.bit.m)
+      if (n->myimms.bit.m && (sn = s_get(&(n->dbsi))))
       {
-	sn = s_get(&(n->dbsi));
-
 	DBG("Number of LSA: %d\n", j);
 	for (; i > 0; i--)
 	{
@@ -170,6 +168,12 @@
 
 	s_put(&(n->dbsi), sn);
       }
+      else if (n->myimms.bit.m)
+      {
+	DBG("Iterator position changed to the last item\n");
+	DBG("M bit unset.\n");
+	n->myimms.bit.m = 0;	/* Unset more bit */
+      }
 
       pkt->imms.byte = n->myimms.byte;
 
