If bird6 was built with debugging symbols, wait until it has gotten into the CPU busy-loop, then use gdb to attach to it, à la:

  gdb <path to bird6> <process id of bird6>

Ex:

  gdb /usr/local/sbin/bird6 `ps -C bird6 -o pid=`

then "bt" for a stack trace, possibly showing where it is stuck. "cont" to
continue, then control-c to interrupt it and "bt" to check again. Do this a
few times. Hopefully there will be a pattern. Copy & paste the results to
this list. "quit" to exit gdb, allowing bird6 to continue.

Chris

On Sat, 31 Jan 2015, Baptiste Jonglez wrote:

> I just tried downgrading from 1.4.5 to 1.4.4, using the 1.4.4-1~bpo70+1
> Debian package from http://bird.network.cz/?download&tdir=debian/
>
> The result is the same, bird6 also freezes periodically with version 1.4.4.
>
> By the way, I think I ruled out the possibility that a particular BGP peer
> is sending garbage: the issue still arises when leaving only one BGP
> session active, whichever it is.
>
> Is there anything else I can do to help troubleshoot the root cause of
> this issue?
>
> On Thu, Jan 29, 2015 at 08:03:07PM +0100, Baptiste Jonglez wrote:
> > Hi,
> >
> > We are experiencing regular "freezes" of bird6 on a BGP router. When this
> > happens, bird6 maxes out a CPU for several minutes. If a command is run
> > in birdc6 during such a freeze, the command hangs, and the result is only
> > returned when birdc6 has stopped using the CPU. Note that this also
> > applies to "cheap" commands like "show protocols", which usually complete
> > instantly (both with bird, and with bird6 in non-freeze conditions).
> >
> > Sometimes (but not always), the non-responsiveness of bird6 causes all BGP
> > sessions to drop, which is really annoying on a full-view BGP router.
> >
> > The freezes happen at random, but seem to happen more frequently when the
> > router is under load (typically, at peak time, each CPU spends ~20%
> > forwarding packets, on a 4-core box).
> >
> > The BGP setup is made of multiple transit and peerings, on multiple VLANs
> > (some BGP neighbours share the same VLAN). The setup is pretty similar on
> > bird and bird6, but only bird6 exhibits these freezes, bird works just fine.
> >
> > The box is running Debian wheezy on amd64, with bird from backports:
> > 1.4.5-1~bpo70+1
> >
> > Attached is the configuration, and two extracts of the logs when all BGP
> > sessions dropped (with debug { states, interfaces, events }). All files
> > are anonymised, but should be consistent.
> >
> > What do you think? It looks like bird6 gets stuck on some very expensive
> > operation, which prevents it from doing anything else (include maintaining
> > BGP sessions alive).
> >
> > Thanks,
> > Baptiste
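
P.S. If doing the attach / bt / cont cycle by hand is awkward during a freeze, it could be scripted roughly as below. This is only a sketch: the function name sample_stacks, the GDB override variable, and the defaults of 5 samples taken 2 seconds apart are my own choices, not anything bird provides. It relies on gdb's -batch and -ex options, and you need enough privilege to attach to the process.

```shell
#!/bin/sh
# Hypothetical helper (not part of bird): take several backtraces of a
# running process so that a recurring pattern becomes visible.
# Usage: sample_stacks <binary> <pid> [samples] [delay-seconds]
# e.g.:  sample_stacks /usr/local/sbin/bird6 "$(ps -C bird6 -o pid=)" > stacks.txt
sample_stacks() {
    binary=$1
    pid=$2
    samples=${3:-5}   # assumed default: 5 backtraces
    delay=${4:-2}     # assumed default: 2 seconds between them
    i=1
    while [ "$i" -le "$samples" ]; do
        echo "=== sample $i of $samples ==="
        # -batch makes gdb run the -ex commands and then exit, which
        # detaches from the process and lets it continue running.
        "${GDB:-gdb}" -batch -ex "bt" "$binary" "$pid" 2>&1
        i=$((i + 1))
        if [ "$i" -le "$samples" ]; then
            sleep "$delay"
        fi
    done
}
```

Each sample attaches, prints one backtrace, and detaches again, so bird6 keeps running in between, which is the same effect as "cont" followed by control-c in an interactive session.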
