Stuart Henderson <stu <at> spacehopper.org> writes:

> cron job to restart it, with a random delay to avoid two machines
> coming back up at the same time when all the routers at a site
> fail together...

So you just check it every minute to see if it is alive? That seems to
me a pretty fundamental design flaw in the software, given its role. I
would expect it to retry sending a packet or something, not just exit.

> > The first message below seems to indicate unable to allocate
> > memory. I'm running these boxes pretty much stock having not tuned any
> > parameters at all. Both are just running routing daemons (bgpd, ospf)
> > and the 4.3 box is running OpenVPN. There are no applications running
> > and both boxes have plenty of RAM (4GB) and not using any swap or
> > anything.
> >
> > Is there something I should look at tuning in terms
> > of memory allocation in order to stop this happening?
>
> Make sure login.conf memory limits for the daemon class (or the
> _bgpd class on a newer OS version using /etc/rc.d) are high enough.
> If your limits are insufficient for the size of routing table then
> obviously you will have a problem. But also there is a bug
> somewhere, possibly to do with nexthop changes, which can result
> in very rapidly increasing memory use.

Currently my routing table is pretty small, only something like 150
routes. This will increase once we start taking full feeds. At the
moment we only have a few partial feeds from networks we peer with,
and everything else goes out a default route.

I don't think it is a memory issue with the process itself; the error
message seems to relate more to the memory available for sending the
packet. This is why I'm wondering if there is some sysctl or similar
somewhere I should be tweaking.

-Matt
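Stuart's suggested workaround, a cron job that restarts the daemon after a random delay, might look roughly like the minimal Python sketch below. It is not from the thread: the script itself, the use of `pgrep -x` to detect the process, the assumption that the binary is `/usr/sbin/<name>` and can be started without extra flags, and the 30-second maximum delay are all illustrative assumptions.

```python
import random
import subprocess
import sys
import time
from typing import Callable

# Assumption: the maximum random delay before a restart (not from the thread).
MAX_DELAY_SECONDS = 30.0


def is_running(name: str) -> bool:
    """Return True if `pgrep -x` finds a process named exactly `name`."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_if_dead(
    alive: Callable[[], bool],
    restart: Callable[[], object],
    max_delay: float,
    sleep: Callable[[float], object] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> bool:
    """Restart a dead daemon after a random delay; True if restart() was called."""
    if alive():
        return False
    # Random jitter in [0, max_delay) so routers that all lost the daemon at
    # the same moment (e.g. every router at one site) do not restart in lockstep.
    sleep(rand() * max_delay)
    # Re-check after sleeping: something else may have restarted it meanwhile.
    if alive():
        return False
    restart()
    return True


def main(name: str) -> None:
    # Assumption: the daemon binary lives in /usr/sbin and needs no extra flags.
    restart_if_dead(
        alive=lambda: is_running(name),
        restart=lambda: subprocess.run(["/usr/sbin/" + name], check=True),
        max_delay=MAX_DELAY_SECONDS,
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked from cron as e.g. `watchdog.py bgpd`; does nothing without an argument.
    main(sys.argv[1])
```

Run every minute from root's crontab with the daemon name as its only argument (the script name and path are hypothetical), it checks once, sleeps a random fraction of the maximum delay, and checks again before restarting, so a daemon that has already come back is not started twice.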

