In article <CABYHU95LFHrVNqszfXzdh3p1R=b01xfyi-y_ocmhcluxba+...@mail.gmail.com>,
Søren P. Skou <[email protected]> wrote:
>Hi there,
>
>I'm currently investigating the behaviour of one of my nameservers. Recently
>I switched from a pure virtual setup to having 3 physical machines, each
>connected to its own router. Each machine is installed with NetBSD 6.1.5
>(GENERIC) amd64, bind-9.10.1pl1 and exabgp-3.3.2nb1.
>
>The setup is such that bind is listening on aliases bound to lo0 and on
>127.0.0.1. ExaBGP announces the 3 IP addresses with a different local_pref
>for each of the machines, to ensure that all servers are accessible at any
>given time should one of the physical servers fail in any way, or if bind
>gives up on resolving things. ExaBGP takes good care of this and that
>particular part is running quite well.
>
>Now, on 1 of the 3 physical servers, bind ends up in the "parked" state
>after a while. There seems to be little to no explanation why. At first I
>thought this was due to a hardware error, so I replaced the hardware. The
>new hardware did exactly the same.
>
>From what I can read about the "parked" state, the process is waiting for
>some resource and will not move on; all signals apart from SIGKILL are
>queued. This is also the behaviour I see: /etc/rc.d/named9 restart takes
>forever, as it is waiting for bind's pid to end. After a kill, it starts up
>nicely again - runs for a while, then dies.
>
>Currently I've put in a rather ugly hack combining sudo and kill from
>exabgp, plus a restart of named from crontab every 15 minutes, and that
>"works"(ish).
>
>But I would rather not have this problem with a parked bind. This was never
>a problem on the virtual setup; there were other issues there, but not a
>complete halt of service. This is a rather busy nameserver. I cannot get it
>to fail when there is no load; then it just keeps on running.
>
>My question is, has anyone experienced this? Or alternatively, does anyone
>have an idea as to where to look to find out what resource it is waiting for?
>
How many CPUs does the machine have? If you run:

    http://www.netbsd.org/~christos/race.c

does it get stuck after a while?

christos
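
On the watchdog side of the question (not the cause of the hang): rather than restarting named blindly every 15 minutes, a liveness probe could check whether the server still answers at all. What follows is only a minimal sketch under stated assumptions, not anything taken from the setup described above. The function names `build_query` and `probe`, the test name `example.com` and the 2 second timeout are all illustrative, and how the result is wired into ExaBGP or cron is deliberately left out.

```python
import os
import socket
import struct


def build_query(name: str, qid: int) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, port: int = 53, name: str = "example.com",
          timeout: float = 2.0) -> bool:
    """Return True if `server` sends back any DNS response in time.

    The name, port and timeout are assumptions for illustration only.
    """
    qid = int.from_bytes(os.urandom(2), "big")
    query = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as ICMP errors reported by the OS.
            return False
    if len(reply) < 12:
        return False
    rid, flags = struct.unpack("!HH", reply[:4])
    # Accept any response (QR bit set) carrying our query ID.
    return rid == qid and bool(flags & 0x8000)
```

A process stuck as described would simply never reply, so `probe("127.0.0.1")` should return False once the timeout expires. The probe says nothing about why the process is stuck.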
