Ah, sorry. For some reason my mail client decided to reply directly to Christos only. Also, here is a correction on the CPU details in the message I sent to you.
The current machine has 2 physical CPUs with 4 cores per CPU (E5420 @ 2.5 GHz); the former had only one CPU, a dual-core. The difference between the two is that named ends up in the parked state sooner on the new machine than on the old one. So far race.c hasn't stopped working, but I do notice that most of the time only one thread is being updated; after a while another thread takes over, but still only one thread is being updated at a time. Meanwhile, named starts up, runs for anywhere between 3 and 10 minutes, and then ends up in the parked state.

Thank you.

Best regards
S. P. Skou

On Mon, Mar 9, 2015 at 8:20 PM, Søren P. Skou wrote:
> The current machine has 2 CPUs, 6 cores per CPU; the former had one
> dual-core CPU. The difference between the two is that the new one ends
> up in a parked state faster than the old one.
>
> I'll give this a whirl and get back to you on the result :)
>
> Thank you.
>
> Best regards
> S. P. Skou
>
> On Mon, Mar 9, 2015 at 5:40 PM, Christos Zoulas wrote:
>
>> In article <CABYHU95LFHrVNqszfXzdh3p1R=[email protected]>,
>> Søren P. Skou wrote:
>> >
>> >Hi there,
>> >
>> >I'm currently investigating the behaviour of one of my nameservers.
>> >Recently I switched from a purely virtual setup to 3 physical machines,
>> >each connected to its own router. Each machine runs NetBSD 6.1.5
>> >(GENERIC) amd64, bind-9.10.1pl1 and exabgp-3.3.2nb1.
>> >
>> >The setup is such that bind listens on aliases bound to lo0 and on
>> >127.0.0.1. ExaBGP announces the 3 IP addresses with a different
>> >local_pref for each of the machines, to ensure that all servers are
>> >reachable at any given time should one of the physical servers fail in
>> >any way, or if bind gives up on resolving things. ExaBGP takes good
>> >care of this, and that particular part is running quite well.
>> >
>> >Now, on 1 of the 3 physical servers, bind ends up in the "parked" state
>> >after a while. There seems to be little to no explanation why. At first
>> >I thought this was due to a hardware error, so I replaced the hardware.
>> >The new hardware did exactly the same.
>> >
>> >From what I can read about the "parked" state, the process is waiting
>> >for some resource and will not move on; all signals apart from SIGKILL
>> >will be queued. This is also the behaviour I see: /etc/rc.d/named9
>> >restart takes forever, as it waits for bind's pid to exit. After a
>> >kill, it starts up nicely again, runs for a while, then dies.
>> >
>> >Currently I've put in a rather ugly hack combining sudo and kill from
>> >exabgp, plus restarting named from crontab every 15 minutes, and that
>> >"works" (ish).
>> >
>> >But I would rather not have this problem with a parked bind. This was
>> >never a problem on the virtual setup; there were other issues there,
>> >but never a complete halt of service. This is a rather busy nameserver.
>> >I cannot get it to fail under no load; there it just keeps on running.
>> >
>> >My question is: has anyone experienced this? Or alternatively, does
>> >anyone have an idea where to look to find out what resource it is
>> >waiting for?
>> >
>>
>> How many CPUs does the machine have? If you run:
>>
>> http://www.netbsd.org/~christos/race.c
>>
>> does it get stuck after a while?
>>
>> christos
>>
>>
>
