Ah, sorry. For some reason my mail client decided to respond directly to Christos.
Also, a correction on the CPU details in the mail I sent you.

The current machine has 2 physical CPUs with 4 cores per CPU (E5420 @ 2.5 GHz).
The former had only one CPU, a dual-core. The difference between the two is
that on the new one, named ends up in the parked state faster than on the old
one.

So far race.c hasn't gotten stuck, but I do notice that most of the time only
one thread is being updated; after a while another thread takes over, but
still only one thread is being updated at a time. Meanwhile, named starts up,
runs for anywhere between 3 and 10 minutes, and then just ends up parked.


Thank you.

Best regards
S. P. Skou

On Mon, Mar 9, 2015 at 8:20 PM, Søren P. Skou <[email protected]> wrote:

> The current machine has 2 CPUs with 6 cores per CPU; the former had only one
> CPU, a dual-core. The difference between the two is that the new one ends up
> in the parked state faster than the old one.
>
> I'll give this a whirl and get back to you on the result :)
>
> Thank you.
>
> Best regards
> S. P. Skou
>
> On Mon, Mar 9, 2015 at 5:40 PM, Christos Zoulas <[email protected]> wrote:
>
>> In article <CABYHU95LFHrVNqszfXzdh3p1R=[email protected]>,
>> Søren P. Skou  <[email protected]> wrote:
>> >
>> >Hi there,
>> >
>> >I'm currently investigating the behaviour of one of my nameservers.
>> >Recently I switched from a pure virtual setup to 3 physical machines, each
>> >connected to its own router. Each machine runs NetBSD 6.1.5 (Generic)
>> >amd64, bind-9.10.1pl1 and exabgp-3.3.2nb1.
>> >
>> >The setup is such that bind is listening on aliases bound to lo0 and
>> >127.0.0.1. ExaBGP announces the 3 IP addresses with a different local_pref
>> >for each of the machines, to ensure that all servers are reachable at any
>> >given time should one of the physical servers fail in any way, or if bind
>> >gives up on resolving things. ExaBGP takes good care of this, and that
>> >particular part is running quite well.
>> >
>> >Now, on 1 of the 3 physical servers, bind ends up in the "parked" state
>> >after a while. There seems to be little to no explanation why. At first I
>> >thought this was due to a hardware error, so I replaced the hardware. The
>> >new hardware did exactly the same.
>> >
>> >From what I can read about the "parked" state, the process is waiting for
>> >some resource and will not move on. All signals apart from SIGKILL will be
>> >queued, and this is also the behaviour I see: /etc/rc.d/named9 restart
>> >takes forever, as it waits for bind's PID to exit. After a kill, it starts
>> >up nicely again, runs for a while, then dies.
>> >
>> >Currently I've put in a rather ugly hack combining sudo and kill from
>> >exabgp, plus restarting named from crontab every 15 minutes, and that
>> >"works" (ish).
>> >
>> >But I would rather not have this problem with a parked bind. This was
>> >never a problem on the virtual setup; there were other issues there, but
>> >never a complete halt of service. This is a rather busy nameserver. I
>> >cannot get it to fail with no load; it just keeps on running.
>> >
>> >My question is: has anyone experienced this? Or, alternatively, does
>> >anyone have an idea of where to look to find out what resource it is
>> >waiting for?
>> >
>>
>> How many CPUs does the machine have? If you run:
>>
>> http://www.netbsd.org/~christos/race.c
>>
>> does it get stuck after a while?
>>
>> christos
>>
>>
>
