I apologize for the persistence, but are there any recommendations for debugging this?

13.12.2021 7:18, Nikita Druba wrote:
Hi!

My system runs FreeBSD 12.2 with ZFS. Samba 4.13.14 runs in a jail with Bind 9.16.23 as the DNS backend. I also have Bind 9.16.23 on another server, working as a secondary DNS. The secondary Bind gets its zones from the DC by zone transfer with a TSIG key. Also, the DC listens on several subnetworks (loopback and 3 others).

Some time ago I moved the DC from one jail to another, and now I see strange behaviour from Bind on the new DC.

When I set another DNS server in resolv.conf on the new DC, for example the old DC or the secondary Bind, everything works fine. The new DC successfully resolves any records with the nslookup or host commands, both from itself and from other hosts.

When I set localhost or the DC's own internal IP in resolv.conf on the new DC, Bind periodically freezes with the following pattern:

- Bind stops replying to requests for ~5 minutes. Then it starts working again without a service restart, and later freezes again.

- In the daytime (when employees are in the office), it freezes after less than 1 minute of work; at night, after 10-15 minutes.

- If I change resolv.conf from the secondary Bind to the internal IP, there is no need to restart Bind or Samba to start or stop the periodic freezing. I just change the nameserver record and wait. If Bind was frozen at the moment resolv.conf was changed, it stays frozen for ~5 minutes from the start of the freeze and works fine afterwards.

- If I change resolv.conf from the secondary Bind to loopback, then I DO need to restart Bind to start or stop the freezing.

- When Bind is frozen, the service cannot be stopped with the usual command and is not killed by a default kill; only kill -9 works.

- The internal Samba DNS works fine and doesn't freeze when resolv.conf points to localhost.

- Sometimes Bind freezes not for all subnetworks. It can freeze for localhost and 2 subnetworks, while in the one remaining subnetwork the DC's Bind still successfully resolves any records from any subnetwork. But I saw this situation only once and can't reproduce it for now.

- There are no special Bind log records with "debug 50" at the time of a freeze or before it. It freezes after arbitrary messages, and I see all of the same messages in the log when Bind works without freezing.

- I tried running Bind with logging to the terminal, but didn't see any additional information when it froze. The terminal output is the same as in the log files.

- rndc freezes as well.
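To put timestamps on the freeze pattern described in the list above, a small external probe can help. The following is only a sketch, not part of the original setup: it sends a minimal A-record query over UDP and reports each change between "responding" and "not responding". The server address, the queried name, and the function names are all hypothetical placeholders.

```python
import socket
import struct
import time

QUERY_ID = 0x1234  # arbitrary fixed ID, used to match the reply


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=2.0):
    """Return True if the server sends a reply with our ID within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-port errors
            return False
    return len(data) >= 12 and data[:2] == struct.pack("!H", QUERY_ID)


def watch(server, name, interval=5.0):
    """Print a timestamped line whenever the server's state changes."""
    state = None
    while True:
        ok = probe(server, name)
        if ok != state:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, server, "responding" if ok else "NOT responding", flush=True)
            state = ok
        time.sleep(interval)


# Hypothetical usage (runs until interrupted):
# watch("127.0.0.1", "example.com")
```

Running one copy per listening address would show whether a freeze affects every subnetwork at once, and the timestamps can be lined up against the named log.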

I found one workaround for this problem. The server hosting the DC jail has 40 CPUs (20 cores, 40 threads). Therefore, when I start named, it creates 40 workers for every listening IP, i.e. 40 TCP and 40 UDP for every IP.

Because that is too many for my configuration, I intuitively decided to try decreasing the number of named workers to 10 with "-n 10". Everything has worked without freezing, with the correct resolv.conf, for the last 2 weeks.
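As a rough illustration of the scale involved, the arithmetic can be written out. This sketch takes the description above at face value (one TCP and one UDP listener per worker per listening IP) and assumes 4 listening addresses, i.e. loopback plus the 3 other subnetworks; the function name is hypothetical.

```python
def listener_count(workers, listen_addresses, protocols=2):
    """Listeners created if every worker opens one TCP and one UDP
    listener on every listening address (assumption from the message)."""
    return workers * listen_addresses * protocols


print(listener_count(40, 4))  # default on this 40-thread host: 320
print(listener_count(10, 4))  # with "-n 10": 80
```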

Afterwards, I tried setting "-n 40", the same value that named determines automatically. After the restart, named froze again. Maybe it was a coincidence, but with other settings named does not stop freezing. I also noticed that when named works without freezing, the "number of zones" in the "rndc status" output decreases from 9 to 3. It seems that named has lost the Samba zones, yet resolving records from them works fine.
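To track when the zone count changes, the "number of zones" line can be pulled out of saved "rndc status" output. A minimal sketch, assuming the line has the form "number of zones: N (...)"; the sample text in the comment is illustrative, not taken from the affected server, and the function name is hypothetical.

```python
import re

# Assumed line format: "number of zones: 9 (0 automatic)"
ZONES_RE = re.compile(r"^number of zones:\s*(\d+)", re.MULTILINE)


def zone_count(status_text):
    """Return the zone count from `rndc status` output, or None if absent."""
    match = ZONES_RE.search(status_text)
    return int(match.group(1)) if match else None


# Hypothetical usage with captured output:
# import subprocess
# text = subprocess.run(["rndc", "status"], capture_output=True, text=True).stdout
# print(zone_count(text))
```

Logging this value periodically together with a timestamp would show whether the drop from 9 to 3 happens at startup or at some later point.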

I tried to collect some logs with ktrace and caught the moment of a freeze. After the last record from the usual log (when Bind freezes), the following records start repeating many times in kdump:

 36460 named    CALL  nanosleep(0x7fffffffea30,0)
 36460 named    RET   nanosleep 0
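To see whether anything other than nanosleep appears during the freeze, the kdump text can be summarised per system call. This is a sketch that assumes the default kdump line layout shown above (PID, command, record type, call); output produced with extra kdump options such as timestamps would need a different pattern. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#  36460 named    CALL  nanosleep(0x7fffffffea30,0)
CALL_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+CALL\s+(\w+)\(")


def count_calls(kdump_text):
    """Count CALL records per system call name in kdump output."""
    counts = Counter()
    for line in kdump_text.splitlines():
        match = CALL_RE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


# Hypothetical usage on a saved dump:
# with open("kdump.txt") as fh:
#     for name, n in count_calls(fh.read()).most_common(10):
#         print(n, name)
```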

What could be wrong here? How can I narrow down the problem further?

