On 11/09/14 21:58, Petr Spacek wrote:
On 11.9.2014 18:34, Martin Basti wrote:
On 11/09/14 15:57, Martin Basti wrote:
On 11/09/14 11:59, Petr Spacek wrote:
Hello,

I was fighting with random crashes for a couple of days ... and discovered that
run_exclusive_enter()/isc_task_beginexclusive() usage was completely
incorrect and didn't actually lock anything.

This series of patches reworks internal locking (and related event system)
to work around limitations of isc_task_beginexclusive() mechanism.

It would be better to get rid of isc_task_beginexclusive() completely, but
IMHO that is not possible because BIND's dns_view*() functions have to be
guarded with it.


Testing is going to be interesting because we are speaking about race
conditions.

I used ~ 100 DNS zones, each zone had ~ 100 random domain names inside with
random A/AAAA/TXT RRs. My LDIF is here:
http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz
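
A data set of this shape can also be generated locally. The sketch below is not the script used to build the LDIF above. The base DN, the SOA values, the ns entry and the address ranges are assumptions made for illustration, and the object classes and attribute names (idnsZone, idnsRecord, aRecord, aAAARecord, tXTRecord) follow the bind-dyndb-ldap schema as I understand it, so please check them against your installed schema before importing anything.

```python
import random

BASE_DN = "cn=dns,dc=example,dc=test"  # assumption: replace with your DNS container

# Zone entry; SOA timer values are arbitrary placeholders.
ZONE_TEMPLATE = """\
dn: idnsName={zone}.,{base}
objectClass: top
objectClass: idnsZone
objectClass: idnsRecord
idnsName: {zone}.
idnsZoneActive: TRUE
idnsSOAmName: ns.{zone}.
idnsSOArName: root.{zone}.
idnsSOAserial: 1
idnsSOArefresh: 3600
idnsSOAretry: 900
idnsSOAexpire: 1209600
idnsSOAminimum: 3600
nSRecord: ns.{zone}.
"""

RECORD_TEMPLATE = """\
dn: idnsName={name},idnsName={zone}.,{base}
objectClass: top
objectClass: idnsRecord
idnsName: {name}
{attr}: {value}
"""


def random_rr(rng):
    """Return an (attribute, value) pair for a random A, AAAA or TXT record."""
    kind = rng.choice(["A", "AAAA", "TXT"])
    if kind == "A":
        return "aRecord", "192.0.2.%d" % rng.randint(1, 254)
    if kind == "AAAA":
        return "aAAARecord", "2001:db8::%x:%x" % (
            rng.randint(1, 0xFFFF), rng.randint(1, 0xFFFF))
    return "tXTRecord", "test-%08x" % rng.getrandbits(32)


def record_entry(name, zone, base, attr, value):
    """Format one record entry below the given zone."""
    return RECORD_TEMPLATE.format(
        name=name, zone=zone, base=base, attr=attr, value=value)


def generate_ldif(zones=100, records=100, seed=0, base=BASE_DN):
    """Return LDIF text with `zones` zones and `records` random names per zone."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    entries = []
    for z in range(1, zones + 1):
        zone = "z%d.test" % z
        entries.append(ZONE_TEMPLATE.format(zone=zone, base=base))
        # Address record for the name server referenced by the zone's NS record.
        entries.append(record_entry("ns", zone, base, "aRecord", "127.0.0.1"))
        for r in range(1, records + 1):
            attr, value = random_rr(rng)
            entries.append(record_entry("r%d" % r, zone, base, attr, value))
    # Every entry ends with a newline, so joining with "\n" leaves the
    # blank separator line that LDIF requires between entries.
    return "\n".join(entries)
```

Writing the result of generate_ldif() to a file gives something that can be fed to ldapadd. The zone and record names (z1.test, r1.z1.test, ...) match the naming visible in the logs in this thread.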

I was able to randomly reproduce various crashes when BIND was running with
more threads than usual.

You can try to run BIND with this command (as root) and experiment with the
-n parameter (the number of worker threads):
$ export KRB5_KTNAME="/etc/named.keytab"
$ named -4 -g -u named -m record -n 10

Please also test the case where BIND receives SIGINT during start-up. It is
possible to run BIND with the commands above and wait for this message:
11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It could
crash or deadlock.

It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304
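
Hitting CTRL+C inside that window by hand is fiddly, so the timing can be scripted. The following is only a sketch of such a harness, not something that ships with the patches: the function name, the return strings and the default timeout are made up for illustration. It starts a command, waits for a log line that ends with a given marker, sends SIGINT, and reports whether the process exited or appears to hang.

```python
import signal
import subprocess


def interrupt_after_marker(cmd, marker, timeout=30.0):
    """Run cmd, send SIGINT once a log line ends with marker.

    Returns "exited:<code>" if the process terminated within `timeout`
    seconds after the signal, "hung" if it did not, and "no-marker" if
    the output ended before the marker was seen.
    """
    # named -g logs to stderr, so merge stderr into the pipe we read.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        for line in proc.stdout:
            if line.rstrip().endswith(marker):
                break
        else:
            # Output closed without the marker; collect the exit status.
            proc.wait()
            return "no-marker"

        proc.send_signal(signal.SIGINT)
        try:
            # communicate() keeps draining the pipe, so the child cannot
            # block on a full pipe while it logs its shutdown.
            proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            return "hung"
        return "exited:%d" % proc.returncode
    finally:
        # Never leave a stuck process behind (equivalent of kill -9).
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        if not proc.stdout.closed:
            proc.stdout.close()
```

With the command from this mail it would be called roughly as interrupt_after_marker(["named", "-4", "-g", "-u", "named", "-m", "record", "-n", "10"], " running"), as root and with KRB5_KTNAME exported. A crash shows up as a negative exit code (the signal number) and a deadlock as "hung". Note that the loop blocks for as long as the process keeps running without printing the marker.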

Let me know if you need any assistance.

I need your assistance, I haven't been able to reproduce it.

Martin

I applied the patchset, and NACK

I don't understand how I could possibly miss this. I was convinced that the patch set was thoroughly tested ...

Anyway, the attached patch should fix the problem you were facing. Please re-test it.

Thank you!

Petr^2 Spacek

#1
If named is running and I randomly choose a few zones and delete them, named
stops responding:

dig @localhost A r1.z12.test

; <<>> DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20 <<>> @localhost A r1.z12.test
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

* SIGINT doesn't work
* rndc doesn't work
* DS works

SIGINT signal stops working.

Output:
<snip>
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for zone 'z99.test'

^C^C^C^C^C^C^C^C


Process:
named 29125 1.1 2.9 789972 45976 pts/0 Sl+ 11:26 0:02 named -4 -g -u named -m record -n 10

I have to kill it with kill -9
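
To notice this kind of hang without watching dig by hand, a small probe can send a query and wait for any reply. This is a minimal sketch and not part of the patch set: the function names and the default timeout are assumptions, and it only checks that some response with a matching query ID comes back, not that the answer is correct.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (RD flag set, class IN, qtype 1 = A)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def server_answers(host, name, port=53, timeout=2.0):
    """Return True if host replies to an A query for name within timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts as well as "connection refused" errors.
            return False
    # A DNS header is 12 bytes; the first two bytes echo the query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling server_answers("127.0.0.1", "r1.z12.test") in a loop while zones and records are being added or deleted would return False once named gets into the state described above.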

#2
Same as #1 if a new zone is added.

#3
Same as #1 if a new record is added.

#4
Same as #1 if a record is deleted.

Functional ACK

--
Martin Basti

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
