I wrote on April 16: >I did some timings with BIND 9.6.1-P3 and 9.7.0-P1 on two servers: > > SunOS 5.9 sun4u sparc SUNW,Sun-Blade-1500 (old hardware) > Ubuntu hardy x86_64 GNU/Linux (more modern hardware) > >I had noticed long times for "rndc reload" to complete, and I wanted to >see if 9.6.1-P3 was different than 9.7.0-P1. > >My test DNS has three small "regular" zones defined. It also has our >list of 75,507 spyware/malware zones that we define to point to a >"honeypot" machine to log and respond to the traffic. > >I have marked with "<=====" areas where I have questions. > >SunOS 5.9: > --------------- > 9.7.0-P1 > Apr 15 10:27:48 ./rndc reload > Apr 15 10:28:21 reloading configuration succeeded > Apr 15 10:28:21 zone binhafeez.ae/IN: loaded serial ... > Apr 15 10:30:38 zone thabengmanagement.co.za/IN: loaded serial ... > named process taking 99.44% of CPU. <===== > Apr 15 10:32:51 reloading zones succeeded > Apr 15 10:32:51 next command prompt > --------------- > Apr 15 10:46:07 ./rndc stop > Apr 15 10:46:07 shutting down: flushing changes > Apr 15 10:46:07 no longer listening on 146.137.180.21 > Apr 15 named process not shown in "top" output. > Apr 15 10:47:41 ps -ef | grep named shows named process gone. <===== > --------------- > Apr 15 10:53:09 Start BIND. > Apr 15 10:53:09 built with '--prefix=/export/home/named/bind' > '--with-openssl=/krb5' > '--sysconfdir=/export/home/named' > '--enable-threads' > '--localstatedir=/var' > Apr 15 10:53:56 named taking 87% of CPU > Apr 15 10:54:49 ps -ef | grep named shows two named processes. > root 21638 21637 94 10:53:09 ? 1:34 /e... > root 21637 19863 0 10:53:09 pts/2 0:00 /e... > Apr 15 10:53:40 zone binhafeez.ae/IN: loaded serial ... > Apr 15 10:55:55 zone thabengmanagement.co.za/IN: loaded serial ... > Apr 15 10:58:08 two named processes <===== > Apr 15 10:58:09 next command prompt > Apr 15 10:58:13 one named process > --------------- > Apr 15 11:29:07 ./rndc stop > Apr 15 11:29:07 next command prompt > named taking 90% - 95% of CPU <===== > Apr 15 11:30:45 named process finally stops > --------------- > 9.6.1-P3 > Apr 15 11:32:34 Start BIND. > Apr 15 11:32:34 built with '--prefix=/export/home/named/bind' > '--with-openssl=/krb5' > '--sysconfdir=/export/home/named' > '--enable-threads' > '--localstatedir=/var' > two named processes running > named taking 80% - 99% of CPU > Apr 15 11:33:05 zone binhafeez.ae/IN: loaded serial ... > Apr 15 11:35:17 zone thabengmanagement.co.za/IN: loaded serial ... > Apr 15 11:37:31 running <===== > Apr 15 11:37:31 next comand prompt > Apr 15 11:37:32 one named process running > --------------- > > >Ubuntu hardy: > --------------- > 9.7.0-P1 > Apr 15 10:38:54 ./rndc reload > named process taking 11% - 20% of CPU > Apr 15 10:39:04 zone binhafeez.ae/IN: loaded serial ... > Apr 15 10:40:06 zone thabengmanagement.co.za/IN: loaded serial ... > Apr 15 10:40:19 next command prompt > ---------------- > Apr 15 11:07:51 ./rndc stop > Apr 15 11:07:51 next command prompt > Apr 15 11:07:52 no longer listening on 146.139.115. 146#53 > Apr 15 named taking 87% of CPU. > Apr 15 11:10:00 exiting > Apr 15 11:10:06 named process is gone. > --------------- > Apr 15 11:12:27 Start BIND. > Apr 15 11:12:27 built with '--prefix=/etc/iscbind/bind/' > '--sysconfdir=/etc/iscbind' > '--mandir=/usr/share/man' > '--infodir=/usr/share/info' > named taking 80% of CPU > two named processes running > root 31162 1 60 11:12 ? 00:00:38 /et > Apr 15 11:12:37 zone binhafeez.ae/IN: loaded serial ... > Apr 15 11:13:12 zone thabengmanagement.co.za/IN: loaded serial ... > Apr 15 11:13:14 next command prompt > Apr 15 11:13:15 one named process running > --------------- > Apr 15 11:18:25 ./rndc stop > --------------- > 9.6.1-P3 > Apr 15 11:22:57 Start BIND. > Apr 15 11:22:58 built with '--prefix=/etc/iscbind/bind/' > '--sysconfdir=/etc/iscbind' > '--localstatedir=/var' > '--mandir=/usr/share/man' > '--infodir=/usr/share/info' > Apr 15 11:23:07 zone binhafeez.ae/IN: loaded serial ... > Apr 15 11:23:27 zone thabengmanagement.co.za/IN: loaded serial ... > Apr 15 11:23:31 two named processes running > named taking 99% of CPU > Apr 15 11:23:31 next command prompt > Apr 15 11:23:36 one named process running > --------------- > >The Ubuntu hardy (running on newer hardware) is faster. My questions: > >1) On SunOS 5.9 after the last malware zone is loaded, there is a > wait of 2 minutes and 13 seconds before control is returned to > the command prompt, and DNS begins servicing queries. During this > time BIND is using close to 100% of the CPU. > On Ubuntu hardy, the wait is much shorter - up to 13 seconds. > What is BIND doing after the last zone is loaded? Why does it take > much longer on Solaris 9? > >2) When BIND starts (on both operating systems), "ps -ef" shows two > processes running: > > ps -ef | grep named > > root 31162 1 60 11:12 ? 00:00:38 /etc... > root 31162 31161 60 11:12 ? 00:00:20 /etc... > > When control is returned to the command prompt, the "ps -ef" > command shows only one process running. What is this second > process that is running during startup? > >3) With "rndc stop" it takes a second before control is returned > to the command prompt, but it takes 1.5 minutes before BIND > terminates. What is happening during this time? Is BIND writing > all of the zones to disk from memory? > >Thanks.
I have done more research. I ran a truss trace, and the trace ends when BIND issues the "running" message, and only one named process is running. The truss command must be tracing the process that ends when the "running" message is issued. I ran a truss trace with the BIND "-f" parameter (Run the server in the foreground (that is, do not daemonize)). The truss trace was run on Solaris 10_x86. Other timings were done on Solaris 9 Sun-Blade-1500 and Ubuntu hardy x86_64). Here are delta timings on the two Solaris boxes and an Ubuntu/hardy x86_64 box. There are about 75,000 spyware/malware zones being loaded. Sol9 x86 Ubu -------------------------------|----|----|----| first zone loaded |0:00|0:00|0:00| last zone loaded |2:14|1:21|0:22| next command prompt/"running" |2:13|1:32|0:03| BIND now using 0% CPU | ??|0:43|0:05| It takes longer on Solaris to load the 75,000 zones, but that may be due to faster hardware on the Ubuntu/hardy machine. On Ubuntu, after the last zone has been loaded, it takes about three seconds until control is returned to the command prompt and BIND begins responding to DNS requests. It takes much longer (with close to 100% CPU usage) on both Solaris boxes. What is named doing during this time? In the truss trace I see after the last zone has been loaded MANY lines like these: lwp_unpark(4) = 0 lwp_park(0xFE94FE60, 0) = 0 lwp_park(0x00000000, 0) = 0 lwp_park(0xFE94FE60, 0) = 0 lwp_unpark(4) = 0 yield() = 0 There are fewer "yield()" lines than "lwp_" lines. After the "running" message is issued, I see LOTS of lines like these: lwp_unpark(2) = 0 lwp_park(0x00000000, 0) = 0 lwp_unpark(3) = 0 lwp_park(0x00000000, 0) = 0 yield() = 0 (again with fewer "yield()" lines) until the BIND/named process is using 0% of the CPU. Then the truss trace has two lines without timestamps: lwp_park(0x00000000, 0) (sleeping...) lwp_park(0x00000000, 0) (sleeping...) and then there are no more trace entries until I shutdown BIND. I found one Sun blog site http://blogs.sun.com/chrisg/entry/lwp_park_and_lwp_unpark that explains the lwp_ sytem calls, but I do not know the BIND source code well enough to determine what code is causing these lwp_ and yield calls, which are using close to 100% of the CPU. ---------------------------------------------------------------------- Barry S. Finkel Computing and Information Systems Division Argonne National Laboratory Phone: +1 (630) 252-7277 9700 South Cass Avenue Facsimile:+1 (630) 252-4601 Building 240, Room 5.B.8 Internet: bsfin...@anl.gov Argonne, IL 60439-4828 IBMMAIL: I1004994 _______________________________________________ bind-users mailing list bind-users@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-users