i'm experiencing dns-related oddness with analog 6, built from source on
CentOS 4.4.
specifically, i noticed that (a) analog is taking a *really* long time to
run (compared with how fast it used to run on RH9), and (b) i'm getting an
increasing number of unresolved DNS entries in my reports. config hasn't
changed between the RH9 box (pre-august) and the CentOS box (post-august).
so i turned on DEBUG +D and tail'ed the dns.cache file.
what i noticed in some small test runs was that every time i started it,
i'd get 5-20 successful lookups, perhaps with a few failed lookups
scattered in; but within a few seconds (a minute, tops), i'd get a string
of failures (many of which did, in fact, resolve with dig -x), and i would
never get a successful lookup after that. i have a GREAT number of *
entries at the bottom of my DNS cache (which i have now moved to one
side).
so i ran the binary under strace, and i append a hundred or so lines (with
comments interspersed, marked by ***** in the margin and surrounded by
whitespace) below. it looks to me like at some point analog just stops
connecting to the local DNS server. moreover, once i entered this
consistent-failure mode, no DNS queries (port 53 packets) left my server.
following a comment in a list thread from 2003
(http://article.gmane.org/gmane.comp.web.analog.general/1592), i tried
compiling with #define NOALARM, and that seems to have fixed the problem -
but i now sometimes have to wait arbitrarily long until DNS times out.
whether this will speed up my analog run remains to be seen; initial
results suggest it will (dns cache file getting longer fairly quickly,
many lookups successful).
can anyone either:
a) tell me this is a known problem with analog, and how to fix it?
b) tell me this is a known problem with bind, and how to fix it?
c) laugh at me, and point out the holes in my tests / reasoning?
d) say "that's odd...", or offer any similar supportive comment?
thanks for any light anyone can shed.
***** here's a normal lookup
write(2, "D: Looking up 80.1.31.115:\n", 27D: Looking up 80.1.31.115:
) = 27
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898},
{SIG_DFL}, 8) = 0
alarm(3) = 0
open("/etc/hosts", O_RDONLY) = 6
fcntl64(6, F_GETFD) = 0
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=276, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7f61000
read(6, "# Do not remove the following li"..., 4096) = 276
read(6, "", 4096) = 0
close(6) = 0
munmap(0xb7f61000, 4096) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 6
***** now connect to local bind and query it
connect(6, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.1")}, 28) = 0
fcntl64(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1159003479, 832386}, NULL) = 0
poll([{fd=6, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(6, "\226#\1\0\0\1\0\0\0\0\0\0\003115\00231\0011\00280\7in-"..., 42,
MSG_NOSIGNAL) = 42
poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(6, FIONREAD, [158]) = 0
recvfrom(6, "\226#\201\200\0\1\0\1\0\2\0\2\003115\00231\0011\00280\7"..., 1024, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 158
close(6) = 0
alarm(0) = 3
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART,
0x7ee898}, 8) = 0
***** get a good answer and write to cache
write(2, "D: resolved to pam80-1-31-115."..., 50D: resolved to
pam80-1-31-115.man.dial.ntli.net
) = 50
write(4, "19316724 80.1.31.115 pam80-1-31-"..., 54) = 54
***** here's another normal-looking lookup
write(2, "D: Looking up 211.99.203.196:\n", 30D: Looking up 211.99.203.196:
) = 30
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898},
{SIG_DFL}, 8) = 0
alarm(3) = 0
open("/etc/hosts", O_RDONLY) = 6
fcntl64(6, F_GETFD) = 0
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=276, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7f61000
read(6, "# Do not remove the following li"..., 4096) = 276
read(6, "", 4096) = 0
close(6) = 0
munmap(0xb7f61000, 4096) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 6
***** now connect to local bind and query it
connect(6, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.1")}, 28) = 0
fcntl64(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1159003479, 860494}, NULL) = 0
poll([{fd=6, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(6, "\36*\1\0\0\1\0\0\0\0\0\0\003196\003203\00299\003211\7i"..., 45,
MSG_NOSIGNAL) = 45
poll([{fd=6, events=POLLIN}], 1, 5000) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART,
0x7ee898}, 8) = 0
***** got no answer in time, log that to cache
write(2, "D: can\'t resolve\n", 19D: can't resolve
) = 19
write(4, "19316724 211.99.203.196 *\n", 26) = 26
read(5, "\n64.68.82.69 - - [10/Jan/2003:12"..., 28672) = 28672
read(5, "le; MSIE 5.5; Windows 98)\"\n198.1"..., 4096) = 4096
***** here's the first odd lookup
write(2, "D: Looking up 213.123.11.38:\n", 29D: Looking up 213.123.11.38:
) = 29
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898},
{SIG_DFL}, 8) = 0
alarm(3) = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART,
0x7ee898}, 8) = 0
***** it hasn't tried to connect to local bind, but it's logging a failure
write(2, "D: can\'t resolve\n", 19D: can't resolve
) = 19
write(4, "19316724 213.123.11.38 *\n", 25) = 25
***** this one's the same, never connects to bind before logging failure
write(2, "D: Looking up 66.230.140.66:\n", 29D: Looking up 66.230.140.66:
) = 29
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898},
{SIG_DFL}, 8) = 0
alarm(3) = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART,
0x7ee898}, 8) = 0
write(2, "D: can\'t resolve\n", 19D: can't resolve
) = 19
write(4, "19316724 66.230.140.66 *\n", 25) = 25
***** this one is also the same
write(2, "D: Looking up 211.157.36.4:\n", 28D: Looking up 211.157.36.4:
) = 28
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898},
{SIG_DFL}, 8) = 0
alarm(3) = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART,
0x7ee898}, 8) = 0
write(2, "D: can\'t resolve\n", 19D: can't resolve
) = 19
write(4, "19316724 211.157.36.4 *\n", 24) = 24
***** they just carry on like this
--
Tom Yates
Cambridge, UK.
+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------