i'm experiencing dns-related oddness with analog 6, built from source on CentOS 4.4.

specifically, i noticed that (a) analog is taking a *really* long time to run (compared with how fast it used to run on RH9), and (b) i'm getting an increasing number of unresolved DNS entries in my reports. config hasn't changed between the RH9 box (pre-august) and the CentOS box (post-august).

so i turned on DEBUG +D and tail'ed the dns.cache file.

what i noticed in some small test runs was that every time i started it, i'd get 5-20 successful lookups, perhaps with a few failed lookups scattered in; but within a few seconds (a minute, tops), i'd get a string of failures (many of which did, in fact, resolve with dig -x), and i would never get a successful lookup after that. i have a GREAT number of * entries at the bottom of my DNS cache (which i have now moved to one side).

so i ran the binary under strace, and i append a hundred or so lines (with comments interspersed, marked by ***** in the margin and surrounded by whitespace) below. it looks to me like at some point analog just stops connecting to the local DNS server. moreover, once i entered this consistent-failure mode, no DNS queries (port 53 packets) left my server.

following a comment in a list thread from 2003 (http://article.gmane.org/gmane.comp.web.analog.general/1592), i tried compiling with #define NOALARM, and that seems to have fixed the problem - but i now sometimes have to wait arbitrarily long until DNS times out. whether this will speed up my analog run remains to be seen; initial results suggest it will (dns cache file getting longer fairly quickly, many lookups successful).

can anyone either:
a) tell me this is a known problem with analog, and how to fix it?
b) tell me this is a known problem with bind, and how to fix it?
c) laugh at me, and point out the holes in my tests / reasoning?
d) say "that's odd...", or offer any similar supportive comment?

thanks for any light anyone can shed.



***** here's a normal lookup

write(2, "D: Looking up 80.1.31.115:\n", 27D: Looking up 80.1.31.115:
) = 27
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898}, 
{SIG_DFL}, 8) = 0
alarm(3)                                = 0
open("/etc/hosts", O_RDONLY)            = 6
fcntl64(6, F_GETFD)                     = 0
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=276, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb7f61000
read(6, "# Do not remove the following li"..., 4096) = 276
read(6, "", 4096)                       = 0
close(6)                                = 0
munmap(0xb7f61000, 4096)                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 6

***** now connect to local bind and query it

connect(6, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("127.0.0.1")}, 28) = 0
fcntl64(6, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1159003479, 832386}, NULL) = 0
poll([{fd=6, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(6, "\226#\1\0\0\1\0\0\0\0\0\0\003115\00231\0011\00280\7in-"..., 42, 
MSG_NOSIGNAL) = 42
poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(6, FIONREAD, [158])               = 0
recvfrom(6, "\226#\201\200\0\1\0\1\0\2\0\2\003115\00231\0011\00280\7"..., 1024, 0, 
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 158
close(6)                                = 0
alarm(0)                                = 3
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 
0x7ee898}, 8) = 0

***** get a good answer and write to cache

write(2, "D:   resolved to pam80-1-31-115."..., 50D:   resolved to 
pam80-1-31-115.man.dial.ntli.net
) = 50
write(4, "19316724 80.1.31.115 pam80-1-31-"..., 54) = 54

***** here's another normal-looking lookup

write(2, "D: Looking up 211.99.203.196:\n", 30D: Looking up 211.99.203.196:
) = 30
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898}, 
{SIG_DFL}, 8) = 0
alarm(3)                                = 0
open("/etc/hosts", O_RDONLY)            = 6
fcntl64(6, F_GETFD)                     = 0
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=276, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb7f61000
read(6, "# Do not remove the following li"..., 4096) = 276
read(6, "", 4096)                       = 0
close(6)                                = 0
munmap(0xb7f61000, 4096)                = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 6

***** now connect to local bind and query it

connect(6, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("127.0.0.1")}, 28) = 0
fcntl64(6, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1159003479, 860494}, NULL) = 0
poll([{fd=6, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(6, "\36*\1\0\0\1\0\0\0\0\0\0\003196\003203\00299\003211\7i"..., 45, 
MSG_NOSIGNAL) = 45
poll([{fd=6, events=POLLIN}], 1, 5000)  = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 
0x7ee898}, 8) = 0

***** got no answer in time, log that to cache

write(2, "D:   can\'t resolve\n", 19D:   can't resolve
)   = 19
write(4, "19316724 211.99.203.196 *\n", 26) = 26
read(5, "\n64.68.82.69 - - [10/Jan/2003:12"..., 28672) = 28672
read(5, "le; MSIE 5.5; Windows 98)\"\n198.1"..., 4096) = 4096

***** here's the first odd lookup

write(2, "D: Looking up 213.123.11.38:\n", 29D: Looking up 213.123.11.38:
) = 29
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898}, 
{SIG_DFL}, 8) = 0
alarm(3)                                = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 
0x7ee898}, 8) = 0

***** it hasn't tried to connect to local bind, but it's logging a failure

write(2, "D:   can\'t resolve\n", 19D:   can't resolve
)   = 19
write(4, "19316724 213.123.11.38 *\n", 25) = 25

***** this one's the same, never connects to bind before logging failure

write(2, "D: Looking up 66.230.140.66:\n", 29D: Looking up 66.230.140.66:
) = 29
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898}, 
{SIG_DFL}, 8) = 0
alarm(3)                                = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 
0x7ee898}, 8) = 0
write(2, "D:   can\'t resolve\n", 19D:   can't resolve
)   = 19
write(4, "19316724 66.230.140.66 *\n", 25) = 25

***** this one is also the same

write(2, "D: Looking up 211.157.36.4:\n", 28D: Looking up 211.157.36.4:
) = 28
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 0x7ee898}, 
{SIG_DFL}, 8) = 0
alarm(3)                                = 0
futex(0x8f1190, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x8077534, [ALRM], SA_RESTORER|SA_RESTART, 
0x7ee898}, 8) = 0
write(2, "D:   can\'t resolve\n", 19D:   can't resolve
)   = 19
write(4, "19316724 211.157.36.4 *\n", 24) = 24

***** they just carry on like this





--

  Tom Yates
  Cambridge, UK.
+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.meer.net/mailman/listinfo/analog-help
|
|  Analog Documentation: http://analog.cx/docs/Readme.html
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------

Reply via email to