Mark Andrews wrote:
In message <4a452428.9020...@provident-solutions.com>, "Vernon A. Fort" writes:
I've run into a problem with named and timeouts, primarily with MX
lookups. When an MX query fails the first time, I have to restart the
named process before it will return a successful query. Again, it's
mainly with MX lookups, but it also happens with A records. The
problem subsides for 1-2 hours and then starts happening again. I notice
it by looking in the mailq for deferred messages with MX lookup failures.
This box is a Gentoo install running a medium-volume (500K per day) mail
server, with lots of DNS queries due to RBLs, SpamAssassin, etc. This
problem started showing up around mid-May. Since then, I have
re-installed bind and bind-tools several times, updated the kernel and
linux headers to 2.6.29, recompiled glibc, etc.
I just updated to 9.6.0-P1 from 9.4.3-P2 and the same problem exists. When
doing a manual MX lookup (dig MX isc.org), it takes around 45 seconds
on the first attempt. If it fails the first time, it will never return
a positive answer, just "connection timed out; no servers could be
reached", until I restart named. I can't say for sure, but the bind
package was updated around the time I noticed this problem. All
versions of bind I have tried (in Gentoo portage) have the same problem.
Can anyone help me find where this problem might be? I've googled
until my eyes are red and throbbing.
Thanks
Vernon
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
I suggest that you fix your firewalls to allow 4096 byte EDNS
responses though. Both ORG and ISC.ORG are signed zones, so their
responses are larger than with unsigned zones. Named is having to
retry with different options to get a response through your firewall
and this takes time.
An EDNS/UDP MX response is 1999 bytes for isc.org.
;; Query time: 872 msec
;; SERVER: 2001:4f8:0:2::19#53(2001:4f8:0:2::19)
;; WHEN: Sat Jun 27 09:39:34 2009
;; MSG SIZE rcvd: 1999
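One way to check whether a firewall on the path passes large EDNS responses is to send the same query with different advertised UDP sizes and compare what comes back. The following is a minimal Python sketch, not something from the original thread: `build_query` and `probe` are hypothetical helper names, the list of sizes is arbitrary, and the server address is something you supply yourself. It uses only the standard library and hand-builds the EDNS0 OPT record, so it is meant for illustration rather than as a replacement for dig.

```python
import socket
import struct


def build_query(name, qtype=15, udp_size=4096, do_bit=True, txid=0x1234):
    """Build a DNS query (default QTYPE 15 = MX) carrying an EDNS0 OPT record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 1)
    # Question: length-prefixed labels, root label, then QTYPE and QCLASS=IN
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    # OPT pseudo-RR: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL field holds ext-rcode/version/flags (DO bit = 0x8000), RDLEN=0
    ttl = 0x8000 if do_bit else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, ttl, 0)
    return header + question + opt


def probe(server, name, udp_size, timeout=5.0):
    """Send one UDP query; return (response_length, truncated) or None on timeout."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, udp_size=udp_size), (server, 53))
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            return None
    flags = struct.unpack("!H", data[2:4])[0]
    return len(data), bool(flags & 0x0200)  # 0x0200 is the TC (truncated) bit


if __name__ == "__main__":
    import sys

    server = sys.argv[1]  # assumption: address of a name server to test against
    for size in (512, 2048, 4096):
        print(size, probe(server, "isc.org", size))
```

If the small size gets an answer (possibly with the truncated flag set) while the 4096 byte query times out, that would be consistent with something on the path dropping large UDP responses. A timeout at every size points elsewhere.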
I now have two servers running behind Check Point firewalls which are
failing to resolve MX records. One of the IT guys called Check Point, and
support suggested they disable the SmartDefense DNS UDP check. This
did correct the problem, but queries are still sluggish from time to time.
I have three questions related to this:
1. On both servers, the DNS version (and glibc) were updated in
mid-January, from bind-9.4.1 to 9.4.3. The SmartDefense DNS check had been
enabled on both firewalls long before the last updates were applied.
Why did the issues just now start showing up (late May - early June)?
2. When an email is deferred in the mailq, it stays deferred until
named is restarted. I just tested this on a mail message that sat in
the queue for just about three days. I kept trying to dig MX domain.com
during this time period and NOTHING would resolve (including any A
records) until I restarted named. Why?
3. In both network environments, I switched resolution to internal
Windows 2003 DNS servers. NO problems occurred during the week we used
the Windows DNS servers. Why would SmartDefense not have the same effect
on Windows-based name servers?
Updating to bind-9.6.1 and updating the root.zone file made little if any
difference. Basically, it appears that SOMETHING has changed somewhere,
because we have only just now had to alter the Cisco PIX rules to increase
the UDP packet size due to timeouts in these environments. I have seen posts
related to my problems from as far back as 2-3 years ago. So again, I'm
scratching my head wondering what the heck I missed: why did these
problems just now start showing up?
Any pointers or additional reading would be greatly appreciated. I'm
just trying to understand from a 1000-foot view, but whatever view anyone
suggests is fine.
Vernon