Re: Strange problem with a query deleting a record...

Gordon A. Lang Sat, 24 Aug 2013 08:54:28 -0700

Making some assumptions about where your dig queries are being sent, I would say it looks like the Squid is simply failing its DNS lookup (for whatever reason), and then the Squid system is retaining a 5-minute negative cache. If this is true, then the question becomes: why does the Squid system fail on that one lookup but (presumably) succeed on others?


--
Gordon A. Lang


--------------------------------------------------
From: "John E.P. Hynes" <[email protected]>
Sent: Saturday, August 24, 2013 8:55 AM
To: "Barry Margolin" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Subject: Re: Strange problem with a query deleting a record...

On 08/24/2013 12:46 AM, Barry Margolin wrote:

In article <[email protected]>,
  Mark Andrews <[email protected]> wrote:

In message <[email protected]>, Kevin Darcy writes:

On 8/22/2013 12:55 PM, [email protected] wrote:

Greetings All,

First of all, I apologize if this is out of place - I'm having a very
strange issue that is either a problem with bind itself, or at least,
affecting it.  Summary:

For only ONE address, whenever I attempt to access it through my squid
proxy, the record disappears from DNS, and the retry time changes too.

Essentially, accessing www.thisdomain.com works, but a link to a portal on
that page to the subdomain login.thisdomain.com causes the problem. I'm
willing to bet the problem lies with squid, but as to how it could
possibly change a record in bind... Well, I'm stumped. If you don't go
through squid, everything works.  All other requests to bind for the
address of the host in question work fine. Here's the output of dig from
before accessing the page through squid:

; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45037
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;login.thisdomain.com.            IN      A

;; ANSWER SECTION:
login.thisdomain.com.     17      IN      A       111.222.333.123

;; AUTHORITY SECTION:
thisdomain.com.         168319  IN      NS      ns1.thisdomain.com.
thisdomain.com.         168319  IN      NS      ns2.thisdomain.com.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Aug 22 12:29:57 2013
;; MSG SIZE  rcvd: 88

You can do anything to request the address from bind and it works,
*except* try to access it through squid.  Bypassing squid and going
directly through the firewall works fine.

Now, immediately after you try to access it through squid:

; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 43943
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;login.thisdomain.com.            IN      A

;; AUTHORITY SECTION:
thisdomain.com.         298     IN      SOA     ns1.thisdomain.com.
serv.anotherdomain.com. 2006062510 3600 3600 2592000 300

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Aug 22 12:30:06 2013
;; MSG SIZE  rcvd: 95

After the 5-minute retry shown above expires, the original record
reappears.
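
A note on the 5-minute figure: in the SOA record shown in the NXDOMAIN response, the last five numbers are serial, refresh, retry, expire and minimum, so the 300 is the minimum field rather than the retry value (which is 3600). Under RFC 2308 a resolver caches a negative answer for the smaller of the SOA record's own TTL and its minimum field, and the 298 in the dig output is consistent with a 300-second negative entry counting down. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration, and the assumption that the SOA arrived with a TTL of 300 is inferred from the output, not confirmed.

```python
def parse_soa_rdata(text: str) -> dict:
    """Split SOA RDATA in presentation format into named fields."""
    fields = text.split()
    serial, refresh, retry, expire, minimum = (int(f) for f in fields[2:7])
    return {
        "mname": fields[0],
        "rname": fields[1],
        "serial": serial,
        "refresh": refresh,
        "retry": retry,
        "expire": expire,
        "minimum": minimum,
    }


def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: cache a negative answer for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_record_ttl, soa_minimum)


# SOA RDATA copied from the NXDOMAIN response in the post.
soa = parse_soa_rdata(
    "ns1.thisdomain.com. serv.anotherdomain.com. "
    "2006062510 3600 3600 2592000 300"
)

# Assumption: the SOA was received with a TTL of 300 (dig shows 298 shortly after).
print(negative_cache_ttl(300, soa["minimum"]))
```

This is why the A record seems to vanish for about five minutes and then come back: the cached NXDOMAIN covers the whole name, not just the query type that triggered it.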

Ideas?  I'm stumped.  It seems like squid is somehow able to corrupt
bind's info, but I can't imagine how.

I have a theory. If this is a name that's hosted on a stupid
load-balancer, and that load-balancer doesn't understand non-A-record
query types, then if Squid is sending a non-A query type (e.g. SRV,
possibly even AAAA, if it's *really* stupid), then the load-balancer may
be erroneously "poisoning" your cache with an NXDOMAIN response.

We ran into this many years ago with Cisco GSSes (Global Site Selectors)
and work around it by having a "shadow" version of the zone, which the
GSSes proxy to for QTYPEs they don't handle. That "shadow" version of
the zone has a wildcard entry in it which forces responses to be NODATA
instead of NXDOMAIN, and this prevents the cache poisoning.

                                                              - Kevin

The load balancer should be able to correct for such misconfigurations
by changing the rcode of the response from NXDOMAIN to NOERROR.  It
knows what names it is answering for so it can know that the NXDOMAIN
is an erroneous response.

If I understand what Kevin was saying, the load balancer IS the DNS
server. If you ask it for the A record it's responsible for, it sends a
reasonable reply. If you ask it for some other record type for that
name, it sends NXDOMAIN instead of NOERROR.

It's a design flaw in these load balancers.
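
The distinction discussed above (NXDOMAIN means the name does not exist at all; NOERROR with an empty answer section, often called NODATA, means the name exists but has no record of the requested type) can be read straight from the 12-byte DNS message header. The following is a rough Python sketch using only the standard library. The function name is hypothetical, and the two sample headers are rebuilt from the id, flags and counts printed in the dig output earlier in the thread, not captured from the wire.

```python
import struct

RCODE_NOERROR = 0
RCODE_NXDOMAIN = 3


def classify_response(message: bytes) -> str:
    """Classify a DNS response by its header: ANSWER, NODATA or NXDOMAIN."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes; message is too short")
    _id, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack(
        "!HHHHHH", message[:12]
    )
    rcode = flags & 0x000F  # low four bits of the flags word
    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"
    if rcode == RCODE_NOERROR and ancount == 0:
        # Simplification: a referral also looks like this; a fuller check
        # would inspect the authority section.
        return "NODATA"
    if rcode == RCODE_NOERROR:
        return "ANSWER"
    return f"RCODE {rcode}"


# Headers rebuilt from the dig output: flags "qr rd ra" = 0x8180, plus the rcode.
before = struct.pack("!HHHHHH", 45037, 0x8180, 1, 1, 2, 0)
after = struct.pack("!HHHHHH", 43943, 0x8183, 1, 0, 1, 0)
# What a well-behaved server would send for a type it has no data for.
nodata = struct.pack("!HHHHHH", 1, 0x8180, 1, 0, 1, 0)

print(classify_response(before), classify_response(after), classify_response(nodata))
```

A caching resolver treats the second case as "nothing exists at this name", which is why a wrong NXDOMAIN for one query type hides the A record as well.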


Thanks everyone who's been helping with this.

In order to investigate this further, I did a tcpdump of both a "working" conversation of a browser requesting the site, not going through the squid proxy, and another of the "broken" conversation through the proxy.

Result: There is an NXDOMAIN response to a request for an AAAA record that the proxy makes that is causing this. The browser never asks for anything but an A record, which succeeds.

I've contacted the site in question with this info, so hopefully it'll get resolved. I'll keep the list posted on any results or info for posterity.
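
For anyone who wants to reproduce the comparison without tcpdump, the same check can be approximated by sending one A query and one AAAA query for the name and comparing the response codes. Below is a minimal Python sketch using only the standard library. The function names, the fixed query id, the 3-second timeout and the default server address (127.0.0.1, as in the dig output) are illustrative assumptions; it handles plain UDP only and ignores truncation and retries.

```python
import socket
import struct

QTYPE_A = 1
QTYPE_AAAA = 28
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Build a single-question DNS query with the RD (recursion desired) bit set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def rcode_of(response: bytes) -> str:
    """Return the response code name from a DNS message header."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODE_NAMES.get(code, str(code))


def query_rcode(name: str, qtype: int, server: str = "127.0.0.1",
                timeout: float = 3.0) -> str:
    """Send one query over UDP port 53 and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        response, _ = sock.recvfrom(512)
    return rcode_of(response)
```

Calling query_rcode("login.thisdomain.com", QTYPE_A) and then query_rcode("login.thisdomain.com", QTYPE_AAAA) against the site's authoritative servers, rather than the local cache, would be expected to show NOERROR for the first and NXDOMAIN for the second if the diagnosis above is right. Querying the caching resolver instead will populate its negative cache, which is the very effect described in this thread.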


-John



_______________________________________________

Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list


bind-users mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/bind-users
