Thank you so much, Mark.
Based on your input, I found the culprit: it is one of the LDNS servers.
It is supposed to be configured with the zone
"xx.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", but somehow it was configured
with "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", which is not delegated to
this LDNS.
What's more, only the newly added root DNS replies with the "real" incorrect
zone "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org". The old one's reply is
"correct": it uses "xx.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org" (maybe the
old one does not query the LDNS and answers the query from its own
configuration).
When the servers query the newly added DNS and cache the incorrect zone
"node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", the "invalid response" log
entries keep appearing.
In summary, the issue occurs only when two conditions are both met:
1. The LDNS is configured with the incorrect zone.
2. The query goes to the newly added root DNS, which behaves differently from
the old one.
------------------ Original ------------------
From: "marka";<ma...@isc.org>;
Send time: Thursday, Jan 14, 2021 11:12 AM
To: "同屋"<39223...@qq.com>;
Cc: "Bind-users"<Bind-users@lists.isc.org>;
Subject: Re: "not subdomain of zone {XXXX} -- invalid response" errors
found in named.run log
> On 7 Jan 2021, at 00:57, 同屋 <39223...@qq.com> wrote:
>
> Actually, the background is a little bit complicated. In short, the topology
> is as shown below. dns1 was swapped for a new one (say dns1*), and then the
> issue appeared. After that, we dropped all the AAAA requests from dns1*, and
> the issue was gone.
Well if you stop making requests that result in negative responses (NXDOMAIN or
NOERROR/NODATA) you no longer send responses with the incorrect SOA record in
the authority section.
> There was no config change during the whole process, so I have no idea why
> the caching server logs this.
You get such logs because there are servers that are misconfigured. If
you delegate a zone to a server then ALL negative responses for queries in that
delegated namespace should be coming back with a SOA record that matches the
delegated zone. Named checks the returned SOA record in the authority
section, and if it isn't an expected value then named logs the messages you are
seeing.
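To make that check concrete, here is a minimal Python sketch of the kind of comparison described above. It is only an illustration under simplifying assumptions, not named's actual code: the function names are made up for this example, and names are compared label by label, ignoring case and the trailing dot.

```python
def labels(name: str) -> list[str]:
    """Split a domain name into lowercase labels, ignoring the trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_subdomain(name: str, zone: str) -> bool:
    """Return True if name equals zone or lies somewhere below it."""
    name_labels, zone_labels = labels(name), labels(zone)
    if len(zone_labels) > len(name_labels):
        return False
    # The rightmost labels of name must match every label of zone.
    return name_labels[len(name_labels) - len(zone_labels):] == zone_labels


def soa_owner_ok(soa_owner: str, delegated_zone: str) -> bool:
    """Hypothetical helper: is the SOA owner acceptable for this delegation?"""
    return is_subdomain(soa_owner, delegated_zone)


# A negative response for the delegated zone child.example.com.
print(soa_owner_ok("child.example.com.", "child.example.com."))  # True
print(soa_owner_ok("example.com.", "child.example.com."))        # False
```

The second call is the failing case: the SOA owner sits above the delegated zone, which is the situation the "not subdomain of zone" message reports.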
You can reproduce this with the following setup, where example.com is delegated
to server1.example.com and child.example.com is delegated to
server2.example.com, but server2 is incorrectly configured to serve a different
version of example.com instead of child.example.com.
server1.example.com(192.0.2.1):
example.com. SOA server1.example.com. . 0 0 0 0 0
example.com. NS server1.example.com.
server1.example.com. A 192.0.2.1
server2.example.com. A 192.0.2.2
child.example.com. NS server2.example.com.
server2.example.com(192.0.2.2):
example.com. SOA server2.example.com. . 0 0 0 0 0
example.com. NS server2.example.com.
server2.example.com. A 192.0.2.2
child.example.com. A 192.0.2.3
A proper delegation would have:
server2.example.com(192.0.2.2):
child.example.com. SOA server2.example.com. . 0 0 0 0 0
child.example.com. NS server2.example.com.
child.example.com. A 192.0.2.3
Load balancers often end up with broken configuration because, it appears, the
documentation is not clear enough. The load balancing software knows
about A queries and returns answers for them, but punts all the other queries
to a backing server which, instead of being configured with the zone
child.example.com, is configured with the zone example.com, which contains just
the SOA and NS records.
example.com. SOA server1.example.com. . 0 0 0 0 0
example.com. NS server1.example.com.
Client -> load balancer -> backing server.
If you ask for child.example.com/A you get back an A record with the computed
value.
If you ask for child.example.com/AAAA the load balancer says "this is not
something I deal with" and passes the request on to the backing nameserver
which, because it has been configured to serve example.com instead of
child.example.com, returns a negative response with example.com as the owner
name of the SOA record rather than the child.example.com SOA record that is
expected.
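The A versus AAAA behaviour can be modelled in a few lines of Python. This is a toy model under stated assumptions, not real load balancer or BIND code: the Response class and all function names are hypothetical, the address is the illustrative one from the zone data above, and a negative response is reduced to just the owner name of its SOA record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    answer: Optional[str]     # record data for a positive answer, else None
    soa_owner: Optional[str]  # owner name of the SOA in a negative response


def backing_server(configured_zone: str) -> Response:
    # The backing server holds only SOA and NS at its zone apex, so every
    # query punted to it gets a negative response carrying that zone's SOA.
    return Response(answer=None, soa_owner=configured_zone)


def load_balancer(qtype: str, configured_zone: str) -> Response:
    if qtype == "A":
        # The load balancer computes A answers itself.
        return Response(answer="192.0.2.3", soa_owner=None)
    # Everything else is passed on to the backing nameserver.
    return backing_server(configured_zone)


def soa_matches(delegated_zone: str, response: Response) -> bool:
    # Positive answers carry no SOA to check in this model.
    if response.soa_owner is None:
        return True
    owner = response.soa_owner.lower().rstrip(".")
    zone = delegated_zone.lower().rstrip(".")
    return owner == zone or owner.endswith("." + zone)


DELEGATED = "child.example.com."

broken = load_balancer("AAAA", configured_zone="example.com.")
fixed = load_balancer("AAAA", configured_zone="child.example.com.")

print(soa_matches(DELEGATED, broken))  # False
print(soa_matches(DELEGATED, fixed))   # True
```

In this model the A query always looks healthy, which is why the misconfiguration tends to go unnoticed until something asks for AAAA or another type the load balancer does not handle.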
Mark
> --------    ---------
> |dns1  |    | dns2  |
> --------    ---------
>     |           |
>   ---------------
>          |
> ------------------
> |caching server  | (where the log was observed)
> ------------------
>
> ------------------ Original ------------------
> From: "同屋";<39223...@qq.com>;
> Send time: Wednesday, Jan 6, 2021 8:43 PM
> To: "同屋"<39223...@qq.com>; "marka"<ma...@isc.org>;
> Cc: "Bind-users"<Bind-users@lists.isc.org>;
> Subject: re:Re: "not subdomain of zone {XXXX} -- invalid response"
errors found in named.run log
>
> Thanks, Mark, but why is this issue related to a load balancer?
>
>
>
> ------------------ Original Message ------------------
> From: "Mark Andrews";
> Date: 2021-01-06 19:09
> To: "同屋"<39223...@qq.com>;
> To:
> "bind-users";
>
> Subject: Re: "not subdomain of zone {XXXX} -- invalid response" errors
found in named.run log
>
>
> Complain to the administrators of the zone. They have not properly
> delegated it. We see this often with load balancers.
>
> The zone a.b.example has been delegated but the answer is as if it is from
> b.example.
>
> --
> Mark Andrews
>
>> On 6 Jan 2021, at 21:02, 同屋 <39223...@qq.com> wrote:
>>
>>
>> The version of bind is BIND 9.10.5-P3 id:7d5676f
>>
>> One day, I found that the size of named.run was increasing very
>> quickly, and a lot of "invalid response" entries were spotted in the log.
>> Details are as follows (I replaced the sensitive info with {XXXX}, {AAA},
>> etc.)
>>
>> DNS format error from {IP}#53 resolving
>> {XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org/AAAA for client
>> 169.254.4.50#51099: Name epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org (SOA) not
>> subdomain of zone node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org -- invalid response
>>
>> The response related to the above log is as follows:
>>
>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50664
>> ;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
>>
>> ;; OPT PSEUDOSECTION:
>> ; EDNS: version: 0, flags: do; udp: 4096
>> ;; QUESTION SECTION:
>> ;{XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. IN AAAA
>>
>> ;; AUTHORITY SECTION:
>> ;epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. 86400 IN SOA .mnc{AAA}.mcc{BBB}.gprs. dns-admin. (
>> ;     2020122704 ; serial
>> ;     10800      ; refresh (3 hours)
>> ;     3600       ; retry (1 hour)
>> ;     604800     ; expire (1 week)
>> ;     86400      ; minimum (1 day)
>> ;     )
>>
>> ============================================
>>
>> Normally, the FQDN should be cached as an NXRRSET record as follows:
>>
>> {XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. 8412 -AAAA ;-$NXRRSET
>>
>> But when the issue happens, it cannot be cached. I guess this is related
>> to the "invalid response" log.
>>
>> The error log mentions "zone
>> node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", but I'm wondering where the zone
>> "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org" comes from. I cannot find the
>> related SOA record in the dump file.
>>
>> _______________________________________________
>> Please visit https://lists.isc.org/mailman/listinfo/bind-users to
unsubscribe from this list
>>
>> ISC funds the development of this software with paid support
subscriptions. Contact us at https://www.isc.org/contact/ for more information.
>>
>>
>> bind-users mailing list
>> bind-users@lists.isc.org
>> https://lists.isc.org/mailman/listinfo/bind-users
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742
INTERNET: ma...@isc.org