Apparently this is a known design issue with bind-dyndb-ldap, the glue
between bind/named and LDAP.

https://bugzilla.redhat.com/show_bug.cgi?id=1071356 mentions this behaviour
on startup, and the response was:

> This is "expected" behavior for bind-dyndb-ldap version 4.0 and higher:
> See https://git.fedorahosted.org/cgit/bind-dyndb-ldap.git/tree/NEWS for 
> version 4.0 point [5].

> It simply takes some time to load all the data from LDAP to named.

> If you want to see some other behavior please open a bug against 
> bind-dyndb-ldap component.


I can't find any bug filed against bind-dyndb-ldap for this behaviour of
responding NXDOMAIN during startup while data is still loading (instead
of e.g. SERVFAIL, or not responding at all until it knows the right
response).

https://pagure.io/bind-dyndb-ldap/issue/124 mentions this behaviour
too, and indicates that it could be solved with caching, but the
ticket hasn't moved for some time. There are no workarounds listed.
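
One thing that may help spot these startup responses from the client
side: in the two dig outputs quoted further down, the bogus NXDOMAIN has
"flags: qr rd ra" (no aa), while a genuine NXDOMAIN from the FreeIPA
server has "flags: qr aa rd ra". Below is a minimal sketch that reads one
dig response on stdin and prints its status plus whether aa was set.
classify_dig is a name I made up, and treating a missing aa flag as
"named is still loading" is an assumption based only on those two
samples. It does not stop named from answering early.

```shell
#!/usr/bin/env bash
# classify_dig: hypothetical helper, not part of dig, bind or FreeIPA.
# Reads the text of ONE dig response on stdin and prints
# "<status> authoritative" or "<status> non-authoritative".
classify_dig() {
    awk '
        # Header line, e.g.
        # ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 2889"
        /->>HEADER<<-/ {
            for (i = 1; i <= NF; i++)
                if ($i == "status:") {
                    status = $(i + 1)
                    sub(/,$/, "", status)      # drop the trailing comma
                }
        }
        # Flags line, e.g. ";; flags: qr aa rd ra; QUERY: 1, ..."
        /;; flags:/ {
            flags = $0
            sub(/.*;; flags:/, "", flags)      # drop everything up to the label
            sub(/;.*/, "", flags)              # keep only the flag list
            # Assumption: aa missing means the zone was not loaded yet.
            aa = ((" " flags " ") ~ / aa /) ? "authoritative" : "non-authoritative"
        }
        END { print status, aa }
    '
}
```

Usage would be something like
`dig a.cname.in.my.freeipa @localhost | classify_dig`, e.g. inside the
restart loop quoted below in place of `grep status`.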


On Fri, Oct 27, 2017 at 1:04 PM Nicholas Hinds <hin...@gmail.com> wrote:

> This might not be entirely related to a FreeIPA upgrade. I have managed to
> reproduce this by sending lots of queries at bind/named while it's
> restarting (sudo service named-pkcs11 restart). Sometimes these queries
> during startup will get unlucky and return NXDOMAIN with invalid authority
> information, like I observed during the FreeIPA upgrade.
>
> It's possible the FreeIPA upgrade just loaded my system up so that bind
> took longer to finish starting up - my test system is running on some
> pretty low-specced hardware.
>
> Simpler steps to reproduce this:
>
> $ sudo service named-pkcs11 restart; for i in {1..20}; do dig a.cname.in.my.freeipa @localhost | grep status; done
> Redirecting to /bin/systemctl restart named-pkcs11.service
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11664
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 15073
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 31456
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36166
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36299
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53211
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 30928
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10465
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 65318
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33517
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35773
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2719
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42969
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28725
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16096
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55018
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54067
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47360
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6057
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20778
>
>
> I turned the debug level of named up, and it seems to be answering queries
> before it has read all of the DNS entries from LDAP.
>
> The queries which were returning NXDOMAIN all occurred before a log entry
> "general: debug 7: add a.cname.in.my.freeipa. 60 IN CNAME
> destination.in.my.freeipa.", and the queries which returned NOERROR all
> occurred after that log entry. There are also log entries between the
> NXDOMAIN and NOERROR messages where it loads the parent zones of the entry
> ("general: debug 1: zone my.freeipa/IN: starting load" / "general: debug 1:
> zone in.my.freeipa/IN: starting load"), so the NXDOMAIN response might be
> because it hasn't read in the NS records or does not yet understand that it
> is supposed to be the authoritative nameserver for that zone.
>
> Is there a way to make bind/named only respond to queries once it's read
> its configuration fully from LDAP? Or just to wait e.g. 30 seconds after
> the bind/named service starts before listening on port 53, to lower the
> chances of responding to queries while it's still booting?
>
> On Thu, Oct 26, 2017 at 11:43 AM Nicholas Hinds <hin...@gmail.com> wrote:
>
>> On Thu, Oct 26, 2017 at 9:17 AM Rob Crittenden <rcrit...@redhat.com>
>> wrote:
>>
>>> Nicholas Hinds wrote:
>>> > I tried running `sudo service named-pkcs11 stop` before the yum update,
>>> > but FreeIPA still returned NXDOMAIN responses temporarily.
>>>
>>> You want the service named.
>>>
>> That service does not exist in my FreeIPA installation:
>> $ sudo service named status
>> Redirecting to /bin/systemctl status named.service
>> ● named.service
>>    Loaded: masked (/dev/null; bad)
>>    Active: inactive (dead)
>>
>> Running `sudo service named stop` gives no output, and running `sudo
>> ipactl status` afterwards shows that "named" is still running:
>> $ sudo service named stop
>> Redirecting to /bin/systemctl stop named.service
>> $ sudo ipactl status
>> Directory Service: RUNNING
>> krb5kdc Service: RUNNING
>> kadmin Service: RUNNING
>> named Service: RUNNING
>> httpd Service: RUNNING
>> ipa-custodia Service: RUNNING
>> ntpd Service: RUNNING
>> pki-tomcatd Service: RUNNING
>> smb Service: RUNNING
>> winbind Service: RUNNING
>> ipa-otpd Service: RUNNING
>> ipa-dnskeysyncd Service: RUNNING
>> ipa: INFO: The ipactl command was successful
>>
>>
>> If I stop named-pkcs11, `sudo ipactl status` shows that "named" is
>> stopped:
>> $ sudo service named-pkcs11 stop
>> Redirecting to /bin/systemctl stop named-pkcs11.service
>> $ sudo ipactl status
>> Directory Service: RUNNING
>> krb5kdc Service: RUNNING
>> kadmin Service: RUNNING
>> named Service: STOPPED
>> httpd Service: RUNNING
>> ipa-custodia Service: RUNNING
>> ntpd Service: RUNNING
>> pki-tomcatd Service: RUNNING
>> smb Service: RUNNING
>> winbind Service: RUNNING
>> ipa-otpd Service: RUNNING
>> ipa-dnskeysyncd Service: RUNNING
>> ipa: INFO: The ipactl command was successful
>>
>>
>> So at least on my machine, stopping the OS service "named-pkcs11" stops
>> "named" for FreeIPA.
>>
>>
>>> > It seems like these responses occur about 10 seconds after the last log
>>> > entry in /var/log/ipaupgrade.log ("The ipa-server-upgrade command was
>>> > successful"). Based on the IPA "posttrans" script from the RPM, it seems
>>> > likely the NXDOMAIN responses are being returned while the
>>> > `/bin/systemctl restart ipa.service` command is running, however I
>>> > cannot reproduce the NXDOMAIN responses by running `/bin/systemctl
>>> > restart ipa.service` on its own. Something in the yum upgrade or
>>> > ipa-server-upgrade process seems to trigger this different behaviour.
>>>
>>> As I said, by default right now bind remains running while its backend,
>>> 389-ds, is unavailable during the package update process. The ipa
>>> service doesn't reproduce this because of the order in which the
>>> services are restarted.
>>>
>>
>> If I stop the "ipa" service then start only bind ("named-pkcs11"), so the
>> backend isn't running, DNS queries return the "SERVFAIL" status rather than
>> "NXDOMAIN", which makes sense to me. They also do not return any authority
>> information. It does not appear that bind returns "NXDOMAIN" with incorrect
>> authority information if its backend is completely unavailable when it
>> starts.
>>
>> If I start the "ipa" service and attempt to stop all of its components
>> apart from bind/named one by one (ipa-dnskeysyncd, winbind, smb, ntpd,
>> ipa-custodia, httpd, kadmin, krb5kdc, pki-tomcatd@pki-tomcat,
>> dirsrv@MY-DOMAIN), the DNS server continues to correctly respond to DNS
>> queries. This could be because I have a pair of replicated FreeIPA
>> instances, and once bind/named starts it knows how to query from the
>> secondary server? Although stopping FreeIPA on my second server does not
>> stop DNS queries from being answered - perhaps bind has just cached the
>> response for the test query I am using. Either way, stopping all of these
>> services including dirsrv (which I believe is the 389-ds backend process)
>> does not result in "NXDOMAIN" responses with incorrect authority
>> information.
>>
>>
>>> rob
>>>
>>> >
>>> > On Tue, Oct 24, 2017 at 1:45 PM Rob Crittenden <rcrit...@redhat.com> wrote:
>>> >
>>> >     Nicholas Hinds via FreeIPA-users wrote:
>>> >     > During an upgrade from 4.5.0-21.el7.centos.1.2
>>> >     > to 4.5.0-21.el7.centos.2.2 on a CentOS 7.4 machine, FreeIPA's DNS server
>>> >     > briefly returned NXDOMAIN for records which existed in FreeIPA. These
>>> >     > invalid responses were returned for a very short amount of time, but
>>> >     > caused long-running issues with Java clients which tend to cache DNS
>>> >     > responses. Upgraded packages included: 389-ds-base, 389-ds-base-libs,
>>> >     > 389-ds-base-snmp, ipa-client, ipa-client-common, ipa-python-compat,
>>> >     > ipa-server, ipa-server-common, ipa-server-dns, ipa-server-trust-ad,
>>> >     > python2-ipa-server, and a dozen sss-related packages.
>>> >     >
>>> >     > I reproduced this in a FreeIPA test environment by running `while true;
>>> >     > do dig some.dns.entry.managed.by.freeipa @ip.address.of.freeipa | tee -a
>>> >     > a-log-file; done` from one server, and running `yum update` on the
>>> >     > FreeIPA machine. The invalid NXDOMAIN responses were returned some time
>>> >     > after the `yum update` logged 'Cleanup' for the RPMs, and seemed to be
>>> >     > during the 'Verifying' phase.
>>> >     >
>>> >     > These NXDOMAIN responses claimed that an upstream nameserver
>>> >     > (a.root-servers.net) was the authority for my zone:
>>> >     >
>>> >     > a-log-file-; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.7 <<>> some.dns.entry.managed.by.freeipa @172.16.0.77
>>> >     > a-log-file-;; global options: +cmd
>>> >     > a-log-file-;; Got answer:
>>> >     > a-log-file:;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 2889
>>> >     > a-log-file-;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
>>> >     > a-log-file-
>>> >     > a-log-file-;; QUESTION SECTION:
>>> >     > a-log-file-;some.dns.entry.managed.by.freeipa. IN A
>>> >     > a-log-file-
>>> >     > a-log-file-;; AUTHORITY SECTION:
>>> >     > a-log-file-. 60 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2017102400 1800 900 604800 86400
>>> >     > a-log-file-
>>> >     > a-log-file-;; Query time: 227 msec
>>> >     > a-log-file-;; SERVER: 172.16.0.77#53(172.16.0.77)
>>> >     > a-log-file-;; WHEN: Tue Oct 24 18:30:28 2017
>>> >     > a-log-file-;; MSG SIZE  rcvd: 130
>>> >     >
>>> >     > Usually when querying an invalid DNS entry, the dig output still claims
>>> >     > that my FreeIPA server is authoritative for the zone:
>>> >     > $ dig doesntexist.zone.managed.by.freeipa @172.16.0.77
>>> >     >
>>> >     > ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.7 <<>> doesntexist.zone.managed.by.freeipa @172.16.0.77
>>> >     > ;; global options: +cmd
>>> >     > ;; Got answer:
>>> >     > ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59953
>>> >     > ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
>>> >     >
>>> >     > ;; QUESTION SECTION:
>>> >     > ;doesntexist.zone.managed.by.freeipa. IN A
>>> >     >
>>> >     > ;; AUTHORITY SECTION:
>>> >     > zone.managed.by.freeipa. 30 IN SOA idm01.freeipa. hostmaster.zone.managed.by.freeipa. 1508869828 30 900 1209600 30
>>> >     >
>>> >     > ;; Query time: 0 msec
>>> >     > ;; SERVER: 172.16.0.77#53(172.16.0.77)
>>> >     > ;; WHEN: Tue Oct 24 19:27:12 2017
>>> >     > ;; MSG SIZE  rcvd: 113
>>> >     >
>>> >     >
>>> >     > Is it possible that during a yum update, the FreeIPA DNS server
>>> >     > temporarily forgets what zones it's authoritative for (or forgets all
>>> >     > DNS records) and just delegates to the upstream DNS server for half a
>>> >     > second or so? Or is something else going on here?
>>> >     >
>>> >     > I'm open to suggestions.
>>> >
>>> >     The LDAP server is brought down during upgrades which is likely the
>>> >     issue. bind can't connect to its backend. Why it returns NXDOMAIN I
>>> >     don't know.
>>> >
>>> >     You may be able to work around this by manually stopping bind
>>> >     before updating IPA, then starting it again afterwards.
>>> >
>>> >     rob
>>> >
>>>
>>>
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
