I am in desperate need of help debugging a dnsdist / pdns / pdns_recursor combination.
Versions:
- dnsdist 1.2.0
- pdns authoritative 0.0.2117gcf04fed (a couple of days old)
- pdns recursor 0.0.2000gcf04fed (a couple of days old)
- gcc 4.8.5
- CentOS 7.4
- PostgreSQL backend
- No DNSSEC

Logic:
- 2 DNS servers, each running the dnsdist, pdns and pdns_recursor services
- both servers have their own IPv4/IPv6 addresses plus loopback IPv4/IPv6 addresses (all public, no NAT)
- the loopback addresses are statically routed in the gateway via the server addresses
- there is no firewall/iptables between the servers, and the servers are in a single broadcast domain (VMware distributed switch)
- server hostnames and DNS names are not the same
- dnsdist is listening on the IPv4/IPv6 loopback addresses
- the authoritative and recursive backends are listening on the server IPv4 addresses
- internal clients are sent to the recursor backend; the recursor backend has a forward-zones-file defined
- all other clients are sent to the auth backend

Problem:
While 99% of the queries succeed, some queries randomly return SERVFAIL. For example, when checking any DNS name with cachecheck.opendns.com for the first time, 20-50% of the servers return SERVFAIL. The second time, even after refreshing the cache, all responses are successful.

What I tried:
I set up a test domain with MX records and tested mail flow. Emails frequently bounced back because the MX record was not found or the MX server's DNS name could not be resolved.
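To narrow down which hop produces the SERVFAILs, one option is to query dnsdist (port 53 on the loopback addresses) and then each auth (:5300) and recursor (:5301) backend directly, and compare the response-code tallies. The sketch below is a hypothetical, stdlib-only helper (the file name `dnsprobe.py` and the `probe`/`build_query`/`parse_rcode` names are illustrative, not part of any PowerDNS tool). By default it prepends a random label to every name so each query is a cache miss, which mimics the "first lookup fails, second succeeds" pattern; NXDOMAIN for those random names is expected, the interesting counts are SERVFAIL and TIMEOUT.

```python
"""Tally DNS response codes from one server to see which hop returns SERVFAIL.

Hypothetical helper: point it at dnsdist, then at each backend directly.
"""
import random
import socket
import struct
import sys
from collections import Counter

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
               4: "NOTIMP", 5: "REFUSED"}


def build_query(qname: str, qtype: int = 1, rd: bool = True) -> tuple[int, bytes]:
    """Encode a single-question DNS query in RFC 1035 wire format."""
    qid = random.randint(0, 0xFFFF)
    flags = 0x0100 if rd else 0x0000  # only the RD (recursion desired) bit
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1
    wire = b""
    for label in qname.rstrip(".").split("."):
        if not label:
            continue  # the root name "." has no labels
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        wire += bytes([len(raw)]) + raw
    question = wire + b"\x00" + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    return qid, header + question


def parse_rcode(response: bytes, expected_id: int) -> int:
    """Return the 4-bit RCODE of a response after checking its query ID."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    rid, flags = struct.unpack("!HH", response[:4])
    if rid != expected_id:
        raise ValueError("response ID does not match query ID")
    return flags & 0x000F


def probe(server: str, port: int, qname: str, qtype: int = 1, count: int = 50,
          timeout: float = 2.0, bust_cache: bool = True) -> Counter:
    """Send `count` UDP queries and count rcodes, plus TIMEOUT/BADREPLY."""
    family, _, _, _, addr = socket.getaddrinfo(server, port, type=socket.SOCK_DGRAM)[0]
    tally = Counter()
    for _ in range(count):
        # A random leading label makes every query a cache miss.
        name = f"probe{random.randint(0, 10**9)}.{qname}" if bust_cache else qname
        qid, packet = build_query(name, qtype)
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(packet, addr)
            try:
                data, _ = sock.recvfrom(4096)
                rcode = parse_rcode(data, qid)
                tally[RCODE_NAMES.get(rcode, f"RCODE{rcode}")] += 1
            except socket.timeout:
                tally["TIMEOUT"] += 1
            except ValueError:
                tally["BADREPLY"] += 1
    return tally


if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("usage: dnsprobe.py SERVER PORT QNAME [QTYPE]")
    else:
        qtype = int(sys.argv[4]) if len(sys.argv) > 4 else 1
        print(dict(probe(sys.argv[1], int(sys.argv[2]), sys.argv[3], qtype)))
```

For example, `python3 dnsprobe.py DNS1_PUBLIC_IPV4_ADDRESS 5301 DOMAIN 15` would send 50 MX queries straight to the DNS1 recursor; repeating the run against dnsdist and against each backend shows whether the failures appear only behind dnsdist or already at a backend.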
I could not catch any of the SERVFAIL errors in the logs, even with loglevel=9, log-dns-queries=yes and log-dns-details=yes. Using tcpdump on port 53, I only see something along these lines:

    10:47:30.717092 IP DNS1_NAME.domain > QUERIED_DNS_NAME.sscan: 8272 ServFail 0/0/0 (49)

I sometimes see timeouts in the pdns log:

    Feb 26 13:04:18 SERVER_HOSTNAME pdns_server[3574]: TCP Connection Thread died because of network error: Timeout reading data
    Feb 26 13:04:18 SERVER_HOSTNAME pdns[3574]: TCP Connection Thread died because of network error: Timeout reading data

dnsdist:

    setLocal('127.0.0.1:53')
    addLocal(ROUTED_LOOPBACK_IPV4_ADDRESS)
    addLocal(ROUTED_LOOPBACK_IPV6_ADDRESS)
    setACL({'0.0.0.0/0', '::/0'})
    recursive_ips = newNMG()
    recursive_ips:addMask(RECURSIVE_CLIENTS_NETWORK)
    newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})
    newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5300', pool='auth'})
    newServer({address='DNS1_PUBLIC_IPV4_ADDRESS:5300', pool='recursor'})
    newServer({address='DNS2_PUBLIC_IPV4_ADDRESS:5301', pool='recursor'})
    addAction(NetmaskGroupRule(recursive_ips), PoolAction('recursor'))
    addAction(AllRule(), PoolAction('auth'))

pdns authoritative:

    config-dir=/etc/pdns
    daemon=yes
    distributor-threads=3
    guardian=yes
    include-dir=/etc/pdns/pdns.d
    launch=
    local-address=SERVER_PUBLIC_IPV4_ADDRESS
    local-port=5300
    module-dir=/usr/lib64/pdns
    receiver-threads=2
    reuseport=yes
    server-id=SERVER_HOSTNAME
    slave=no

pdns recursor:

    local-address=SERVER_PUBLIC_IPV4_ADDRESS
    local-port=5301
    allow-from=ALLOWED_NETWORKS
    reuseport=yes
    pdns-distributes-queries=yes
    forward-zones-file=/etc/pdns-recursor/zone-file

forward-zones-file:

    DOMAIN=DNS1_PUBLIC_IPV4_ADDRESS:5300, DNS2_PUBLIC_IPV4_ADDRESS:5300
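Because every dnsdist pool is meant to map onto one daemon, a cheap first sanity check is whether each `newServer()` entry points at the port that daemon actually listens on (5300 for pdns, 5301 for pdns_recursor, per the local-port settings in this post). The sketch below is a hypothetical checker (`check_backends` and `EXPECTED_PORT` are illustrative names); the pool-to-port mapping is an assumption about the intended layout, and the regex only understands the single-line `newServer({address='HOST:PORT', pool='NAME'})` form used in this config, silently skipping anything else.

```python
"""Cross-check dnsdist newServer() entries against the daemons' listen ports."""
import re
import sys

# Assumed intended layout: pool name -> local-port of the daemon serving it.
EXPECTED_PORT = {"auth": 5300, "recursor": 5301}

# Matches newServer({address='HOST:PORT', pool='NAME'}) with optional spaces.
NEWSERVER_RE = re.compile(
    r"newServer\(\{\s*address\s*=\s*'([^']+):(\d+)'\s*,\s*pool\s*=\s*'([^']+)'\s*\}\)"
)


def check_backends(dnsdist_conf: str) -> list[str]:
    """Return one message per backend whose port does not match its pool."""
    problems = []
    for host, port, pool in NEWSERVER_RE.findall(dnsdist_conf):
        want = EXPECTED_PORT.get(pool)
        if want is None:
            problems.append(f"{host}:{port} is in unknown pool {pool!r}")
        elif int(port) != want:
            problems.append(
                f"{host}:{port} is in pool {pool!r} but that daemon listens on {want}"
            )
    return problems


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for line in check_backends(fh.read()) or ["no mismatches found"]:
                print(line)
```

Applied to the four `newServer()` lines above, it reports exactly one mismatch: the DNS1 entry in the 'recursor' pool uses port 5300, the authoritative server's local-port, whereas pdns_recursor is configured with local-port=5301.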
_______________________________________________ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist