Hi Joe,

The domain whatsapp.com doesn't guarantee integrity to you (they have DNSSEC turned off, at least last I checked). It's possible that someone got in the middle and inserted a bogus record. That said, I'm not aware of NLnet Labs having changed their internal database, so this is likely not a corruption issue but stems from the wire.
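On the "hex (?)" value in the second answer: `\# 4 1F0D4631` looks to me like the generic record-data notation from RFC 3597 (a `\#` marker, a byte count, then the bytes in hex), which I'd guess dig falls back to because the class field came back as RESERVED0 rather than IN. A minimal Python sketch, assuming that reading is right (the helper name decode_generic_a is made up for illustration), turns it back into a dotted quad:

```python
import ipaddress


def decode_generic_a(rdata: str) -> str:
    """Decode RFC 3597 style generic rdata text (e.g. '\\# 4 1F0D4631') as an IPv4 address."""
    parts = rdata.split()
    if len(parts) < 3 or parts[0] != "\\#":
        raise ValueError("not in generic '\\# <length> <hex>' form")
    # The hex data may be split into several whitespace-separated groups.
    raw = bytes.fromhex("".join(parts[2:]))
    if len(raw) != int(parts[1]):
        raise ValueError("length field does not match the hex data")
    if len(raw) != 4:
        raise ValueError("an A record carries exactly 4 bytes")
    return str(ipaddress.IPv4Address(raw))


if __name__ == "__main__":
    print(decode_generic_a(r"\# 4 1F0D4631"))  # prints 31.13.70.49
```

So the four address bytes are the same as in the good answer (31.13.70.49); what differs in the bad one is the class and TTL in front of them.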
Hopefully my 2 cents are helpful.

-peter

On Sun, Sep 15, 2019 at 06:23:28PM -0700, Joe Barnett wrote:
> I've been seeing some issues which I believe to be related to dns/resolving.
> The short of it is that the results of
>
> # dig web.whatsapp.com
>
> start out as:
>
> ; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57665
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;web.whatsapp.com. IN A
>
> ;; ANSWER SECTION:
> web.whatsapp.com. 3595 IN CNAME mmx-ds.cdn.whatsapp.net.
> mmx-ds.cdn.whatsapp.net. 55 IN A 31.13.70.49
>
> ;; Query time: 6 msec
> ;; SERVER: 192.168.254.254#53(192.168.254.254)
> ;; WHEN: Sun Sep 15 14:46:24 2019
> ;; MSG SIZE rcvd: 87
>
> which seems reasonable (and functional), but then soon become:
>
> ;; Warning: Message parser reports malformed message packet.
>
> ; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40939
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;web.whatsapp.com. IN A
>
> ;; ANSWER SECTION:
> web.whatsapp.com. 3528 IN CNAME mmx-ds.cdn.whatsapp.net.
> mmx-ds.cdn.whatsapp.net. 30772 RESERVED0 A \# 4 1F0D4631
>
> ;; Query time: 2 msec
> ;; SERVER: 192.168.254.254#53(192.168.254.254)
> ;; WHEN: Sun Sep 15 14:47:31 2019
> ;; MSG SIZE rcvd: 87
>
> At which point I am no longer able to access web.whatsapp.com. Given that
> whatsapp is a facebook property, I tried the above against facebook.com,
> www.facebook.com, instagram.com, and www.instagram.com as well. With the
> exception of instagram.com, the other three (facebook, www.facebook,
> www.instagram) return a hex (?) formatted version of the IP address, similar
> to what is seen in the later of the above examples. My thinking is (or was)
> that there are some issues relating to fb's DNS.
> From outside of my
> network, however, other resolvers seem to be able to continually resolve the
> above names correctly. I don't know what those resolvers are, but
> specifically I am referring to whatever Linode and DigitalOcean use in the
> nameservers they provide to their basic Linux vms (I am using the default
> network config in my vms at Linode and DigitalOcean). I have a suspicion
> that Linode uses unbound, but I do not know how to verify that. Oh, as far
> as I can tell, those facebook-family names *seem* to be the only names for
> which I see this behavior -- all other names that I have tried to run
> through dig (and nslookup) seem to return reasonable and seemingly correct
> results.
>
> A bit about my (home) network. I have Cox cable internet service, an Arris
> SBG7580-AC, and an OpenBSD 6.5 machine that sits between the modem and the
> rest of the network. I(we) do use the modem in router mode (but without
> using the built-in WiFi) as my wife's work git-up consists of a
> pre-configured black-box of a Juniper device. Not wanting that device in
> the rest of our network, I set the modem to "RoutedWithNAT" and the two
> network devices plug into the modem, but provide two separate networks. For
> remote ingress into the rest of the network, I set the modem's DMZ to point
> to the OpenBSD box. My pf.conf does the usual small network stuff including
> NAT, a bit of redirection, etc. It has changed very little in the past
> several years. My unbound.conf is also nearly unchanged since I first set
> it up when OpenBSD dropped bind and replaced it with unbound. My OpenBSD
> machine provides name resolving for the rest of the network.
> My unbound.conf follows:
>
> server:
>     interface: 0.0.0.0
>     interface: ::1
>     do-ip6: no
>
>     access-control: 0.0.0.0/0 refuse
>     access-control: 127.0.0.0/8 allow
>     access-control: 192.168.0.0/16 allow
>     access-control: 10.0.0.0/24 allow
>     access-control: 172.16.0.0/24 allow
>     access-control: ::0/0 refuse
>     access-control: ::1 allow
>
>     hide-identity: yes
>     hide-version: yes
>
>     # ftp://FTP.INTERNIC.NET/domain/named.cache
>     root-hints: "/var/unbound/etc/named.cache"
>
>     # uncomment to enable DNSSEC
>     auto-trust-anchor-file: "/var/unbound/db/root.key"
>
>     ### various local-zone, local-data, and local-date-ptr ###
>
> remote-control:
>     control-enable: yes
>     control-use-cert: yes
>     control-interface: /var/run/unbound.sock
>
> do-ip6, root-hints, and auto-trust-anchor-file are somewhat recent additions
> to my unbound.conf, but I experience the same behavior with unbound.conf as
> above, and also when I comment out those three additions (bringing it back
> to a configuration that has worked for several years).
>
> My OpenBSD machine is an APU2 which I have been using without issue for over
> a year. My backup machine is an ALIX2D3 I think it is called. Other than
> the APU running amd64, and the ALIX running i386, the machines are otherwise
> configured exactly the same. The APU2 has been consistently maintained, and
> this behavior did start soon after I applied the libexpat update via
> syspatch. The ALIX machine, however, has not been patched (meaning it
> contains 6.5 as it was at release). I do not know much about the inner
> workings of DNS, and thinking that, perhaps, the packets contain XML and
> that the recent libexpat update is causing issues, I backed the update out
> of the APU2, but still get the same results. Similarly, swapping the
> (non-updated) ALIX in place of the APU2 results in the same behavior.
>
> Please forgive my verbosity, but I figured more info is probably better than
> less.
> My knowledge of DNS and other network services is limited -- I hope I
> have explained this in a way that can be understood.
>
> Thanks,
>
> Joe
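
A side note on the access-control lines in the quoted unbound.conf: as far as I understand the unbound.conf man page, the most specific matching netblock decides, so the order of the lines doesn't matter. A rough Python sketch of that matching, under that assumption (RULES is copied from the config above; action_for is a hypothetical helper, not anything unbound provides):

```python
import ipaddress

# access-control entries as they appear in the quoted unbound.conf
RULES = [
    ("0.0.0.0/0", "refuse"),
    ("127.0.0.0/8", "allow"),
    ("192.168.0.0/16", "allow"),
    ("10.0.0.0/24", "allow"),
    ("172.16.0.0/24", "allow"),
    ("::0/0", "refuse"),
    ("::1", "allow"),
]


def action_for(client: str) -> str:
    """Return the action of the most specific rule containing the client address."""
    addr = ipaddress.ip_address(client)
    best_len, best_action = -1, "refuse"  # assumed fallback when nothing matches
    for net_text, action in RULES:
        net = ipaddress.ip_network(net_text)
        # Skip rules of the other address family, then keep the longest prefix.
        if net.version == addr.version and addr in net and net.prefixlen > best_len:
            best_len, best_action = net.prefixlen, action
    return best_action


if __name__ == "__main__":
    for client in ("192.168.254.10", "10.0.1.5", "127.0.0.1", "::1"):
        print(client, action_for(client))
```

Note that 10.0.0.0/24 and 172.16.0.0/24 only cover 10.0.0.x and 172.16.0.x, so a client at, say, 10.0.1.5 would fall through to the 0.0.0.0/0 refuse rule.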