> On 19 Oct 2022, at 17:35, [email protected] wrote:
>
> Thanks for the interesting discussion on the qa.ws.iqt.fiscal.treasury.gov
> problem. Nice to know I'm not the only one who doesn't quite understand why
> we're getting mixed results despite the obviously non-compliant behavior.
>
> But got a new one today. Different failure mode, but same thing. Sometimes
> works, but sometimes SERVFAIL.
>
> Noticed when we started to get some server-admin-heartburn when CRL downloads
> started to fail because of DNS errors. The servers for the zone fpki.gov are
> handing out different DNSKEYs (if the server responds at all). The DNSviz for
> this one pretty clearly catches the problem, but you may need a screen
> magnifier,
>
> https://dnsviz.net/d/repo.fpki.gov/Y08v3Q/dnssec/
>
> You can read all of the "Errors" on the left. (BTW, this zone was
> _completely_ broken for a while this evening. The auth servers appeared down.
> Thought they might have been trying to fix this, but looks like it's still
> there.)
>
> I thought we might have caught it midway through in a bad rollover, but it's
> been this way for a while and the SOAs on all of the servers match.
>
> So it's pretty easy to see how something could break. If a resolver gets the
> DNSKEYs from a server with ones that don't match the RRSIGs you've got, you
> can't validate.
>
> But here's my question, are DNS resolvers, and specifically, BIND, forgiving
> enough to try other authoritative servers for missing DNSKEYs for cases just
> like this? Will they searching other authoritative servers in search of a
> matching DNSKEY.
BIND will look for a DNSKEY RRset that validates as secure. It will then
cache it.

> Or can they come at it from the other way? If the RRSIGs don't line up with
> the available DNSKEYs, the server doesn't cache these target RRsets and the
> resolver makes another try, possibly to a different server.

BIND will query other servers. All recursive nameservers should do this, as
bogus answers are to be treated as if they have not arrived for CD=0 queries.

> But even if resolvers do this stuff, I think I still see how this could break
> things. If a recursive resolver is doing "forward-only" through another
> caching resolver, the end resolver will only get whatever the forwarder has
> in its cache. If the middle resolver has incompatible or incomplete DNSKEYs
> and RRSIGs, there isn't a way for the end resolver to force the intermediate
> resolver to go out and get DNSKEYs from the other authoritative servers for
> the zone.

This is why "Always Set the CD Bit on Queries" is stupid. The intermediate
servers need to validate responses so that downstream validators get good
input.

RFC 6840

5.9.  Always Set the CD Bit on Queries

   When processing a request with the Checking Disabled (CD) bit set, a
   resolver SHOULD attempt to return all response data, even data that
   has failed DNSSEC validation.  Section 3.2.2 of [RFC4035] requires a
   resolver processing a request with the CD bit set to set the CD bit
   on its upstream queries.

   This document further specifies that validating resolvers SHOULD set
   the CD bit on every upstream query.  This is regardless of whether
   the CD bit was set on the incoming query or whether it has a trust
   anchor at or above the QNAME.

   [RFC4035] is ambiguous about what to do when a cached response was
   obtained with the CD bit unset, a case that only arises when the
   resolver chooses not to set the CD bit on all upstream queries, as
   specified above.  In the typical case, no new query is required, nor
   does the cache need to track the state of the CD bit used to make a
   given query.  The problem arises when the cached response is a server
   failure (RCODE 2), which may indicate that the requested data failed
   DNSSEC validation at an upstream validating resolver.  ([RFC2308]
   permits caching of server failures for up to five minutes.)  In these
   cases, a new query with the CD bit set is required.

   Appendix B discusses more of the logic behind the recommendation
   presented in this section.

The problem is that the RFC describes the behaviour of the "sometimes set"
model incorrectly, and then draws the wrong conclusions from that
description. DNSSEC was designed so that a resolver sends CD=0 unless the
triggering query had CD=1, and never returns previously cached CD=1 results
without validating them first.

What is missing from the DNSSEC RFCs is an instruction to retry with CD=1
when you get SERVFAIL from the upstream recursive server in response to a
CD=0 query. The retry with CD=1 lets you work around bad time and bad trust
anchors in upstream servers.

The desire to reduce the work performed by intermediate servers results in a
system that does not work when the servers are under attack or when there
are stuff-ups in the administration of the zone. The bad answers make it
through and the client has no way to recover.

> Does that scenario make sense? I've been dumping caches and trying to see
> what the server is doing when things are working and when they are not, but
> thought I'd just try the people with the deep resolver knowledge.
>
> But I /really/ just wish .gov orgs would fix their @*%$ DNSSEC!

Yep.
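
For what it's worth, the CD=0-first, retry-with-CD=1-on-SERVFAIL behaviour
described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not how BIND implements it: `send` and
`validate` are hypothetical callables standing in for the transport to the
forwarder and for the resolver's own DNSSEC validator, and an answer is
modelled as a plain list of strings.

```python
from typing import Callable, List, Optional, Tuple

NOERROR = 0
SERVFAIL = 2  # RCODE 2

Answer = List[str]  # stand-in for an RRset together with its RRSIGs
# Hypothetical transport: (qname, qtype, cd_bit) -> (rcode, answer)
SendFn = Callable[[str, str, bool], Tuple[int, Answer]]
# Hypothetical local validator: True only if the answer validates as secure
ValidateFn = Callable[[str, str, Answer], bool]


def query_via_forwarder(send: SendFn, validate: ValidateFn,
                        qname: str, qtype: str) -> Optional[Answer]:
    """Ask the forwarder with CD=0; on SERVFAIL retry once with CD=1."""
    # First attempt: CD=0, so the forwarder validates and will try other
    # authoritative servers itself if it sees bogus data.
    rcode, answer = send(qname, qtype, False)
    if rcode == SERVFAIL:
        # The forwarder may have bad time or a bad trust anchor.
        # Retry with CD=1 to get the raw data and judge it locally.
        rcode, answer = send(qname, qtype, True)
    if rcode != NOERROR:
        return None
    # Never hand back data that has not been validated locally.
    return answer if validate(qname, qtype, answer) else None
```

The point of the ordering is that the intermediate server still does the
validation work in the normal case, and the CD=1 path is only a recovery
path for a forwarder that cannot validate correctly itself.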
> _______________________________________________
> dns-operations mailing list
> [email protected]
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: [email protected]
