On Fri, Dec 12, 2025 at 8:34 AM, Petr Špaček <[email protected]> wrote:
> Hello dnsop.
>
> We have encountered a DNS deployment like this:
>
> caching forwarder (forwards to)
> -> anycast IP
> -> load balancer level 1
> -> load balancer level 2
> -> recursive resolver
>
> The trouble is, each layer uses a different timeout and retry strategy and
> this caused very interesting behaviors where some layers abandoned original
> request and resent it, while other layers were still trying to resolve an
> answer nobody was waiting for. With sort of snowball effect.
>
> One possibility to counter that a new EDNS option:
> - 16 bit value as number of milliseconds the client is willing to wait.
> - Requester SHOULD subtract RTT to responder if it is known.
> - Responder SHOULD use that as an upper bound for its own timeout.
> - If timeout expires, responder SHOULD send back SERVFAIL with suitable
>   EDE code 'user specified timeout expired'.
>
> I sense such an option could prevent the situation when layer#1 timed out
> and resent the query to (different) layer#2 instance, while first instances
> of layers #2, #3, and #4 are still waiting and doing their own recursion
> and retries.
>
> I think it would be useful for complicated forwarding setups even if stubs
> don't see the need or take long time to adopt it.
>
> What this group thinks? Worth a draft?
>

Personally I'd much rather see an *operational* document describing how setups like the above are a bad idea and are likely to come back and bite you. "Doctor, doctor, it hurts when I do this…."

There is a massive amount of tribal knowledge about how to build, run and deploy DNS services, but we haven't really done a great job of writing that down. Back in February of this year Puneet and I started writing down some "DNS Best Operational Practices"[0].

The plan is / was to just collect all of the shared **operational** knowledge in one place (sort of like a big knowledge base), and then go through and break it out into advice for Authoritative Operators (Large and Small), Recursive Operators (Large, Small, Enterprise), Common (e.g. "You should monitor stuff!"), etc[1]. These are then ideally brought to DNSOP (or similar) and published as RFCs if appropriate, or as something like a Guidebook / How To for introductory and deployment advice.

Link: https://docs.google.com/document/d/1A0dJX4LNiyFDjK-ECR6hMD_l1JoKF1nprOWX5noEXrU/edit?usp=sharing

I explicitly do not want this to lead to confusion around where "protocol-like" or consensus decisions get made, nor to fights around things like iterations, MTU, etc - and so the initial version of the document was just "Here is operational advice from RFCs" - this keeps it clear that DNSOP is where standardization happens, and this is more collection and collation of existing advice.

I think that this is a very important principle to keep in mind - this document (and things which spring from it) are "Here is guidance, commentary, clarification, exposition, interpretation, annotation, and elaboration based on RFCs" - basically something like the long awaited "Hitchhiker's Guide to the DNS - written for people actually *using* this stuff".

W

[0]: Somewhat modeled on the NOG BCOPs model.
[1]: Yes, I am aware of the original DNS-OARC panel; that is what kickstarted this document. I have shared it with a few people.
[2]: This is currently "Anyone with the link can view" - please poke me if you want comment / edit access. I was about to send it out like that, but then figured I didn't want to deal with potential spam….

> --
> Petr Špaček
> Internet Systems Consortium
>
> _______________________________________________
> DNSOP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
>
