Hello dnsop.

We have encountered a DNS deployment like this:

caching forwarder (forwards to)
-> anycast IP
-> load balancer level 1
-> load balancer level 2
-> recursive resolver

The trouble is that each layer uses a different timeout and retry strategy, and this caused some very interesting behavior: some layers abandoned the original request and resent it, while other layers were still trying to resolve an answer nobody was waiting for anymore, with a sort of snowball effect.

One way to counter that would be a new EDNS option:
- A 16-bit value: the number of milliseconds the client is willing to wait.
- The requester SHOULD subtract the RTT to the responder, if it is known.
- The responder SHOULD use that value as an upper bound for its own timeout.
- If the timeout expires, the responder SHOULD send back SERVFAIL with a suitable EDE code, e.g. 'user specified timeout expired'.
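
To make the bullets above concrete, here is a minimal sketch of the option's wire encoding and of the responder's timeout rule. It assumes the standard EDNS option layout (OPTION-CODE, OPTION-LENGTH, OPTION-DATA, RFC 6891). No code point exists for this proposal, so CLIENT_TIMEOUT_OPTION_CODE is a hypothetical value taken from the Local/Experimental range, and the function names are illustrative only.

```python
import struct
from typing import Optional

# Hypothetical option code: nothing is assigned for this proposal, so we use a
# value from the Local/Experimental Use range (65001-65534, RFC 6891).
CLIENT_TIMEOUT_OPTION_CODE = 65001


def encode_timeout_option(budget_ms: int, rtt_ms: Optional[int] = None) -> bytes:
    """Encode the proposed option as OPTION-CODE, OPTION-LENGTH, OPTION-DATA."""
    if rtt_ms is not None:
        # Requester SHOULD subtract the RTT to the responder if it is known.
        budget_ms = max(0, budget_ms - rtt_ms)
    if not 0 <= budget_ms <= 0xFFFF:
        raise ValueError("budget must fit in 16 bits")
    return struct.pack("!HHH", CLIENT_TIMEOUT_OPTION_CODE, 2, budget_ms)


def decode_timeout_option(option: bytes) -> int:
    """Return the client's budget in milliseconds from an encoded option."""
    if len(option) != 6:
        raise ValueError("not a well-formed timeout option")
    code, length, budget_ms = struct.unpack("!HHH", option)
    if code != CLIENT_TIMEOUT_OPTION_CODE or length != 2:
        raise ValueError("not a well-formed timeout option")
    return budget_ms


def effective_timeout_ms(local_timeout_ms: int, client_budget_ms: Optional[int]) -> int:
    """Responder SHOULD use the client's value as an upper bound on its own timeout."""
    if client_budget_ms is None:
        return local_timeout_ms
    return min(local_timeout_ms, client_budget_ms)
```

For example, a requester willing to wait 1500 ms with a known 20 ms RTT would send a budget of 1480 ms, and a responder whose own timeout is 5000 ms would cap itself at 1480 ms.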

I sense such an option could prevent the situation where layer #1 times out and resends the query to a (different) layer #2 instance while the first instances of layers #2, #3, and #4 are still waiting and doing their own recursion and retries.
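
A rough model of why this would help: if every hop caps its timeout at the budget it received and forwards that budget minus its RTT to the next hop, each inner layer's deadline lands no later than the outer layer's. The sketch below assumes a one-way delay of half the RTT, and it also assumes the first layer attaches the option based on its own timeout even when the client sent none. The layer names and all timeout/RTT numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    name: str
    local_timeout_ms: int  # the layer's own, independently configured timeout
    rtt_to_next_ms: int    # assumed known RTT to the next hop (0 for the last one)


def propagate_budgets(
    layers: List[Layer], client_budget_ms: Optional[int]
) -> List[Tuple[str, int, float]]:
    """Return (name, effective_timeout_ms, absolute_deadline_ms) per layer.

    Time 0 is when the first layer receives the query. A query is assumed to
    reach the next hop after half of the RTT (a simplifying assumption).
    """
    results = []
    arrival_ms = 0.0
    budget = client_budget_ms
    for layer in layers:
        # Use the received budget as an upper bound on the local timeout.
        timeout = layer.local_timeout_ms if budget is None else min(layer.local_timeout_ms, budget)
        results.append((layer.name, timeout, arrival_ms + timeout))
        # Act as requester towards the next hop: subtract the RTT.
        # Assumption: this also happens when no budget was received.
        budget = max(0, timeout - layer.rtt_to_next_ms)
        arrival_ms += layer.rtt_to_next_ms / 2
    return results


chain = [
    Layer("forwarder", 2000, 10),
    Layer("lb1", 5000, 4),
    Layer("lb2", 5000, 2),
    Layer("resolver", 10000, 0),
]
for name, timeout, deadline in propagate_budgets(chain, None):
    print(f"{name}: timeout {timeout} ms, deadline at {deadline} ms")
```

With these made-up numbers the resolver gives up after 1984 ms instead of its local 10000 ms, and each inner layer's SERVFAIL can get back to the layer above it no later than that layer's own deadline, so nobody keeps working on a query that has already been abandoned.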

I think it would be useful for complicated forwarding setups even if stubs don't see the need for it or take a long time to adopt it.

What does this group think? Is it worth a draft?

--
Petr Špaček
Internet Systems Consortium

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]
