I'm not sure EDNS(0) extensions would make a difference here. The real issue for caching is balancing load across many caches while restricting content to as few caches as possible to maintain cache efficiency. Too few DNS answers risk load piling up on a few caches and overrunning them (though this is unlikely except at very high throughput). Too many DNS answers (much more likely) spread your service’s content across too many caches, which increases cache churn and the risk of hitting cold caches and getting poor service performance.
I spoke with our DNS team about a year ago about EDNS(0) relative to client sub-netting (ECS), and they did not embrace it because it made their recursion jump by several orders of magnitude and broke the DNS system. I'm not sure whether they plan to use EDNS(0) for other things, and I'm not sure how that would factor into the load on the caches and the need to spread that load via additional IP responses, but please educate me if you know something about this.

In an ideal world TR monitors the popularity of a service based on incoming requests per second and potentially expands or contracts the IP response. Given DNS caching that may be difficult to judge accurately, but we may be able to use it to differentiate between a “1” and a “4” response. I thought I cut a request for that a while back, but I can’t find it, so I created a new one: https://github.com/apache/incubator-trafficcontrol/issues/1614

Ryan Durfey
M | 303-524-5099
CDN Support (24x7): 866-405-2993 or [email protected]

From: "Eric Friedrich (efriedri)" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, December 4, 2017 at 6:18 PM
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Changing max_dns_answers default

Does EDNS0 (which TR already supports) reduce the severity of this problem? If so, could TR auto-detect whether the sending resolver supports EDNS0 when deciding how big to make the response?

—Eric

On Dec 4, 2017, at 5:31 PM, Jason Tucker <[email protected]> wrote:

HTTP-routing seems to go to the opposite end of the spectrum - the default is to use a dispersion of "1", which gives best cache efficiency as Ryan mentions. I think the behavior in this regard should be somewhat similar between HTTP and DNS routing.
__Jason

On Mon, Dec 4, 2017 at 10:19 PM, Durfey, Ryan <[email protected]> wrote:

I like the idea of code that always keeps the response under the threshold, and I think this is a good feature to add, but from a practical perspective we always want the max DNS response to be the minimum viable for cache efficiency. Most of our services (95%+) should be set to 1, 2, 3, or 4, correlated to the throughput of the service. Setting the default to as many as possible ensures that unless you are paying close attention you will have terrible cache efficiency. I would advocate for 2 or 3, since this would cover the majority of our services, keep cache efficiency reasonable, and work for most other applications as well. I would also advocate adding the threshold check in case someone goes too high or sets it to 0.

Ryan Durfey
M | 303-524-5099
CDN Support (24x7): 866-405-2993 or [email protected]

From: Jason Tucker <[email protected]>
Reply-To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Date: Monday, December 4, 2017 at 3:10 PM
To: Phil Sorber <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Changing max_dns_answers default

I can't comment on the development effort for that (or the compute / latency overhead that it might add to TR), but I think having a default variable that could be set per TC installation doesn't seem unreasonable.

__Jason

On Mon, Dec 4, 2017 at 9:11 PM, Phil Sorber <[email protected]> wrote:

What about adding code that would count the bytes dynamically and make sure it keeps under the threshold? Maybe even make that the behavior for the current default of 0.
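As a rough illustration of the byte-counting idea, the sketch below estimates how many A records fit in a classic 512-byte UDP response (the RFC 1035 limit that applies when the resolver does not advertise a larger EDNS0 buffer). It assumes every answer repeats the full owner name with no DNS name compression, which is a worst case and not necessarily how TR builds its responses. The function names and the example hostname are hypothetical.

```python
from typing import List, Sequence

DNS_HEADER_BYTES = 12      # fixed DNS message header (RFC 1035)
CLASSIC_UDP_LIMIT = 512    # max UDP payload without EDNS0 (RFC 1035)
RR_FIXED_BYTES = 10        # TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2)
A_RDATA_BYTES = 4          # one IPv4 address


def encoded_name_length(fqdn: str) -> int:
    """Wire length of an uncompressed name: one length byte per label,
    the label bytes, and a terminating zero byte."""
    name = fqdn.rstrip(".")
    if not name:
        return 1  # root name is a single zero byte
    return len(name) + 2


def max_answers_that_fit(qname: str, limit: int = CLASSIC_UDP_LIMIT) -> int:
    """Upper bound on A records for qname within `limit` bytes.
    Assumption: no name compression, so each record repeats the name."""
    question = encoded_name_length(qname) + 4  # QTYPE(2) + QCLASS(2)
    record = encoded_name_length(qname) + RR_FIXED_BYTES + A_RDATA_BYTES
    budget = limit - DNS_HEADER_BYTES - question
    return max(budget // record, 0)


def fit_answers(qname: str, ips: Sequence[str],
                limit: int = CLASSIC_UDP_LIMIT) -> List[str]:
    """Trim the candidate answers so the response stays under `limit`."""
    return list(ips[:max_answers_that_fit(qname, limit)])
```

Under these assumptions a 25-character name such as edge.demo.cdn.example.com leaves room for 11 answers, while a 6-character name such as e.a.io leaves room for 22, which matches the point that the threshold depends on each installation's naming convention.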
On Mon, Dec 4, 2017 at 2:06 PM Jason Tucker <[email protected]> wrote:

Yes, this is the UDP thing. We've had customers with clients that sit behind DNS infrastructure that has problems with large response packets. However, the "max" is going to be installation dependent. Variables such as edge hostname convention and CDN DNS domain suffixes are going to cause that threshold to vary from installation to installation. If you have short FQDNs, you can fit many of them in a single UDP response.

__Jason

On Mon, Dec 4, 2017 at 9:00 PM, Phil Sorber <[email protected]> wrote:

You say it causes issues with "large cache groups". What is "large" in this context? Maybe we should pick a default that puts us slightly below that. Reading a little into your comment here, I assume the "problems" stem from the number of answers that fit in a UDP packet. Maybe we should just make the default below that threshold so we get as close to the max as possible without causing said problems?

Thanks.

On Mon, Dec 4, 2017 at 12:52 PM Volz, Dylan <[email protected]> wrote:

Hi All,

The max_dns_answers parameter has defaulted to 0, which means an unlimited number of answers and causes issues for deployments with large cache groups. I opened a PR (1611 <https://github.com/apache/incubator-trafficcontrol/pull/1611>) to change the default from 0 to 5, which is hopefully a sensible value for most deployments. If this doesn’t seem like a sensible default, please respond with alternatives.

Thanks,
Dylan
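For reference, the semantics being discussed can be sketched as follows. This only illustrates what the thread describes (0 means unlimited, a positive value caps the number of answers). The function name and the simple "take the first N" selection are assumptions for illustration, not TR's actual selection logic.

```python
from typing import List, Sequence

# Default proposed in PR 1611; the previous default was 0 (unlimited).
DEFAULT_MAX_DNS_ANSWERS = 5


def limit_answers(cache_ips: Sequence[str],
                  max_dns_answers: int = DEFAULT_MAX_DNS_ANSWERS) -> List[str]:
    """Cap the number of cache IPs returned in one DNS response.

    Hypothetical helper: 0 keeps every candidate, a positive value keeps
    at most that many, and negative values are rejected.
    """
    if max_dns_answers < 0:
        raise ValueError("max_dns_answers must be 0 (unlimited) or positive")
    if max_dns_answers == 0:
        return list(cache_ips)
    return list(cache_ips[:max_dns_answers])
```

With a cache group of eight edges, the proposed default returns five addresses, a setting of 2 returns two, and 0 returns all eight.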
