Hmmmm. Why is 'consistent.dns.routing' set to 'false' by default? Based on Jeff's description, I do not see any value in it, other than this scenario: when 'consistent.dns.routing' is set to 'true' and max_dns_answers is set to a very low value (1 or 2), then if a few content items are in very high demand, the load for those specific items would not spread across enough caches, creating uneven load.
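To make that scenario concrete, here is a rough Python sketch. It is not Traffic Router's code: `select_shuffled` follows Jeff's description of the 'false' behaviour (shuffle the whole list, then apply the limit), while `select_consistent` is a hypothetical stand-in that ranks caches by a hash of the request name, which is only my assumption about what "consistent" selection amounts to. The cache names and the hostname are made up.

```python
import hashlib
import random


def select_shuffled(caches, max_answers, rng):
    """Sketch of consistent.dns.routing = false: shuffle everything, then cut."""
    shuffled = list(caches)
    rng.shuffle(shuffled)
    # 0 means "no limit", mirroring the max_dns_answers default discussed below.
    return shuffled[:max_answers] if max_answers > 0 else shuffled


def select_consistent(caches, max_answers, name):
    """Hypothetical stand-in for consistent.dns.routing = true.

    Caches are ranked by a hash of (request name, cache), so the same name
    always yields the same ordering, and therefore the same first N caches.
    """
    ranked = sorted(
        caches,
        key=lambda c: hashlib.sha256(f"{name}|{c}".encode()).hexdigest(),
    )
    return ranked[:max_answers] if max_answers > 0 else ranked


def caches_touched(select, requests):
    """Collect every cache that appears in any answer over many requests."""
    seen = set()
    for _ in range(requests):
        seen.update(select())
    return seen


caches = [f"edge-{i:02d}" for i in range(20)]  # made-up cache group
rng = random.Random(42)

# One very popular name, max_dns_answers = 2, 1000 lookups.
spread = caches_touched(lambda: select_shuffled(caches, 2, rng), 1000)
pinned = caches_touched(
    lambda: select_consistent(caches, 2, "video.cdn.example"), 1000
)

print(len(spread), "caches touched with shuffling")
print(len(pinned), "caches touched with consistent selection")
```

In this sketch the consistent variant keeps the popular name on the same two caches no matter how many lookups arrive, which is the uneven-load case I describe above, while the shuffled variant spreads it over the whole group at the cost of cache efficiency.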
Is there any other value in setting 'consistent.dns.routing' to 'false'?

On Tue, Dec 5, 2017 at 10:21 PM, Volz, Dylan <[email protected]> wrote:

> Based on the discussion we will be changing the schema default from 0 to 5
> for now; with the knowledge that this is a complex issue that could benefit
> from ensuring we are following the relevant RFCs and perhaps a configurable
> default in the future.
>
> On 12/5/17, 9:01 AM, "Jeff Elsloo" <[email protected]> wrote:
>
>     I think this discussion has drifted far from Dylan's original intent,
>     which is to set a reasonable default in the short term. We can argue
>     about what the default is, but ultimately the real way to fix this is
>     to ensure that we follow the RFCs. If a resolver cannot switch to TCP,
>     we can truncate the response and set the truncated header bit. This
>     would occur, as Eric mentioned indirectly, when EDNS0 is unsupported.
>     Additionally, when it is supported, the client could be asking for
>     DNSSEC signatures, which further increases the response size. It does
>     not make sense for a resolver to support EDNS0 and not be able to
>     switch to TCP. We shouldn't have to worry about this scenario because
>     in my opinion it's a misconfiguration on the other side that we cannot
>     control, therefore we should not code for it because they are not
>     following standards.
>
>     All of the commentary about what we should set the default to in order
>     to ensure cache efficiency is highly site specific. Not everyone specs
>     their caches for 18Gbps, and not everyone has the same cache to cache
>     group ratios. While I appreciate that this change does impact cache
>     efficiency, there are other aspects of Traffic Router that impact this
>     setting such as `consistent.dns.routing`, which by default, is set to
>     false. When it's false, your answer size will be limited by the
>     specified amount, but the entire list will be shuffled prior to
>     setting the limit.
>     This will kill any cache efficiency conversation
>     unless the operator has set this value to true. I don't believe
>     there's a "one size fits all" answer here, and because of this we
>     should really follow the RFCs.
>
>     I think a reasonable default is a good short term solution until more
>     time can be invested in ensuring that we are 100% compliant with this
>     aspect of the RFCs. Ideally the default would be a parameter or
>     something that is configurable instead of being part of the schema,
>     but that's an entirely different argument. I'm +1 on a reasonable
>     default.
>
>     Here's a helpful post about when resolvers switch to TCP:
>     https://serverfault.com/questions/698251/how-does-the-dns-protocol-switch-from-udp-to-tcp
>
>     Thanks,
>     --
>     Thanks,
>     Jeff
>
>
>     On Tue, Dec 5, 2017 at 8:33 AM, Dave Neuman <[email protected]> wrote:
>     > Hey Dylan,
>     > I think since we currently default to 0 (all) and we don't want to
>     > re-invent the wheel right now, I think 5 sounds like a reasonable default.
>     >
>     > Thanks,
>     > Dave
>     >
>     > On Tue, Dec 5, 2017 at 8:21 AM, Durfey, Ryan <[email protected]>
>     > wrote:
>     >
>     >> Not sure if EDNS(0) extensions would make a difference here.
>     >>
>     >> The real issue for caching is balancing load across many caches while
>     >> restricting content to as few caches as possible to maintain cache
>     >> efficiency. Too few DNS answers risks load piling up on a few caches and
>     >> overrunning them (though this is unlikely except in the case of very high
>     >> throughput). Too many DNS answers (much more likely) spreads your
>     >> service’s content across too many caches and increases the cache churn and
>     >> risk of hitting cold caches and having poor service performance.
>     >>
>     >> I spoke with our DNS team about a year ago about EDNS(0) relative to
>     >> client sub-netting (ECS) and it was not embraced due to the fact that it
>     >> made their recursion jump by several orders of magnitude and broke the DNS
>     >> system.
>     >> Not sure if they plan to use EDNS(0) for other things, but not
>     >> sure how that would factor into the load on the caches and need to spread
>     >> that load via additional IP responses, but please educate me if you know
>     >> something about this.
>     >>
>     >> In an ideal world TR monitors the popularity of a service based on
>     >> incoming request counts per second and potentially expands or contracts IP
>     >> response. Given DNS caching that may be difficult to judge accurately, but
>     >> we may be able to use it to differentiate between a “1” and “4” response.
>     >> I thought I cut a request for that a while back, but I can’t find it so I
>     >> created a new one:
>     >> https://github.com/apache/incubator-trafficcontrol/issues/1614
>     >>
>     >> Ryan Durfey M | 303-524-5099
>     >> CDN Support (24x7): 866-405-2993 or [email protected]
>     >>
>     >>
>     >> From: "Eric Friedrich (efriedri)" <[email protected]>
>     >> Reply-To: "[email protected]" <[email protected]>
>     >> Date: Monday, December 4, 2017 at 6:18 PM
>     >> To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
>     >> Subject: Re: Changing max_dns_answers default
>     >>
>     >> Does EDNS0 (which TR already supports) reduce the severity of this
>     >> problem? If so, could TR do an auto detection on if the sending resolver
>     >> supports EDNS0 when deciding how big to make the response?
>     >>
>     >> —Eric
>     >>
>     >> On Dec 4, 2017, at 5:31 PM, Jason Tucker <[email protected]> wrote:
>     >> HTTP-routing seems to go to the opposite end of the spectrum - the default
>     >> is to use a dispersion of "1", which gives best cache efficiency as Ryan
>     >> mentions. I think the behavior in this regard should be somewhat similar
>     >> between HTTP and DNS routing.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 10:19 PM, Durfey, Ryan <[email protected]>
>     >> wrote:
>     >> I like the idea of code that makes it always under the threshold and I
>     >> think this is a good feature to add, but from a practical perspective we
>     >> always want the max dns response to be the minimum viable for cache
>     >> efficiency. Most of our services (95%+) should be set to 1, 2, 3, or 4
>     >> correlated to throughput of the service. Making the default set to as many
>     >> as possible ensures that unless you are paying close attention you will
>     >> have terrible cache efficiency. I would advocate for 2 or 3 since this
>     >> would cover the majority of our services, keep cache efficiency reasonable,
>     >> and work for most other applications as well. I would also advocate to add
>     >> the threshold check in case someone goes too high or sets it to 0.
>     >> *Ryan Durfey* M | 303-524-5099
>     >> CDN Support (24x7): 866-405-2993 or [email protected]
>     >> *From: *Jason Tucker <[email protected]>
>     >> *Reply-To: *"[email protected]" <[email protected]<mailto:dev@trafficcontrol.incubator.apache.org>>, "[email protected]" <[email protected]>
>     >> *Date: *Monday, December 4, 2017 at 3:10 PM
>     >> *To: *Phil Sorber <[email protected]>
>     >> *Cc: *"[email protected]" <[email protected]<mailto:dev@trafficcontrol.incubator.apache.org>>
>     >> *Subject: *Re: Changing max_dns_answers default
>     >> I can't comment on the development effort for that (or the compute /
>     >> latency overhead that it might add to TR), but I think having a default
>     >> variable that could be set per TC installation doesn't seem unreasonable.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 9:11 PM, Phil Sorber <[email protected]> wrote:
>     >> What about adding code that would count the bytes dynamically and make
>     >> sure it keeps under the threshold? Maybe even make that the behavior for
>     >> the current default of 0.
>     >> On Mon, Dec 4, 2017 at 2:06 PM Jason Tucker <[email protected]>
>     >> wrote:
>     >> Yes, this is the UDP thing. We've had customers with clients that sit
>     >> behind DNS infrastructure that has problems with large response packets.
>     >> However, the "max" is going to be installation dependent, though. Variables
>     >> such as edge hostname convention, and CDN DNS domain suffixes are going to
>     >> cause that threshold to vary from installation to installation. If you have
>     >> short FQDNs, you can fit many of them in a single UDP response.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 9:00 PM, Phil Sorber <[email protected]> wrote:
>     >> You say it causes issues with "large cache groups". What is "large" in this
>     >> context? Maybe we should pick a default that puts us slightly below that.
>     >> Reading a little into your comment here, I assume the "problems" stem from
>     >> the number of answers that fit in a UDP packet. Maybe we should just make
>     >> the default below that threshold so we get as close to the max without
>     >> causing said problems?
>     >> Thanks.
>     >> On Mon, Dec 4, 2017 at 12:52 PM Volz, Dylan <[email protected]>
>     >> wrote:
>     >> Hi All,
>     >> The max_dns_answers has been defaulted to 0, which is an unlimited number
>     >> of answers, which causes issues for deployments with large cache groups.
>     >> I opened a PR (1611
>     >> <https://github.com/apache/incubator-trafficcontrol/pull/1611>) to change
>     >> the default from 0 to 5 which is hopefully a sensible value for most
>     >> deployments. If this doesn’t seem like a sensible default please respond
>     >> with alternatives.
>     >> Thanks,
>     >> Dylan
>     >>
>     >

--
*Oren Shemesh*
Qwilt | Work: +972-72-2221637 | Mobile: +972-50-2281168 | [email protected]
