Re: Changing max_dns_answers default

Jeff Elsloo Thu, 07 Dec 2017 08:53:30 -0800

The main reason I decided to make the default false was to preserve
existing functionality. When limits exist, consistent DNS routing forces
a subset of a given cache group to service a given request for a DNS
delivery service, which has implications for cache efficiency, and I did
not want to force that on anyone. The feature itself was added to be used
in concert with DNSSEC, as it greatly reduces the number of permutations
of dynamic zones that must be signed.
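
To make the difference concrete, here is a minimal Python sketch. It is not
Traffic Router's actual code. The shuffle-then-limit behavior follows the
description in the quoted thread below; the hash-based ranking in
`consistent_answers`, the cache names, and the request name are assumptions
made up purely for illustration.

```python
import hashlib
import random
from math import comb


def shuffled_answers(caches, limit, rng=random):
    """Routing with consistency off: shuffle the whole list, then apply the limit.

    A limit of 0 means "unlimited", matching the old max_dns_answers default.
    """
    pool = list(caches)
    rng.shuffle(pool)
    return pool[:limit] if limit > 0 else pool


def consistent_answers(caches, limit, name):
    """Illustrative consistent selection (an assumption, not TR's algorithm).

    Rank caches by a hash of (request name, cache) so the same name always
    maps to the same subset, regardless of the input order.
    """
    def score(cache):
        return hashlib.sha256(f"{name}|{cache}".encode()).hexdigest()

    ranked = sorted(caches, key=score)
    return ranked[:limit] if limit > 0 else ranked


# Hypothetical 20-cache group with a limit of 5 answers.
caches = [f"edge-{i:02d}" for i in range(20)]
limit = 5

# Number of distinct 5-cache subsets a shuffle could hand out for one name.
print(comb(len(caches), limit))

# The consistent variant always returns the same subset for the same name.
print(consistent_answers(caches, limit, "video.example.test"))
```

Under these assumptions a shuffle can return any of 15504 different 5-cache
subsets for a single name, while the consistent variant returns exactly one,
which is the kind of reduction that matters when every distinct answer has to
be signed.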
--
Thanks,
Jeff



On Wed, Dec 6, 2017 at 1:18 AM, Oren Shemesh <[email protected]> wrote:
> Hmmmm.
> Why is 'consistent.dns.routing' set to 'false' by default?
> Based on Jeff's description, I do not see any value in it, other than this
> scenario:
> When 'consistent.dns.routing' is set to 'true' and max_dns_answers is set
> to a very low value (1 or 2), then if there are a few content items in
> very high demand, the load for these specific items would not spread across
> enough caches, creating uneven load.
>
> Is there any other value in setting 'consistent.dns.routing' to 'false'?
>
> On Tue, Dec 5, 2017 at 10:21 PM, Volz, Dylan <[email protected]> wrote:
>
>> Based on the discussion we will be changing the schema default from 0 to 5
>> for now, with the knowledge that this is a complex issue that could benefit
>> from ensuring we are following the relevant RFCs and perhaps a configurable
>> default in the future.
>>
>> On 12/5/17, 9:01 AM, "Jeff Elsloo" <[email protected]> wrote:
>>
>>     I think this discussion has drifted far from Dylan's original intent,
>>     which is to set a reasonable default in the short term. We can argue
>>     about what the default is, but ultimately the real way to fix this is
>>     to ensure that we follow the RFCs. If a resolver cannot switch to TCP,
>>     we can truncate the response and set the truncated header bit. This
>>     would occur, as Eric mentioned indirectly, when EDNS0 is unsupported.
>>     Additionally, when it is supported, the client could be asking for
>>     DNSSEC signatures, which further increases the response size. It does
>>     not make sense for a resolver to support EDNS0 and not be able to
>>     switch to TCP. We shouldn't have to worry about this scenario because
>>     in my opinion it's a misconfiguration on the other side that we cannot
>>     control, therefore we should not code for it because they are not
>>     following standards.
>>
>>     All of the commentary about what we should set the default to in order
>>     to ensure cache efficiency is highly site specific. Not everyone specs
>>     their caches for 18Gbps, and not everyone has the same cache to cache
>>     group ratios. While I appreciate that this change does impact cache
>>     efficiency, there are other aspects of Traffic Router that impact this
>>     setting such as `consistent.dns.routing`, which by default, is set to
>>     false. When it's false, your answer size will be limited by the
>>     specified amount, but the entire list will be shuffled prior to
>>     setting the limit. This will kill any cache efficiency conversation
>>     unless the operator has set this value to true. I don't believe
>>     there's a "one size fits all" answer here, and because of this we
>>     should really follow the RFCs.
>>
>>     I think a reasonable default is a good short term solution until more
>>     time can be invested in ensuring that we are 100% compliant with this
>>     aspect of the RFCs. Ideally the default would be a parameter or
>>     something that is configurable instead of being part of the schema,
>>     but that's an entirely different argument. I'm +1 on a reasonable
>>     default.
>>
>>     Here's a helpful post about when resolvers switch to TCP:
>>     https://serverfault.com/questions/698251/how-does-the-dns-protocol-switch-from-udp-to-tcp
>>
>>     Thanks,
>>     --
>>     Thanks,
>>     Jeff
>>
>>
>>     On Tue, Dec 5, 2017 at 8:33 AM, Dave Neuman <[email protected]> wrote:
>>     > Hey Dylan,
>>     > Since we currently default to 0 (all) and we don't want to
>>     > re-invent the wheel right now, I think 5 sounds like a reasonable
>>     > default.
>>     >
>>     > Thanks,
>>     > Dave
>>     >
>>     > On Tue, Dec 5, 2017 at 8:21 AM, Durfey, Ryan <[email protected]> wrote:
>>     >
>>     >> Not sure if EDNS(0) extensions would make a difference here.
>>     >>
>>     >> The real issue for caching is balancing load across many caches while
>>     >> restricting content to as few caches as possible to maintain cache
>>     >> efficiency.  Too few DNS answers risk load piling up on a few caches
>>     >> and overrunning them (though this is unlikely except in the case of
>>     >> very high throughput).  Too many DNS answers (much more likely) spread
>>     >> your service’s content across too many caches and increase the cache
>>     >> churn and the risk of hitting cold caches and having poor service
>>     >> performance.
>>     >>
>>     >> I spoke with our DNS team about a year ago about EDNS(0) relative to
>>     >> client sub-netting (ECS) and it was not embraced due to the fact that
>>     >> it made their recursion jump by several orders of magnitude and broke
>>     >> the DNS system.  I am not sure if they plan to use EDNS(0) for other
>>     >> things, or how that would factor into the load on the caches and the
>>     >> need to spread that load via additional IP responses, but please
>>     >> educate me if you know something about this.
>>     >>
>>     >> In an ideal world TR monitors the popularity of a service based on
>>     >> incoming request counts per second and potentially expands or contracts
>>     >> IP response.  Given DNS caching that may be difficult to judge
>>     >> accurately, but we may be able to use it to differentiate between a “1”
>>     >> and “4” response.  I thought I cut a request for that a while back, but
>>     >> I can’t find it so I created a new one:
>>     >> https://github.com/apache/incubator-trafficcontrol/issues/1614
>>     >>
>>     >> Ryan Durfey    M | 303-524-5099
>>     >> CDN Support (24x7): 866-405-2993 or [email protected]
>>     >>
>>     >>
>>     >> From: "Eric Friedrich (efriedri)" <[email protected]>
>>     >> Reply-To: "[email protected]" <[email protected]>
>>     >> Date: Monday, December 4, 2017 at 6:18 PM
>>     >> To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
>>     >> Subject: Re: Changing max_dns_answers default
>>     >>
>>     >> Does EDNS0 (which TR already supports) reduce the severity of this
>>     >> problem? If so, could TR auto-detect whether the sending resolver
>>     >> supports EDNS0 when deciding how big to make the response?
>>     >>
>>     >> —Eric
>>     >>
>>     >> On Dec 4, 2017, at 5:31 PM, Jason Tucker <[email protected]> wrote:
>>     >> HTTP-routing seems to go to the opposite end of the spectrum - the
>>     >> default is to use a dispersion of "1", which gives best cache
>>     >> efficiency as Ryan mentions. I think the behavior in this regard should
>>     >> be somewhat similar between HTTP and DNS routing.
>>     >> __Jason
>>     >> On Mon, Dec 4, 2017 at 10:19 PM, Durfey, Ryan <[email protected]> wrote:
>>     >> I like the idea of code that always keeps it under the threshold and I
>>     >> think this is a good feature to add, but from a practical perspective
>>     >> we always want the max dns response to be the minimum viable for cache
>>     >> efficiency.  Most of our services (95%+) should be set to 1, 2, 3, or 4
>>     >> correlated to throughput of the service.  Setting the default to as
>>     >> many as possible ensures that unless you are paying close attention you
>>     >> will have terrible cache efficiency.  I would advocate for 2 or 3 since
>>     >> this would cover the majority of our services, keep cache efficiency
>>     >> reasonable, and work for most other applications as well.  I would also
>>     >> advocate adding the threshold check in case someone goes too high or
>>     >> sets it to 0.
>>     >> *Ryan Durfey*    M | 303-524-5099
>>     >> CDN Support (24x7): 866-405-2993 or [email protected]
>>     >> *From: *Jason Tucker <[email protected]>
>>     >> *Reply-To: *"[email protected]" <[email protected]>, "[email protected]" <[email protected]>
>>     >> *Date: *Monday, December 4, 2017 at 3:10 PM
>>     >> *To: *Phil Sorber <[email protected]>
>>     >> *Cc: *"[email protected]" <[email protected]>
>>     >> *Subject: *Re: Changing max_dns_answers default
>>     >> I can't comment on the development effort for that (or the compute /
>>     >> latency overhead that it might add to TR), but I think having a default
>>     >> variable that could be set per TC installation doesn't seem
>>     >> unreasonable.
>>     >> __Jason
>>     >> On Mon, Dec 4, 2017 at 9:11 PM, Phil Sorber <[email protected]> wrote:
>>     >> What about adding code that would count the bytes dynamically and make
>>     >> sure the response stays under the threshold? Maybe even make that the
>>     >> behavior for the current default of 0.
>>     >> On Mon, Dec 4, 2017 at 2:06 PM Jason Tucker <[email protected]> wrote:
>>     >> Yes, this is the UDP thing. We've had customers with clients that sit
>>     >> behind DNS infrastructure that has problems with large response
>>     >> packets. However, the "max" is going to be installation dependent.
>>     >> Variables such as edge hostname convention and CDN DNS domain suffixes
>>     >> are going to cause that threshold to vary from installation to
>>     >> installation. If you have short FQDNs, you can fit many of them in a
>>     >> single UDP response.
>>     >> __Jason
>>     >> On Mon, Dec 4, 2017 at 9:00 PM, Phil Sorber <[email protected]> wrote:
>>     >> You say it causes issues with "large cache groups". What is "large" in
>>     >> this context? Maybe we should pick a default that puts us slightly
>>     >> below that. Reading a little into your comment here, I assume the
>>     >> "problems" stem from the number of answers that fit in a UDP packet.
>>     >> Maybe we should just make the default below that threshold so we get as
>>     >> close to the max as possible without causing said problems?
>>     >> Thanks.
>>     >> On Mon, Dec 4, 2017 at 12:52 PM Volz, Dylan <[email protected]> wrote:
>>     >> Hi All,
>>     >> The max_dns_answers has been defaulted to 0, which is an unlimited
>>     >> number of answers, which causes issues for deployments with large cache
>>     >> groups. I opened a PR (1611,
>>     >> https://github.com/apache/incubator-trafficcontrol/pull/1611) to change
>>     >> the default from 0 to 5 which is hopefully a sensible value for most
>>     >> deployments. If this doesn’t seem like a sensible default please
>>     >> respond with alternatives.
>>     >> Thanks,
>>     >> Dylan
>>     >>
>>     >>
>>     >>
>>
>>
>>
>>
>
>
> --
>
> *Oren Shemesh*
> Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected]
> <[email protected]>
