Hmmmm.
Why is 'consistent.dns.routing' set to 'false' by default?
Based on Jeff's description, I do not see any value in that default, other
than in this scenario:
When 'consistent.dns.routing' is set to 'true' and max_dns_answers is set
to a very low value (1 or 2), then if a few content items are in very high
demand, the load for those specific items would not spread across enough
caches, creating uneven load.

Is there any other value in setting 'consistent.dns.routing' to 'false'?
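
To make the scenario concrete, here is a minimal Python sketch. It is not
Traffic Router's actual code: the function names, the cache names, the test
FQDN, and the hash-based ordering used for the 'true' case are all
assumptions for illustration. Only the 'false' behavior (shuffle the whole
list, then apply the limit) is taken from Jeff's description.

```python
import hashlib
import random
from collections import Counter


def select_answers(caches, fqdn, max_dns_answers, consistent, rng):
    """Pick the caches for one DNS response (hypothetical model)."""
    if consistent:
        # Assumption: a deterministic, per-name ordering stands in for
        # whatever consistent hashing the router really uses.
        ordered = sorted(
            caches,
            key=lambda c: hashlib.md5(f"{fqdn}|{c}".encode()).hexdigest(),
        )
    else:
        # Per Jeff's description: shuffle the entire list first.
        ordered = list(caches)
        rng.shuffle(ordered)
    # A limit of 0 means "unlimited", matching the old schema default.
    if max_dns_answers > 0:
        ordered = ordered[:max_dns_answers]
    return ordered


def caches_touched(consistent, requests=1000, max_dns_answers=2):
    """Count how often each cache is returned for one hot name."""
    caches = [f"edge-{i:02d}" for i in range(10)]  # hypothetical cache group
    rng = random.Random(0)
    seen = Counter()
    for _ in range(requests):
        answers = select_answers(
            caches, "hot.example.cdn.test", max_dns_answers, consistent, rng
        )
        seen.update(answers)
    return seen


consistent_load = caches_touched(consistent=True)
shuffled_load = caches_touched(consistent=False)
# Number of distinct caches that ever appear in a response for the hot name.
print(len(consistent_load), len(shuffled_load))
```

Under these assumptions the consistent case pins the hot name to the same 2
of the 10 caches for every response, while the shuffled case ends up using
all 10, which is the trade-off between uneven load and cache efficiency
that I am asking about.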

On Tue, Dec 5, 2017 at 10:21 PM, Volz, Dylan <[email protected]> wrote:

> Based on the discussion we will be changing the schema default from 0 to 5
> for now;
> with the knowledge that this is a complex issue that could benefit from
> ensuring we
> are following the relevant RFCs and perhaps a configurable default in the
> future.
>
> On 12/5/17, 9:01 AM, "Jeff Elsloo" <[email protected]> wrote:
>
>     I think this discussion has drifted far from Dylan's original intent,
>     which is to set a reasonable default in the short term. We can argue
>     about what the default is, but ultimately the real way to fix this is
>     to ensure that we follow the RFCs. If a resolver cannot switch to TCP,
>     we can truncate the response and set the truncated header bit. This
>     would occur, as Eric mentioned indirectly, when EDNS0 is unsupported.
>     Additionally, when it is supported, the client could be asking for
>     DNSSEC signatures, which further increases the response size. It does
>     not make sense for a resolver to support EDNS0 and not be able to
>     switch to TCP. We shouldn't have to worry about this scenario because
>     in my opinion it's a misconfiguration on the other side that we cannot
>     control, therefore we should not code for it because they are not
>     following standards.
>
>     All of the commentary about what we should set the default to in order
>     to ensure cache efficiency is highly site specific. Not everyone specs
>     their caches for 18Gbps, and not everyone has the same cache to cache
>     group ratios. While I appreciate that this change does impact cache
>     efficiency, there are other aspects of Traffic Router that impact this
>     setting such as `consistent.dns.routing`, which by default, is set to
>     false. When it's false, your answer size will be limited by the
>     specified amount, but the entire list will be shuffled prior to
>     setting the limit. This will kill any cache efficiency conversation
>     unless the operator has set this value to true. I don't believe
>     there's a "one size fits all" answer here, and because of this we
>     should really follow the RFCs.
>
>     I think a reasonable default is a good short term solution until more
>     time can be invested in ensuring that we are 100% compliant with this
>     aspect of the RFCs. Ideally the default would be a parameter or
>     something that is configurable instead of being part of the schema,
>     but that's an entirely different argument. I'm +1 on a reasonable
>     default.
>
>     Here's a helpful post about when resolvers switch to TCP:
>     https://serverfault.com/questions/698251/how-does-the-dns-protocol-switch-from-udp-to-tcp
>
>     Thanks,
>     --
>     Thanks,
>     Jeff
>
>
>     On Tue, Dec 5, 2017 at 8:33 AM, Dave Neuman <[email protected]> wrote:
>     > Hey Dylan,
>     > Since we currently default to 0 (all) and we don't want to
>     > re-invent the wheel right now, I think 5 sounds like a reasonable
>     > default.
>     >
>     > Thanks,
>     > Dave
>     >
>     > On Tue, Dec 5, 2017 at 8:21 AM, Durfey, Ryan <
> [email protected]>
>     > wrote:
>     >
>     >> Not sure if EDNS(0) extensions would make a difference here.
>     >>
>     >> The real issue for caching is balancing load across many caches while
>     >> restricting content to as few caches as possible to maintain cache
>     >> efficiency.  Too few DNS answers risks load piling up on a few caches
>     >> and overrunning them (though this is unlikely except in the case of
>     >> very high throughput).  Too many DNS answers (much more likely) spreads
>     >> your service’s content across too many caches and increases the cache
>     >> churn and risk of hitting cold caches and having poor service
>     >> performance.
>     >>
>     >> I spoke with our DNS team about a year ago about EDNS(0) relative to
>     >> client sub-netting (ECS), and it was not embraced because it made their
>     >> recursion jump by several orders of magnitude and broke the DNS system.
>     >> Not sure if they plan to use EDNS(0) for other things, and not sure how
>     >> that would factor into the load on the caches and the need to spread
>     >> that load via additional IP responses, but please educate me if you
>     >> know something about this.
>     >>
>     >> In an ideal world TR monitors the popularity of a service based on
>     >> incoming request counts per second and potentially expands or contracts
>     >> IP response.  Given DNS caching that may be difficult to judge
>     >> accurately, but we may be able to use it to differentiate between a “1”
>     >> and “4” response.  I thought I cut a request for that a while back, but
>     >> I can’t find it so I created a new one:
>     >> https://github.com/apache/incubator-trafficcontrol/issues/1614
>     >>
>     >> Ryan Durfey    M | 303-524-5099
>     >> CDN Support (24x7): 866-405-2993 or [email protected]<mailto:
>     >> [email protected]>
>     >>
>     >>
>     >> From: "Eric Friedrich (efriedri)" <[email protected]>
>     >> Reply-To: "[email protected]" <
>     >> [email protected]>
>     >> Date: Monday, December 4, 2017 at 6:18 PM
>     >> To: "[email protected]" <
>     >> [email protected]>, "[email protected]"
> <
>     >> [email protected]>
>     >> Subject: Re: Changing max_dns_answers default
>     >>
>     >> Does EDNS0 (which TR already supports) reduce the severity of this
>     >> problem? If so, could TR do an auto detection on if the sending
> resolver
>     >> supports EDNS0 when deciding how big to make the response?
>     >>
>     >> —Eric
>     >>
>     >> On Dec 4, 2017, at 5:31 PM, Jason Tucker <[email protected]<
> mailto:
>     >> [email protected]>> wrote:
>     >> HTTP-routing seems to go to the opposite end of the spectrum - the
> default
>     >> is to use a dispersion of "1", which gives best cache efficiency as
> Ryan
>     >> mentions. I think the behavior in this regard should be somewhat
> similar
>     >> between HTTP and DNS routing.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 10:19 PM, Durfey, Ryan <
> [email protected]<
>     >> mailto:[email protected]>>
>     >> wrote:
>     >> I like the idea of code that always keeps the response under the
>     >> threshold, and I think this is a good feature to add, but from a
>     >> practical perspective we always want the max DNS response to be the
>     >> minimum viable for cache efficiency.  Most of our services (95%+)
>     >> should be set to 1, 2, 3, or 4, correlated to the throughput of the
>     >> service.  Setting the default to as many as possible ensures that
>     >> unless you are paying close attention you will have terrible cache
>     >> efficiency.  I would advocate for 2 or 3 since this would cover the
>     >> majority of our services, keep cache efficiency reasonable, and work
>     >> for most other applications as well.  I would also advocate adding the
>     >> threshold check in case someone goes too high or sets it to 0.
>     >> *Ryan Durfey*    M | 303-524-5099
>     >> CDN Support (24x7): 866-405-2993 or
>     >> [email protected]<mailto:[email protected]>
>     >> *From: *Jason Tucker <[email protected]<mailto:
> [email protected]
>     >> >>
>     >> *Reply-To: *"[email protected]<mailto:de
>     >> [email protected]>" <
>     >> [email protected]<mailto:dev@
>     >> trafficcontrol.incubator.apache.org>>, "[email protected]<
> mailto:
>     >> [email protected]>" <
>     >> [email protected]<mailto:[email protected]>>
>     >> *Date: *Monday, December 4, 2017 at 3:10 PM
>     >> *To: *Phil Sorber <[email protected]<mailto:[email protected]>>
>     >> *Cc: *"[email protected]<mailto:de
>     >> [email protected]>" <
>     >> [email protected]<mailto:dev@
>     >> trafficcontrol.incubator.apache.org>>
>     >> *Subject: *Re: Changing max_dns_answers default
>     >> I can't comment on the development effort for that (or the compute /
>     >> latency overhead that it might add to TR), but I think having a
> default
>     >> variable that could be set per TC installation doesn't seem
> unreasonable.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 9:11 PM, Phil Sorber <[email protected]
> <mailto:sorb
>     >> [email protected]>> wrote:
>     >> What about adding code that would count the bytes dynamically and
> make
>     >> sure it keeps under the threshold? Maybe even make that the
> behavior for
>     >> the current default of 0.
>     >> On Mon, Dec 4, 2017 at 2:06 PM Jason Tucker <[email protected]
> <
>     >> mailto:[email protected]>>
>     >> wrote:
>     >> Yes, this is the UDP thing. We've had customers with clients that sit
>     >> behind DNS infrastructure that has problems with large response packets.
>     >> However, the "max" is going to be installation dependent. Variables
>     >> such as edge hostname convention and CDN DNS domain suffixes are going to
>     >> cause that threshold to vary from installation to installation. If you
>     >> have short FQDNs, you can fit many of them in a single UDP response.
>     >> __Jason
>     >> On Mon, Dec 4, 2017 at 9:00 PM, Phil Sorber <[email protected]
> <mailto:sorb
>     >> [email protected]>> wrote:
>     >> You say it causes issues with "large cache groups". What is "large" in
>     >> this context? Maybe we should pick a default that puts us slightly below
>     >> that. Reading a little into your comment here, I assume the "problems"
>     >> stem from the number of answers that fit in a UDP packet. Maybe we should
>     >> just make the default below that threshold so we get as close to the max
>     >> as possible without causing said problems?
>     >> Thanks.
>     >> On Mon, Dec 4, 2017 at 12:52 PM Volz, Dylan <[email protected]
> <
>     >> mailto:[email protected]>>
>     >> wrote:
>     >> Hi All,
>     >> The max_dns_answers has been defaulted to 0, which is an unlimited
>     >> number of answers, which causes issues for deployments with large cache
>     >> groups. I opened a PR (1611
>     >> <https://github.com/apache/incubator-trafficcontrol/pull/1611>) to
>     >> change the default from 0 to 5 which is hopefully a sensible value for
>     >> most deployments. If this doesn’t seem like a sensible default please
>     >> respond with alternatives.
>     >> Thanks,
>     >> Dylan
>     >>
>     >>
>     >>
>
>
>
>


-- 

*Oren Shemesh*
Qwilt | Work: +972-72-2221637| Mobile: +972-50-2281168 | [email protected]
<[email protected]>
