Re: Changing max_dns_answers default

Volz, Dylan Tue, 05 Dec 2017 12:22:00 -0800

Based on the discussion, we will be changing the schema default from 0 to 5 for
now, knowing that this is a complex issue that would benefit from ensuring we
follow the relevant RFCs, and perhaps from a configurable default in the future.


On 12/5/17, 9:01 AM, "Jeff Elsloo" <[email protected]> wrote:

    I think this discussion has drifted far from Dylan's original intent,
    which is to set a reasonable default in the short term. We can argue
    about what the default is, but ultimately the real way to fix this is
    to ensure that we follow the RFCs. If a resolver cannot switch to TCP,
    we can truncate the response and set the truncation (TC) bit in the
    header. This would occur, as Eric mentioned indirectly, when EDNS0 is
    unsupported. Additionally, when EDNS0 is supported, the client could be
    asking for DNSSEC signatures, which further increases the response
    size. It does not make sense for a resolver to support EDNS0 and not
    be able to switch to TCP. We shouldn't have to worry about this
    scenario: in my opinion it's a misconfiguration on the other side that
    we cannot control, and since they are not following standards, we
    should not code for it.
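A minimal Python sketch of the truncation behavior described above, assuming already wire-encoded records and the fixed 12-byte header from RFC 1035: answers are added until the next one would exceed the UDP limit, and if any are left out the TC bit (0x0200 in the header flags word) is set so a standards-compliant resolver retries over TCP. `truncate_response` and `build_header` are hypothetical helpers for illustration, not Traffic Router's implementation.

```python
import struct

TC_FLAG = 0x0200  # TC (truncation) bit in the 16-bit header flags word (RFC 1035, 4.1.1)
HEADER_LEN = 12   # fixed DNS header size in bytes


def build_header(msg_id, flags, qdcount, ancount, nscount=0, arcount=0):
    """Pack a 12-byte DNS header in network byte order."""
    return struct.pack("!6H", msg_id, flags, qdcount, ancount, nscount, arcount)


def truncate_response(flags, question, answers, limit=512):
    """Keep only whole answer records that fit in `limit` bytes.

    `question` and each element of `answers` are already wire-encoded bytes.
    Returns (flags, kept); the TC bit is set if any answer had to be dropped.
    """
    size = HEADER_LEN + len(question)
    kept = []
    for rr in answers:
        if size + len(rr) > limit:
            # Not everything fits: tell the resolver to retry the query over TCP.
            return flags | TC_FLAG, kept
        kept.append(rr)
        size += len(rr)
    return flags, kept
```

For example, with a 30-byte question and forty 16-byte A records, 29 records fit in 512 bytes and the returned flags carry the TC bit; the header is then built with `build_header(query_id, flags, 1, len(kept))`.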
    
    All of the commentary about what we should set the default to in order
    to ensure cache efficiency is highly site specific. Not everyone specs
    their caches for 18Gbps, and not everyone has the same cache-to-cache-group
    ratios. While I appreciate that this change does impact cache
    efficiency, other aspects of Traffic Router also affect it, such as
    `consistent.dns.routing`, which is set to false by default. When it's
    false, your answer size will be limited by the specified amount, but
    the entire list will be shuffled before the limit is applied. This
    will kill any cache efficiency conversation unless the operator has
    set this value to true. I don't believe there's a "one size fits all"
    answer here, and because of this we should really follow the RFCs.
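A sketch of the difference between the two modes, assuming answers are drawn from a cache group's list of cache IPs. With consistent routing off, the list is shuffled and then cut, so successive responses can return different subsets. The consistent branch uses a hypothetical rendezvous-style ordering (a hash of the requested name plus each IP); Traffic Router's actual consistent algorithm may differ, and `select_answers` is an illustrative name.

```python
import hashlib
import random


def select_answers(name, cache_ips, max_answers, consistent, rng=random):
    """Return at most `max_answers` cache IPs for `name` (0 means no limit)."""
    if consistent:
        # Hypothetical rendezvous-style ordering: rank caches by a hash of (name, ip),
        # so the same name always yields the same subset regardless of input order.
        ordered = sorted(
            cache_ips,
            key=lambda ip: hashlib.sha256(f"{name}|{ip}".encode()).hexdigest(),
        )
    else:
        # Shuffle first, then cut: the subset changes from response to response.
        ordered = list(cache_ips)
        rng.shuffle(ordered)
    return ordered if max_answers == 0 else ordered[:max_answers]
```

With the limit set to 5, the consistent branch keeps a given name pinned to the same five caches, which is what preserves cache efficiency; the shuffled branch spreads that name across the whole group over time.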
    
    I think a reasonable default is a good short term solution until more
    time can be invested in ensuring that we are 100% compliant with this
    aspect of the RFCs. Ideally the default would be a parameter or
    something that is configurable instead of being part of the schema,
    but that's an entirely different argument. I'm +1 on a reasonable
    default.
    
    Here's a helpful post about when resolvers switch to TCP:
    
https://serverfault.com/questions/698251/how-does-the-dns-protocol-switch-from-udp-to-tcp
    
    --
    Thanks,
    Jeff
    
    
    On Tue, Dec 5, 2017 at 8:33 AM, Dave Neuman <[email protected]> wrote:
    > Hey Dylan,
    > Since we currently default to 0 (all) and we don't want to re-invent
    > the wheel right now, I think 5 sounds like a reasonable default.
    >
    > Thanks,
    > Dave
    >
    > On Tue, Dec 5, 2017 at 8:21 AM, Durfey, Ryan <[email protected]>
    > wrote:
    >
    >> Not sure if EDNS(0) extensions would make a difference here.
    >>
    >> The real issue for caching is balancing load across many caches while
    >> restricting content to as few caches as possible to maintain cache
    >> efficiency.  Too few DNS answers risks load piling up on a few caches and
    >> overrunning them (though this is unlikely except in the case of very high
    >> throughput).  Too many DNS answers (much more likely) spreads your
    >> service’s content across too many caches and increases the cache churn
    >> and the risk of hitting cold caches and having poor service performance.
    >>
    >> I spoke with our DNS team about a year ago about EDNS(0) relative to
    >> client subnet (ECS), and it was not embraced because it made their
    >> recursion jump by several orders of magnitude and broke the DNS system.
    >> I'm not sure whether they plan to use EDNS(0) for other things, or how
    >> that would factor into the load on the caches and the need to spread
    >> that load via additional IP responses, so please educate me if you
    >> know something about this.
    >>
    >> In an ideal world TR monitors the popularity of a service based on
    >> incoming request counts per second and potentially expands or contracts
    >> the IP response.  Given DNS caching, that may be difficult to judge
    >> accurately, but we may be able to use it to differentiate between a “1”
    >> and a “4” response.  I thought I cut a request for that a while back,
    >> but I can’t find it, so I created a new one:
    >> https://github.com/apache/incubator-trafficcontrol/issues/1614
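A minimal sketch of the popularity-based idea, assuming the router can timestamp incoming queries per service. It counts queries in a sliding window and steps the answer count from 1 up to 4 as the rate crosses thresholds. The class name and the threshold values are hypothetical placeholders, not measured numbers, and, as noted above, resolver caching means the observed rate undercounts real client demand.

```python
import time
from collections import deque


class AnswerCountTuner:
    """Hypothetical: choose 1-4 DNS answers from a service's recent query rate."""

    def __init__(self, thresholds=(50.0, 200.0, 1000.0), window_s=60.0, clock=time.monotonic):
        self.thresholds = thresholds  # queries/sec at which to step up to 2, 3, 4 answers
        self.window_s = window_s      # length of the sliding window in seconds
        self.clock = clock
        self.events = deque()         # timestamps of recent queries

    def record_query(self):
        self.events.append(self.clock())

    def rate(self):
        now = self.clock()
        # Drop queries that have fallen out of the sliding window.
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) / self.window_s

    def answer_count(self):
        r = self.rate()
        # One answer by default, plus one for each threshold the rate has reached.
        return 1 + sum(1 for t in self.thresholds if r >= t)
```

Keeping the thresholds as a tunable tuple reflects the point made elsewhere in this thread that the right numbers are site specific.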
    >>
    >> Ryan Durfey    M | 303-524-5099
    >> CDN Support (24x7): 866-405-2993 or [email protected]
    >>
    >>
    >> From: "Eric Friedrich (efriedri)" <[email protected]>
    >> Reply-To: "[email protected]" <[email protected]>
    >> Date: Monday, December 4, 2017 at 6:18 PM
    >> To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
    >> Subject: Re: Changing max_dns_answers default
    >>
    >> Does EDNS0 (which TR already supports) reduce the severity of this
    >> problem? If so, could TR auto-detect whether the sending resolver
    >> supports EDNS0 when deciding how big to make the response?
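One way to approach this auto-detection, sketched under the assumption that the raw query bytes are available: a query carrying an EDNS0 OPT pseudo-record (type 41) advertises the largest UDP response it accepts in that record's CLASS field (RFC 6891, which also says values below 512 are treated as 512), while a query without one is held to the classic 512-byte limit of RFC 1035. The function names are hypothetical.

```python
import struct

OPT_TYPE = 41            # EDNS0 OPT pseudo-RR type (RFC 6891)
CLASSIC_UDP_LIMIT = 512  # RFC 1035 UDP limit when the query has no OPT record


def _skip_name(msg, off):
    """Return the offset just past an encoded domain name (labels or pointer)."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length


def udp_payload_limit(query):
    """Return the UDP response size the querying resolver says it can accept."""
    _, _, qd, an, ns, ar = struct.unpack("!6H", query[:12])
    off = 12
    for _ in range(qd):
        off = _skip_name(query, off) + 4  # skip QTYPE + QCLASS
    for i in range(an + ns + ar):
        off = _skip_name(query, off)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", query[off:off + 10])
        if i >= an + ns and rtype == OPT_TYPE:
            # For OPT, the CLASS field carries the advertised payload size.
            return max(rclass, CLASSIC_UDP_LIMIT)
        off += 10 + rdlen
    return CLASSIC_UDP_LIMIT
```

The result could then serve as the `limit` when deciding how many answers to include, rather than assuming 512 bytes for every resolver.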
    >>
    >> —Eric
    >>
    >> On Dec 4, 2017, at 5:31 PM, Jason Tucker <[email protected]> wrote:
    >> HTTP routing seems to go to the opposite end of the spectrum: the default
    >> is to use a dispersion of "1", which gives the best cache efficiency, as
    >> Ryan mentions. I think the behavior in this regard should be somewhat
    >> similar between HTTP and DNS routing.
    >> __Jason
    >> On Mon, Dec 4, 2017 at 10:19 PM, Durfey, Ryan <[email protected]> wrote:
    >> I like the idea of code that always keeps the response under the
    >> threshold, and I think this is a good feature to add, but from a practical
    >> perspective we always want max DNS answers to be the minimum viable value
    >> for cache efficiency.  Most of our services (95%+) should be set to 1, 2,
    >> 3, or 4, correlated to the throughput of the service.  Defaulting to as
    >> many as possible ensures that, unless you are paying close attention, you
    >> will have terrible cache efficiency.  I would advocate for 2 or 3, since
    >> this would cover the majority of our services, keep cache efficiency
    >> reasonable, and work for most other applications as well.  I would also
    >> advocate adding the threshold check in case someone goes too high or sets
    >> it to 0.
    >> *Ryan Durfey*    M | 303-524-5099
    >> CDN Support (24x7): 866-405-2993 or [email protected]
    >> *From: *Jason Tucker <[email protected]>
    >> *Reply-To: *"[email protected]" <[email protected]>, "[email protected]" <[email protected]>
    >> *Date: *Monday, December 4, 2017 at 3:10 PM
    >> *To: *Phil Sorber <[email protected]>
    >> *Cc: *"[email protected]" <[email protected]>
    >> *Subject: *Re: Changing max_dns_answers default
    >> I can't comment on the development effort for that (or the compute /
    >> latency overhead it might add to TR), but a default variable that could
    >> be set per TC installation doesn't seem unreasonable.
    >> __Jason
    >> On Mon, Dec 4, 2017 at 9:11 PM, Phil Sorber <[email protected]> wrote:
    >> What about adding code that would count the bytes dynamically and make
    >> sure the response stays under the threshold? Maybe even make that the
    >> behavior for the current default of 0.
    >> On Mon, Dec 4, 2017 at 2:06 PM Jason Tucker <[email protected]> wrote:
    >> Yes, this is the UDP thing. We've had customers with clients that sit
    >> behind DNS infrastructure that has problems with large response packets.
    >> However, the "max" is going to be installation dependent. Variables such
    >> as the edge hostname convention and CDN DNS domain suffixes are going to
    >> cause that threshold to vary from installation to installation. If you
    >> have short FQDNs, you can fit many of them in a single UDP response.
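A rough sketch of the dynamic byte counting discussed in this thread, showing how the threshold depends on the name. It assumes the response holds only IPv4 A answers for the queried name, no authority records, and at most a minimal 11-byte OPT record. When names are compressed (RFC 1035, 4.1.4) each answer's owner name shrinks to a 2-byte pointer, so only the question section grows with the FQDN; without compression every answer repeats the full name. All function names are hypothetical, and real responses carrying other records would fit fewer answers.

```python
HEADER_LEN = 12  # fixed DNS header size
OPT_RR_LEN = 11  # minimal EDNS0 OPT record: root name (1) + TYPE, CLASS, TTL, RDLENGTH (10)


def encoded_name_len(fqdn):
    """Wire length of an uncompressed name: one length byte per label plus the root byte."""
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def answers_that_fit(fqdn, limit=512, compressed=True, edns=False):
    """Number of IPv4 A answers for `fqdn` that fit in a `limit`-byte UDP response."""
    name_len = encoded_name_len(fqdn)
    fixed = HEADER_LEN + name_len + 4  # header + question (name, QTYPE, QCLASS)
    if edns:
        fixed += OPT_RR_LEN  # echo an OPT record in the additional section
    owner = 2 if compressed else name_len  # pointer to the question name, or the full name
    per_answer = owner + 10 + 4  # TYPE, CLASS, TTL, RDLENGTH + 4-byte IPv4 RDATA
    return max(0, (limit - fixed) // per_answer)


def effective_max_answers(configured, fqdn, limit=512, compressed=True, edns=False):
    """Treat 0 as "as many as fit" and clamp larger settings to what fits."""
    fit = answers_that_fit(fqdn, limit, compressed, edns)
    return fit if configured == 0 else min(configured, fit)
```

For "edge.ds.cdn.example.com" in a 512-byte response, 29 A answers fit with compression and 12 without; `effective_max_answers(0, ...)` returns 29 and a setting of 5 is left at 5.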
    >> __Jason
    >> On Mon, Dec 4, 2017 at 9:00 PM, Phil Sorber <[email protected]> wrote:
    >> You say it causes issues with "large cache groups". What is "large" in
    >> this context? Maybe we should pick a default that puts us slightly below
    >> that.
    >> Reading a little into your comment here, I assume the "problems" stem
    >> from the number of answers that fit in a UDP packet. Maybe we should
    >> just set the default below that threshold, so we get as close to the
    >> max as possible without causing said problems?
    >> Thanks.
    >> On Mon, Dec 4, 2017 at 12:52 PM Volz, Dylan <[email protected]> wrote:
    >> Hi All,
    >> max_dns_answers has defaulted to 0, which means an unlimited number of
    >> answers, and this causes issues for deployments with large cache groups.
    >> I opened a PR (1611 <https://github.com/apache/incubator-trafficcontrol/pull/1611>)
    >> to change the default from 0 to 5, which is hopefully a sensible value
    >> for most deployments. If this doesn't seem like a sensible default,
    >> please respond with alternatives.
    >> Thanks,
    >> Dylan
    >>
    >>
    >>
