Hi Warren,

Thank you for the mention. :-)

We all know that, because of network topology, the client subnet is the *best* indicator for CDN traffic management, especially for Akamai, Google, ...

I totally agree with you: there is no *best* answer for a country, nor for a city or even a postal address.

My thought is: GeoIP-enabled Authoritative Servers often map the Client Subnet into EIL <COUNTRY, AREA, ISP>, then return a tailored response based only on <COUNTRY, AREA, ISP>. From a GeoIP-enabled Authoritative Server's *own view*, there is a *sufficient* answer for <COUNTRY, AREA, ISP>.

EIL can be a trade-off choice for a GeoIP-enabled Authoritative Server, and it can optimize the Recursive Resolver's cache cost.

The latest EIL document is at:
https://github.com/abbypan/dns_test_eil/blob/master/ietf_draft/draft.txt

WG experts have raised many compatibility, cost-vs-benefit, and operational concerns about EIL. I am *still revising* it; discussion and issues are welcome.

Example:

xxx.com's Authoritative Server deploys GeoDNS proactively.
=> xxx.com's Authoritative Server itself chooses to accept *<COUNTRY, AREA, ISP>*-level traffic management.
=> For some operational problems or privacy concerns (maybe like those Mukund mentioned above), we can use EIL to mitigate them.

ECS: I am close, in network topology, to <111.201.133.0/24>. Which is the best answer for that network topology now?
GeoIP-enabled Authoritative Servers map <111.201.133.0/24> into <China, Beijing, Unicom ISP>, then find the tailored answer.
The Recursive Resolver caches the response with <111.201.133.0/24>, or some shorter prefix.

EIL: I am close, in network topology, to <China, Beijing, Unicom ISP>. Which is the best answer for that network topology now?
The Recursive Resolver caches the response with <China, Beijing, Unicom ISP>, which can cover many client subnets that are *close in network topology*.
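To make the cache-cost difference concrete, here is a minimal sketch that counts resolver cache keys under each scheme. It is only an illustration under stated assumptions: the GEOIP table, the /21 prefix in it, and the eil_for helper are hypothetical and are not taken from the EIL draft or from any real GeoIP database.

```python
import ipaddress

# Hypothetical GeoIP table (illustrative only, not real GeoIP data):
# prefix -> (COUNTRY, AREA, ISP)
GEOIP = {
    ipaddress.ip_network("111.201.128.0/21"): ("China", "Beijing", "Unicom"),
}

def eil_for(subnet):
    """Map a client subnet to its (COUNTRY, AREA, ISP) tuple, or None."""
    net = ipaddress.ip_network(subnet)
    for prefix, eil in GEOIP.items():
        if net.subnet_of(prefix):
            return eil
    return None

# Eight client /24 subnets that the table places in the same location.
clients = [f"111.201.{i}.0/24" for i in range(128, 136)]

ecs_keys = set(clients)                   # ECS-style: one cache key per subnet
eil_keys = {eil_for(s) for s in clients}  # EIL-style: one cache key per tuple

print(len(ecs_keys), "ECS-style cache keys")
print(len(eil_keys), "EIL-style cache key(s):", eil_keys)
```

In this toy case the eight subnets collapse into a single <COUNTRY, AREA, ISP> key; how much is saved in practice depends on how coarse the authoritative server's own GeoIP mapping really is.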

On Sat, Dec 16, 2017 at 3:34 AM, Warren Kumari <[email protected]> wrote:

> On Fri, Dec 15, 2017 at 11:50 AM, Mukund Sivaraman <[email protected]> wrote:
> > On Thu, Dec 14, 2017 at 07:00:58PM +0100, bert hubert wrote:
> >> On Thu, Dec 14, 2017 at 11:09:13PM +0530, Mukund Sivaraman wrote:
> >> > Any appetite for it? Don't throw things at me.. I ask because the
> >> > current thing is slowly getting more widely deployed and there are
> >> > design issues that can do with an ECS2 that breaks from ECS1 protocol.
> >> > I ask because I'm once again having to deal with myriad
> >> > implementation cases and dislike it.
> >>
> >> Could you elaborate what you dislike most?
> >
> > It is too complicated to implement ECS correctly. There are a large
> > number of corner cases. The things that resolvers and authoritative
> > sides have to take care of are quite different. It is more complex
> > than anything else in DNS.
> >
> > I think this should be built again from scratch.
> >
> >> The biggest thing we are noticing is that while it does great things
> >> to getting to a server the content provider likes, it unavoidably
> >> drives down cache hitrates a lot, introducing a latency penalty.
> >>
> >> The operators we see deploying ECS have tens of thousands of subnets
> >> which all need to be mapped to only a few servers. But you still end up
> >> with tens of thousands of cache entries and therefore tiny cache
> >> hitrates.
> >>
> >> Such things could be addressed by answering with lists of subnet masks
> >> to which this answer would also apply, but this makes little sense
> >> operationally I think.
> >
> > Firstly, correct deaggregation is an important requirement of reducing
> > cache usage. With the current design of ECS protocol, it's very
> > important that correct disjoining of prefixes be done optimally to avoid
> > cache pollution, yet the draft does not specify a suitable algorithm for
> > it (we know how to do it, I think the draft should have stated it).
> >
> > A /n address prefix as specified by ECS option is a perfect binary
> > tree of 1<<n addresses. To correctly deaggregate 0.0.0.0/0 (scope=0)
> > data from a longer prefix such as 10.0.0.0/24, this will result in all
> > these answers to be generated:
> >
> > * 10.0.0.0/24 answer
> > * 10.0.0.0/23 exact match answer (scope > source)
> > * 10.0.1.0/23 answer
> > * 10.0.0.0/22 exact match answer (scope > source)
> > * 10.0.3.0/22 answer
> >
> > and so on.. there are about 2n+1 answers necessary so that a 0.0.0.0/0
> > answer does not override a /n client from receiving its specific answer.
> >
> >                x
> >        x               x
> >    x       x       x       x
> >  x   x   x   x   x   x   y   y
> >
> > If ECS option had more fields, we could have put the above pattern as
> > a difference of trees (with direction bit and height, e.g., "x"s in
> > the diagram above) and it would have reduced cache usage
> > considerably.
> >
> > This can be generalized with more differences but anyway I think that
> > using QUERY for ECS is a badly done idea (not even mentioning privacy
> > loss).
> >
> > As an example, RPZ does not rely on queries.. it transfers all prefixes
> > for matching in a zone so that the longest prefix match algorithm will
> > not suffer from a previously cached shorter prefix matching and
> > preventing future fetches.
> >
> > Another related problem is this: We often want to match against a GeoIP
> > database (containing what may be changing but maintained prefixes) in
> > associating zone data with geographic/network-topo clients. We want to
> > say "serve this answer for country X or city Y or ASNNNN" and we don't
> > care about managing the actual prefixes.
> >
> > GeoIP is a custom database format, where I can match against GeoIP, but
> > I can't easily deaggregate all its prefixes for an ECS zone to be cached
> > properly.
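
As a side note on the deaggregation quoted above, the rough size of the problem can be checked with Python's standard ipaddress module. This is only a sketch of one way to carve a /24 out of 0.0.0.0/0; it is not necessarily the algorithm Mukund has in mind, the complement helper is a name invented for this example, and the prefixes it prints are the ones ipaddress computes, which need not match the list in the quoted message.

```python
import ipaddress

def complement(covering, specific):
    """Disjoint prefixes that cover `covering` except `specific`,
    ordered from longest prefix to shortest."""
    cov = ipaddress.ip_network(covering)
    spec = ipaddress.ip_network(specific)
    return sorted(cov.address_exclude(spec),
                  key=lambda n: n.prefixlen, reverse=True)

siblings = complement("0.0.0.0/0", "10.0.0.0/24")

# One sibling prefix per bit of the /24, i.e. n = 24 of them.
print(len(siblings))
for net in siblings[:3]:
    print(net)
```

Each of those n sibling prefixes would need its own cache entry next to the /n answer itself, which is consistent with the "about 2n+1" order of magnitude once exact-match entries are counted as well.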
> >
> > I feel it would have been better to say to a downstream resolver "This
> > answer is for country X", or "A is the answer for country X, B is the
> > answer for country Y, the rest use answer C".
>
> <No hats. other than working at a place that uses ECS to provide
> optimized answers>
> Please see the thread
> https://www.ietf.org/mail-archive/web/dnsop/current/msg19614.html
>
> There is no *best* answer for a country, nor a city or even a postal
> address.
>
> There is instead a best answer for a specific IP address / subnet,
> which depends upon the ISP, the peering connections with that ISP, the
> utilization of the peering links with that ISP, the utilization of the
> datacenter, and then much lower down, where in the physical topology
> that subnet resides.
>
> I'm within spitting distance of Ashburn Equinix (well, I cannot quite
> spit there, but I could easily walk there, it's < 5 miles), but my ISP
> is Comcast. This means that, instead of hitting anything in IAD
> (Ashburn), I (currently) instead get sent to MRN (Lenoir, North
> Carolina), which is 400 miles away. My packets also happen to go
> through 111 8th Ave, NY - even though there is stuff in NY, the MRN
> location is better / faster.
>
> If I happened to have chosen a different ISP, Verizon for instance, my
> optimal answers would be very different.
> And no, the "obvious" answer of "just use the AS number then" fails
> equally badly - my latency to 8.8.8.8 (to choose something at random)
> is ~12ms. If I were a Comcast customer in California and the same
> location were handed out, the latency would be ~80ms.
>
> Note that I'm in a place which is well connected - in many places, the
> "less optimal" answer is much much less optimal...
>
> >
> > The design of ECS needs to be reconsidered. I'd prefer something like a
> > zone format for it, than using QUERY.
> > QUERY cannot give complete information about all prefixes and there
> > is a possibility of incorrect caching, and a very high probability of
> > redundant cache pollution.
>
> Sure, happy to reconsider the design, but it is important to know more
> about what the constraints are, and how specific and dynamic the
> answers are.
>
> For example, Akamai says they are in 1,600 networks
> (https://www.akamai.com/uk/en/about/facts-figures.jsp), "The
> Cloudflare Global Anycast Network"
> (https://www.cloudflare.com/network/) is "powered by 118 data
> centers around the world."
>
> "Google Cloud Platform has added new regions in São Paulo and Mumbai.
> GCP has 13 regions, 39 zones, over 100 points of presence, and a
> well-provisioned global network with 100,000s of miles of fiber optic
> cable." (embarrassingly I couldn't easily find a public page with more
> detail)
>
> Dyn has a map here: https://dyn.com/dns/network-map/
>
> >
> > From a resolver's point of view, a non-ECS answer (no client-subnet
> > option) is different from an ECS answer with scope=0 which is different
> > from an ECS answer with source=0, whereas all these may be the same from
> > an authority's point of view. They all need to be cached differently
> > (from an intermediate resolver's view).
> >
> > This thing of scope > source meaning for-exact-match-only is weird as
> > hell when implementing longest prefix matching. It is not convenient
> > to use an off-the-shelf radix tree.
> >
> > ECS relies on the option always being returned for any kind of
> > answers, as some resolvers use that as an indicator of ECS support
> > (and stop using ECS if it ever stops). But ECS does not apply to
> > several kinds of answers (e.g., anything but NOERROR, esp. NXDOMAIN
> > and NODATA have to be consistent across all prefixes.) It doesn't
> > apply to SOA, DNSKEY, NS in answer section, referrals, etc. Yet,
> > many of these need to answer with SCOPE=0.
> >
> > An ACL config option about whether the NS supports ECS or not (to
> > return the option or not) is different from a config option whether
> > the NS passes through ECS or not: the latter would always pass through
> > SOURCE=0 but return REFUSED for any ECS queries that didn't match the
> > ACL; whereas the former would return non-ECS reply for any ECS
> > queries that didn't match the ACL.
> >
> > Transitivity of the option has corner cases.
> >
> > I don't have to point out how easy it is for an erroneous /16 to
> > prevent queries to /24 answers shadowed by the /16.
> >
> > Some cache cases: Obviously an ECS cache is different from a
> > zone.. it's not from a single zone, it is not an atomic collection of
> > a single version of zone and ever changing. If there's a /24 answer in
> > cache, and a newer query brings in a /16 answer that shadows it,
> > should the resolver assume that the /16 has precedence because it's
> > newer (hence the /24 should no longer exist) or do a
> > longest-prefix-match against the older /24? What if the /16 then
> > expires and the /24 hasn't expired? An NXDOMAIN answer should expire
> > any previously cached prefix-specific cache entries for that name. A
> > NODATA answer should expire any previously cached prefix-specific
> > cache entries for that type. Non-ECS data is different from SCOPE=0
> > data. There are questions about trust ranking with usage of ECS data.
> >
> > These are just some topics that I can quickly think of. There are many
> > other issues we faced and discussed during resolver ECS development.
> >
> > The draft leaves many things unspecified, such as more clarity in DNSSEC
> > and handling of negative answers. Many issues were fixed during the
> > draft phase, but I feel it was insufficient.
> >
> >> Can you share your ideas for ECS2?
> >
> > There are many quirks in ECS. I don't want to propose specific ideas
> > now, except that we should gather requirements and start from
> > scratch.
>
> Yes, much of my soapbox rant was about just this -- understanding the
> requirements is important - the reason that CDNs provide different
> answers based upon the IP address is that it is a proxy for latency /
> performance.
> I'm sure we got many things wrong in ECS, but a redesign needs to be
> informed by the use case and requirements.
>
> (This mail not meant to sound as grumpy as it turned out :-) )
>
> W
>
> > We have to reduce complexity of the protocol on both auth and
> > caching resolver sides. I think it should be designed again from
> > requirements without being a tweak of ECS1. The current protocol
> > complicates DNS implementation significantly.
> >
> > Mukund
> >
> > _______________________________________________
> > DNSOP mailing list
> > [email protected]
> > https://www.ietf.org/mailman/listinfo/dnsop
> >
>
> --
> I don't think the execution is relevant when it was obviously a bad
> idea in the first place.
> This is like putting rabid weasels in your pants, and later expressing
> regret at having chosen those particular rabid weasels and that pair
> of pants.
>    ---maf
>

--
Best Regards
潘蓝兰 Pan Lanlan
