Hi Warren,

Thank you for the mention. :-)

We all know that, because of network topology, the client subnet is the *best*
indicator for CDN traffic management, especially for Akamai, Google, ...
I totally agree with you: there is no *best* answer for a country, nor a
city, or even a postal address.

My thought is:
GeoIP-enabled Authoritative Servers often map the client subnet to
<COUNTRY, AREA, ISP> (the location that EIL carries), then return a tailored
response based only on <COUNTRY, AREA, ISP>.
From a GeoIP-enabled Authoritative Server's *own view*, there is a
*sufficient* answer for each <COUNTRY, AREA, ISP>.
EIL can be a trade-off choice for GeoIP-enabled Authoritative Servers, and it
reduces the Recursive Resolver's cache cost.
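A minimal sketch of that mapping in Python, assuming a tiny hypothetical GeoIP table (GEOIP_TABLE) and hypothetical answers (ANSWERS, DEFAULT_ANSWER, taken from the documentation range 192.0.2.0/24). Only the 111.201.133.0/24 => <China, Beijing, Unicom> entry comes from the example in this mail; a real server would load a maintained GeoIP database rather than a dict. It shows that the answer depends only on the location tuple, not on the exact client subnet.

```python
import ipaddress
from typing import Optional, Tuple

Location = Tuple[str, str, str]  # <COUNTRY, AREA, ISP>

# Hypothetical GeoIP table; a real server would use a maintained database.
GEOIP_TABLE = {
    ipaddress.ip_network("111.201.133.0/24"): ("China", "Beijing", "Unicom"),
    ipaddress.ip_network("198.51.100.0/24"): ("ExampleLand", "Area1", "ISP-A"),
}

# Hypothetical tailored answers, one per location tuple.
ANSWERS = {
    ("China", "Beijing", "Unicom"): "192.0.2.10",
    ("ExampleLand", "Area1", "ISP-A"): "192.0.2.20",
}
DEFAULT_ANSWER = "192.0.2.1"


def locate(client_subnet: str) -> Optional[Location]:
    """Map a client subnet to <COUNTRY, AREA, ISP> by longest-prefix match."""
    subnet = ipaddress.ip_network(client_subnet, strict=False)
    best = None
    for prefix, location in GEOIP_TABLE.items():
        if prefix.version == subnet.version and subnet.subnet_of(prefix):
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, location)
    return best[1] if best else None


def answer_for(client_subnet: str) -> str:
    """The answer is chosen by the location tuple alone."""
    return ANSWERS.get(locate(client_subnet), DEFAULT_ANSWER)


print(answer_for("111.201.133.0/24"))  # 192.0.2.10
print(answer_for("203.0.113.0/24"))    # 192.0.2.1
```

Every subnet that maps to the same tuple gets the same answer, which is what lets a resolver share one cache entry among them.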

The latest EIL document is at:
https://github.com/abbypan/dns_test_eil/blob/master/ietf_draft/draft.txt
WG experts have raised many compatibility, cost-versus-benefit, and
operational concerns about EIL, and I am *still revising* it. Discussions and
issues are welcome.

Example:
xxx.com's Authoritative Server deploys GeoDNS proactively.
=> xxx.com's Authoritative Server itself chooses to accept
*<COUNTRY, AREA, ISP>*-level traffic management.
=> For some operational problems and privacy concerns (perhaps like those
Mukund mentioned), we can use EIL to mitigate them.

ECS: I am topologically close to <111.201.133.0/24>; which is the best
answer by network topology now?
GeoIP-enabled Authoritative Servers map <111.201.133.0/24> to <China,
Beijing, Unicom ISP>, then look up the tailored answer.
The Recursive Resolver caches the response under <111.201.133.0/24>, or
some shorter prefix.

EIL: I am topologically close to <China, Beijing, Unicom ISP>; which is
the best answer by network topology now?
The Recursive Resolver caches the response under <China, Beijing, Unicom
ISP>, which can cover many *topologically close* client subnets.
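A rough sketch of the cache-cost difference, assuming the authoritative server always returns SCOPE /24 for ECS and that all simulated clients (one address in each /24 of the hypothetical range 10.0.0.0/16) map to the same hypothetical <COUNTRY, AREA, ISP> tuple. It counts how many upstream fetches a caching resolver makes for one name and type under each kind of cache key.

```python
import ipaddress
from typing import Callable, Hashable, List

# Assumption: every simulated client maps to this one hypothetical tuple.
LOCATION = ("ExampleLand", "Area1", "ISP-A")


def count_misses(clients: List[str], cache_key: Callable[[str], Hashable]) -> int:
    """Count upstream fetches a caching resolver makes for one name/type."""
    cache = set()
    misses = 0
    for client in clients:
        key = cache_key(client)
        if key not in cache:
            misses += 1  # cache miss: query the authoritative server
            cache.add(key)
    return misses


def ecs_key(client: str) -> Hashable:
    # ECS with SCOPE /24: the answer is reusable only inside that /24.
    return ipaddress.ip_network(client + "/24", strict=False)


def eil_key(client: str) -> Hashable:
    # EIL: the answer is reusable by every client at the same location.
    return LOCATION


# One client address in each of the 256 /24s of 10.0.0.0/16.
clients = [str(net[1]) for net in ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=24)]
print(count_misses(clients, ecs_key), count_misses(clients, eil_key))  # 256 1
```

Under these assumptions ECS needs 256 fetches (and 256 cache entries) while EIL needs one; the real saving depends on how many subnets actually share a tuple.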

Warren Kumari <[email protected]> wrote on Sat, Dec 16, 2017 at 3:34 AM:

> On Fri, Dec 15, 2017 at 11:50 AM, Mukund Sivaraman <[email protected]> wrote:
> > On Thu, Dec 14, 2017 at 07:00:58PM +0100, bert hubert wrote:
> >> On Thu, Dec 14, 2017 at 11:09:13PM +0530, Mukund Sivaraman wrote:
> >> > Any appetite for it? Don't throw things at me.. I ask because the
> >> > current thing is slowly getting more widely deployed and there are
> >> > design issues that could do with an ECS2 that breaks from the ECS1
> >> > protocol. I ask because I'm once again having to deal with myriad
> >> > implementation cases and dislike it.
> >>
> >> Could you elaborate what you dislike most?
> >
> > It is too complicated to implement ECS correctly. There are a large
> > number of corner cases. The things that resolvers and authoritative
> > sides have to take care of are quite different. It is more complex
> > than anything else in DNS.
> >
> > I think this should be built again from scratch.
> >
> >> The biggest thing we are noticing is that while it does great things
> >> for getting to a server the content provider likes, it unavoidably
> >> drives down cache hitrates a lot, introducing a latency penalty.
> >>
> >> The operators we see deploying ECS have tens of thousands of subnets
> >> which all need to be mapped to only a few servers. But you still end up
> >> with tens of thousands of cache entries and therefore tiny cache
> >> hitrates.
> >>
> >> Such things could be addressed by answering with lists of subnet masks to
> >> which this answer would also apply, but this makes little sense
> >> operationally I think.
> >
> > Firstly, correct deaggregation is an important requirement for reducing
> > cache usage. With the current design of the ECS protocol, it's very
> > important that correct disjoining of prefixes be done optimally to avoid
> > cache pollution, yet the draft does not specify a suitable algorithm for
> > it (we know how to do it, I think the draft should have stated it).
> >
> > Address prefixes as specified by the ECS option form a perfect binary
> > tree, with 1<<n prefixes at depth /n. Correctly deaggregating
> > 0.0.0.0/0 (scope=0) data from a longer prefix such as 10.0.0.0/24
> > results in all of these answers being generated:
> >
> > * 10.0.0.0/24 answer
> > * 10.0.0.0/23 exact match answer (scope > source)
> > * 10.0.1.0/23 answer
> > * 10.0.0.0/22 exact match answer (scope > source)
> > * 10.0.3.0/22 answer
> >
> > and so on. About 2n+1 answers are necessary so that a 0.0.0.0/0
> > answer does not prevent a /n client from receiving its specific answer.
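For the non-exact-match part of this, Python's standard ipaddress module can compute the aligned sibling prefixes directly: address_exclude() returns the prefixes covering 0.0.0.0/0 minus 10.0.0.0/24, one per prefix length from /1 to /24, i.e. 10.0.1.0/24, 10.0.2.0/23, 10.0.4.0/22, and so on. This is only a sketch: it does not generate the scope > source exact-match entries listed above, and deaggregate is a hypothetical helper name.

```python
import ipaddress
from typing import List


def deaggregate(covering: str, specific: str) -> List[ipaddress.IPv4Network]:
    """Prefixes covering `covering` minus `specific`, longest first.

    Caching the covering answer under these prefixes, rather than under
    `covering` itself, cannot shadow the answer for `specific`.
    """
    outer = ipaddress.ip_network(covering)
    inner = ipaddress.ip_network(specific)
    # address_exclude() raises ValueError if `inner` is not inside `outer`.
    return sorted(outer.address_exclude(inner),
                  key=lambda net: net.prefixlen, reverse=True)


siblings = deaggregate("0.0.0.0/0", "10.0.0.0/24")
print(len(siblings))                   # 24
print([str(n) for n in siblings[:3]])  # ['10.0.1.0/24', '10.0.2.0/23', '10.0.4.0/22']
```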
> >
> >                   x
> >            x            x
> >          x   x        x   x
> >         x x x x      x x y y
> >
> > If ECS option had more fields, we could have put the above pattern as
> > a difference of trees (with direction bit and height, e.g., "x"s in
> > the diagram above) and it would have reduced cache usage
> > considerably.
> >
> > This can be generalized with more differences but anyway I think that
> > using QUERY for ECS is a badly done idea (not even mentioning privacy
> > loss).
> >
> > As an example, RPZ does not rely on queries; it transfers all prefixes
> > for matching in a zone so that the longest prefix match algorithm will
> > not suffer from a previously cached shorter prefix matching and
> > preventing future fetches.
> >
> > Another related problem is this: We often want to match against a GeoIP
> > database (containing what may be changing but maintained prefixes) in
> > associating zone data with geographic/network-topo clients. We want to
> > say "serve this answer for country X or city Y or ASNNNN" and we don't
> > care about managing the actual prefixes.
> >
> > GeoIP is a custom database format, where I can match against GeoIP, but
> > I can't easily deaggregate all its prefixes for an ECS zone to be cached
> > properly.
> >
> > I feel it would have been better to say to a downstream resolver "This
> > answer is for country X", or "A is the answer for country X, B is the
> > answer for country Y, the rest use answer C".
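One possible shape for such an answer, sketched with a hypothetical LocationRuleSet type (not part of ECS): the downstream resolver would receive the whole rule set once and could then answer every client locally, instead of caching one entry per subnet.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocationRuleSet:
    """Hypothetical answer format: per-country answers plus a default."""
    by_country: Dict[str, str] = field(default_factory=dict)
    default: str = ""

    def answer(self, country: str) -> str:
        # One rule set serves every client; no per-subnet cache entries.
        return self.by_country.get(country, self.default)


# "A is the answer for country X, B is the answer for country Y,
#  the rest use answer C."
rules = LocationRuleSet(by_country={"X": "A", "Y": "B"}, default="C")
print(rules.answer("X"), rules.answer("Y"), rules.answer("Z"))  # A B C
```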
>
> <No hats. other than working at a place that uses ECS to provide
> optimized answers>
> Please see the thread
> https://www.ietf.org/mail-archive/web/dnsop/current/msg19614.html
>
> There is no *best* answer for a country, nor a city or even a postal
> address.
>
> There is instead a best answer for a specific IP address / subnet,
> which depends upon the ISP, the peering connections with that ISP, the
> utilization of the peering links with that ISP, the utilization of the
> datacenter, and then much lower down, where in the physical topology
> that subnet resides.
>
> I'm within spitting distance of Ashburn Equinix (well, I cannot quite
> spit there, but I could easily walk there, it's < 5 miles), but my ISP
> is Comcast. This means that, instead of hitting anything in IAD
> (Ashburn), I (currently) instead get sent to MRN (Lenoir, North
> Carolina), which is 400 miles away. My packets also happen to go
> through 111 8th Ave, NY - even though there is stuff in NY, the MRN
> location is better / faster.
>
> If I happened to have chosen a different ISP, Verizon for instance, my
> optimal answers would be very different.
> And no, the "obvious" answer of "just use the AS number then" fails
> equally badly - my latency to 8.8.8.8 (to choose something at random)
> is ~12ms. If I were a Comcast customer in California and the same
> location were handed out, the latency would be ~80ms.
>
> Note that I'm in a place which is well connected - in many places, the
> "less optimal" answer is much much less optimal...
>
>
> >
> > The design of ECS needs to be reconsidered. I'd prefer something like a
> > zone format for it, than using QUERY. QUERY cannot give complete
> > information about all prefixes and there is a possibility of incorrect
> > caching, and a very high probability of redundant cache pollution.
>
> Sure, happy to reconsider the design, but it is important to know more
> about what the constraints are, and how specific and dynamic the
> answers are.
>
> For example, Akamai says they are in 1,600 networks
> (https://www.akamai.com/uk/en/about/facts-figures.jsp), and "The
> Cloudflare Global Anycast Network is powered by 118 data centers
> around the world" (https://www.cloudflare.com/network/).
>
> "Google Cloud Platform has added new regions in São Paulo and Mumbai.
> GCP has 13 regions, 39 zones, over 100 points of presence, and a
> well-provisioned global network with 100,000s of miles of fiber optic
> cable." (embarrassingly I couldn't easily find a public page with more
> detail)
>
> Dyn has a map here: https://dyn.com/dns/network-map/
>
>
>
> >
> > From a resolver's point of view, a non-ECS answer (no client-subnet
> > option) is different from an ECS answer with scope=0 which is different
> > from an ECS answer with source=0, whereas all these may be the same from
> > an authority's point of view. They all need to be cached differently
> > (from an intermediate resolver's view).
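A sketch of how a resolver might keep these cases apart, with hypothetical names (EcsKind, cache_class, cache_key): the kind of ECS response becomes part of the cache key, so a non-ECS answer, a SOURCE=0 answer and a SCOPE=0 answer for the same name and type never share an entry. It assumes the caller has already parsed the option from the response.

```python
from enum import Enum
from typing import Optional, Tuple


class EcsKind(Enum):
    NO_OPTION = 1    # response carried no client-subnet option
    SOURCE_ZERO = 2  # query sent SOURCE=0, so no client subnet was revealed
    SCOPE_ZERO = 3   # subnet was sent, authority answered with SCOPE=0
    SCOPED = 4       # answer tailored to a prefix of length SCOPE


def cache_class(has_option: bool, source: Optional[int], scope: Optional[int]) -> EcsKind:
    if not has_option:
        return EcsKind.NO_OPTION
    if source == 0:
        return EcsKind.SOURCE_ZERO
    if scope == 0:
        return EcsKind.SCOPE_ZERO
    return EcsKind.SCOPED


def cache_key(qname: str, qtype: str, kind: EcsKind,
              prefix: Optional[str] = None) -> Tuple:
    # The kind is part of the key, so the three cases never share an entry;
    # only SCOPED entries are further keyed by prefix.
    return (qname.lower(), qtype, kind, prefix if kind is EcsKind.SCOPED else None)


kinds = [cache_class(False, None, None), cache_class(True, 0, 0), cache_class(True, 24, 0)]
print([k.name for k in kinds])  # ['NO_OPTION', 'SOURCE_ZERO', 'SCOPE_ZERO']
```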
> >
> > This thing of scope > source meaning for-exact-match-only is weird as
> > hell when implementing longest prefix matching. It is not convenient
> > to use an off-the-shelf radix tree.
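A minimal sketch of the problem, assuming the exact-match interpretation described here: an entry cached from a scope > source response is stored under the SOURCE prefix but matches only queries for exactly that prefix. The hypothetical EcsCache below uses a linear scan rather than a radix tree, precisely because the exact-only entries do not fit a plain longest-prefix-match structure; it handles IPv4 only.

```python
import ipaddress
from typing import List, Optional, Tuple

Network = ipaddress.IPv4Network


class EcsCache:
    """Hypothetical IPv4 ECS answer cache for a single name and type."""

    def __init__(self) -> None:
        # Each entry: (prefix, exact_only, answer).
        self.entries: List[Tuple[Network, bool, str]] = []

    def insert(self, address: str, source: int, scope: int, answer: str) -> None:
        if scope > source:
            # Stored under the SOURCE prefix, usable for exact matches only.
            prefix = ipaddress.IPv4Network(f"{address}/{source}", strict=False)
            self.entries.append((prefix, True, answer))
        else:
            prefix = ipaddress.IPv4Network(f"{address}/{scope}", strict=False)
            self.entries.append((prefix, False, answer))

    def lookup(self, query: str) -> Optional[str]:
        q = ipaddress.IPv4Network(query, strict=False)
        best: Optional[Tuple[Network, str]] = None
        for prefix, exact_only, answer in self.entries:
            matches = (prefix == q) if exact_only else q.subnet_of(prefix)
            if matches and (best is None or prefix.prefixlen > best[0].prefixlen):
                best = (prefix, answer)
        return best[1] if best else None


cache = EcsCache()
cache.insert("10.0.0.0", source=24, scope=0, answer="global")    # covers 0.0.0.0/0
cache.insert("10.0.0.0", source=16, scope=24, answer="exact16")  # scope > source
print(cache.lookup("10.0.0.0/16"))  # exact16
print(cache.lookup("10.0.5.0/24"))  # global: the /16 entry is exact-match only
```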
> >
> > ECS relies on the option always being returned for any kind of
> > answers, as some resolvers use that as an indicator of ECS support
> > (and stop using ECS if it ever stops). But ECS does not apply to
> > several kinds of answers (e.g., anything but NOERROR, esp. NXDOMAIN
> > and NODATA have to be consistent across all prefixes.) It doesn't
> > apply to SOA, DNSKEY, NS in answer section, referrals, etc. Yet,
> > many of these need to answer with SCOPE=0.
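A sketch of the decision this forces on an authoritative server, assuming the option is always echoed back and using a hypothetical response_scope helper: non-NOERROR responses, NODATA and referrals (no answer records), and SOA/DNSKEY/NS answers get SCOPE=0, and only tailored answers carry a non-zero scope. The RCODE values 0 (NOERROR) and 3 (NXDOMAIN) are the standard DNS ones; everything else is illustrative.

```python
NOERROR = 0   # standard DNS RCODE values
NXDOMAIN = 3

# Answer types the text lists as not tailored per client subnet.
UNTAILORED_TYPES = {"SOA", "DNSKEY", "NS"}


def response_scope(rcode: int, qtype: str, answer_count: int,
                   tailored_scope: int) -> int:
    """SCOPE for the ECS option echoed in every response (hypothetical)."""
    if rcode != NOERROR:
        return 0  # e.g. NXDOMAIN must be consistent across all prefixes
    if answer_count == 0:
        return 0  # NODATA and referrals carry no tailored answer records
    if qtype in UNTAILORED_TYPES:
        return 0  # SOA, DNSKEY, NS in the answer section
    return tailored_scope


print(response_scope(NXDOMAIN, "A", 0, 24))  # 0
print(response_scope(NOERROR, "A", 1, 24))   # 24
```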
> >
> > An ACL config option about whether the NS supports ECS or not (to
> > return the option or not) is different from a config option whether
> > the NS passes through ECS or not: the latter would always pass through
> > SOURCE=0 but return REFUSED for any ECS queries that didn't match the
> > ACL, whereas the former would return a non-ECS reply for any ECS
> > queries that didn't match the ACL.
> >
> > Transitivity of the option has corner cases.
> >
> > I don't have to point out how easy it is for an erroneous /16 to
> > prevent queries to /24 answers shadowed by the /16.
> >
> > Some cache cases: Obviously an ECS cache is different from a
> > zone. It does not come from a single zone, it is not an atomic
> > collection of a single zone version, and it is ever changing. If
> > there's a /24 answer in
> > cache, and a newer query brings in a /16 answer that shadows it,
> > should the resolver assume that the /16 has precedence because it's
> > newer (hence the /24 should no longer exist) or do a
> > longest-prefix-match against the older /24? What if the /16 then
> > expires and the /24 hasn't expired? An NXDOMAIN answer should expire
> > any previously cached prefix-specific cache entries for that name. A
> > NODATA answer should expire any previously cached prefix-specific
> > cache entries for that type.  Non-ECS data is different from SCOPE=0
> > data. There are questions about trust ranking with usage of ECS data.
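The two invalidation rules above (NXDOMAIN and NODATA) can be sketched with a hypothetical PrefixCache that stores answers per (name, type) and per prefix. It ignores TTLs, trust ranking and DNSSEC, which a real resolver would also have to handle.

```python
from collections import defaultdict
from typing import Dict, Tuple


class PrefixCache:
    """Hypothetical resolver cache: (name, type) -> {prefix: answer}."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)

    def store(self, name: str, rtype: str, prefix: str, answer: str) -> None:
        self.data[(name, rtype)][prefix] = answer

    def on_nxdomain(self, name: str) -> None:
        # The name does not exist for any prefix: drop entries of every type.
        for key in [k for k in self.data if k[0] == name]:
            del self.data[key]

    def on_nodata(self, name: str, rtype: str) -> None:
        # The type does not exist for any prefix: drop only that type.
        self.data.pop((name, rtype), None)


cache = PrefixCache()
cache.store("www.example.com", "A", "10.0.0.0/24", "192.0.2.10")
cache.store("www.example.com", "AAAA", "10.0.0.0/24", "2001:db8::10")
cache.on_nodata("www.example.com", "AAAA")
print(sorted(cache.data))  # [('www.example.com', 'A')]
cache.on_nxdomain("www.example.com")
print(len(cache.data))     # 0
```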
> >
> > These are just some topics that I can quickly think of. There are many
> > other issues we faced and discussed during resolver ECS development.
> >
> > The draft leaves many things unspecified, such as more clarity in DNSSEC
> > and handling of negative answers. Many issues were fixed during the
> > draft phase, but I feel it was insufficient.
> >
> >> Can you share your ideas for ECS2?
> >
> > There are many quirks in ECS. I don't want to propose specific ideas
> > now, except that we should gather requirements and start from
> > scratch.
>
> Yes, much of my soapbox rant was about just this -- understanding the
> requirements is important - the reason that CDNs provide different
> answers based upon the IP address is that it is a proxy for latency /
> performance.
> I'm sure we got many things wrong in ECS, but a redesign needs to be
> informed by the use case and requirements.
>
> (This mail not meant to sound as grumpy as it turned out :-) )
>
> W
>
> > We have to reduce complexity of the protocol on both auth and
> > caching resolver sides. I think it should be designed again from
> > requirements without being a tweak of ECS1. The current protocol
> > complicates DNS implementation significantly.
> >
> >                 Mukund
> >
> > _______________________________________________
> > DNSOP mailing list
> > [email protected]
> > https://www.ietf.org/mailman/listinfo/dnsop
>
>
>
> --
> I don't think the execution is relevant when it was obviously a bad
> idea in the first place.
> This is like putting rabid weasels in your pants, and later expressing
> regret at having chosen those particular rabid weasels and that pair
> of pants.
>    ---maf
>
>
-- 
Best Regards

潘蓝兰  Pan Lanlan