Hi all

These comments are for:

https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-06

One of the main concerns when implementing EDNS client-subnet (ECS) is
keeping the size of the cache small and in check. It seems cache
handling for ECS can be improved by changes to the option syntax. While
the draft may be describing a scheme already used in some existing
implementations, it needs changes before it goes any further, otherwise
it will lead to more duplication in the cache than necessary.

First, while the draft describes the general scheme and fields
reasonably well, its behavioral descriptions for implementors are
confusing and inadequate. For example, one cannot work out which prefix
the resolver caches an answer against, or when a cached answer may be
used for a later query from a different prefix, without reading between
the lines and making assumptions.

As I understand it, the following could happen when unchecked:

1. Assume an authoritative server has separate answers for example.org/A
for the networks 0.0.0.0/0 (global) and 192.0.2.0/24, i.e., it returns
one answer for 192.0.2.0/24 and a different answer for every address
outside it.

2. The following sequence of events occurs, starting with an empty cache:

 a. client(203.0.113.1) --example.org/A--> resolver
 b. resolver --Q1[option=203.0.113.0/24/0]--> auth
 c. auth --A1[option=203.0.113.0/24/0]--> resolver # (scope prefix is 0)
 d. resolver --A1--> client(203.0.113.1)

 e. client(192.0.2.1) --example.org/A--> resolver
 f. resolver --A1--> client(192.0.2.1)

You can see that in step f. the resolver returns the same global answer
instead of the network-specific answer, because it already had the
global answer for /0 in its cache. This is a problem that the draft's
option permits in weak implementations. [P1]
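
To make the sequence above concrete, here is a minimal sketch of such a
weak resolver cache. The class name NaiveEcsCache and its methods are
hypothetical and are not taken from the draft or from any real
implementation. The sketch assumes the resolver caches each answer under
ADDRESS/SCOPE PREFIX-LENGTH and serves the longest matching entry.

```python
import ipaddress


class NaiveEcsCache:
    """Hypothetical cache keyed by (qname, ADDRESS/SCOPE PREFIX-LENGTH)."""

    def __init__(self):
        self.entries = {}  # (qname, ip_network) -> answer

    def store(self, qname, address, scope_prefix_length, answer):
        # strict=False masks the host bits, so 203.0.113.0/0 becomes 0.0.0.0/0.
        net = ipaddress.ip_network(f"{address}/{scope_prefix_length}",
                                   strict=False)
        self.entries[(qname, net)] = answer

    def lookup(self, qname, client_address):
        # Return the answer cached under the longest prefix that
        # contains the client address, or None on a cache miss.
        client = ipaddress.ip_address(client_address)
        best = None
        for (name, net), answer in self.entries.items():
            if name == qname and client in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, answer)
        return None if best is None else best[1]


cache = NaiveEcsCache()
# Steps a-d: miss for 203.0.113.1, then A1 comes back with scope prefix 0.
print(cache.lookup("example.org/A", "203.0.113.1"))  # None
cache.store("example.org/A", "203.0.113.0", 0, "A1")
# Steps e-f: 192.0.2.1 is served the global answer from cache.
print(cache.lookup("example.org/A", "192.0.2.1"))  # A1
```

The second lookup never reaches the auth server, so the answer
configured for 192.0.2.0/24 is never fetched.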

The draft includes the following suggestion on how to avoid this
problem:

>   If the Authoritative Nameserver operator configures a more specific
>   (longer prefix length) Tailored Response within a configured less
>   specific (shorter prefix length) Tailored Response, then
>   implementations can either:
>
>   1.  Deaggregate the shorter prefix response into multiple longer
>       prefix responses, or,
>
>   2.  Alert the operator that the order of queries will determine which
>       answers get cached, and either warn and continue or treat this as
>       an error and refuse to load the configuration.

As an alternate approach, a resolver could use:

to_cache_prefix_length = MAX(query_source_prefix_length,
                             answer_scope_prefix_length);

i.e., in the example above, it would cache the answer for
203.0.113.0/24/0 as 203.0.113.0/24/24. This would fix the problem in the
example above. It would require no auth server handling such as
suggested above, but it would significantly increase cache usage, so
this is a bad idea (included here only as a consideration).
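
A small sketch of that rule, applied to the example. The helper name
cache_network is made up for illustration; only the MAX() expression
comes from the text above.

```python
import ipaddress


def to_cache_prefix_length(query_source_prefix_length,
                           answer_scope_prefix_length):
    # The alternative rule: never cache under a prefix shorter than
    # the one that was sent in the query.
    return max(query_source_prefix_length, answer_scope_prefix_length)


def cache_network(address, source_prefix_length, scope_prefix_length):
    # Hypothetical helper: the network the resolver would cache under.
    length = to_cache_prefix_length(source_prefix_length,
                                    scope_prefix_length)
    return ipaddress.ip_network(f"{address}/{length}", strict=False)


net = cache_network("203.0.113.0", 24, 0)
print(net)                                       # 203.0.113.0/24
print(ipaddress.ip_address("192.0.2.1") in net)  # False
```

The query from 192.0.2.1 no longer matches the cached entry, so it goes
to the auth server, but every /24 now gets its own copy of the global
answer.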

Let's look at #1 above (deaggregate the shorter prefix), which is likely
to be what general implementations support. What this basically means is
that the authoritative server discovers it has answers for both
0.0.0.0/0 (global) and 192.0.2.0/24, so the problem in the example above
could occur. For addresses outside 192.0.2.0/24, it therefore conjures a
suitable SCOPE PREFIX-LENGTH from the request's ADDRESS and SOURCE
PREFIX-LENGTH fields, so that the resolver never uses a cached answer
for 0.0.0.0/0 to answer a query from a client address in 192.0.2.0/24.
While this also solves the problem in the example above, note the
following:

1. The auth server is responsible for deaggregating the prefixes. The
efficient way of doing this is binary partitioning. So, in the example
above, the auth server would bisect the 0.0.0.0/0 address space into /1,
one half of /1 into /2, one half of /2 into /3, ..., one half of /23
into /24, to provide 24 different subnet answers that can be cached at
the resolver for this question for networks outside 192.0.2.0/24. We'll
examine this further in the next point.

In the example above, a rogue auth server could force duplication and
cache bloat at the resolver without performing the partitioning, by
answering for every network at /24. In this case, 1<<24 subnet answers
are possible, all but one of which are duplicates. For cache control,
some implementations seem to cap the number of subnets per name at 100
or some such number. In a public resolver, this can permit cache
thrashing and continual fetching. The point is that a rogue auth server
makes this possible. [P2]

2. Even with the binary partitioning from the previous point (which is
about as good as it can get), the example above can have up to 24
duplicate answers in the cache for the 0.0.0.0/0 answer, and up to 1
other answer for 192.0.2.0/24. Resolver implementations already worry
about cache bloat with ECS and this level of duplication of the same
answer should not be required. There doesn't seem to be any trivial way
to avoid duplication at the resolver side (de-duplication from answers
is not simple or efficient) using the current scheme.
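
The counts in the two points above can be checked with Python's
standard ipaddress module, whose address_exclude() performs exactly
this kind of binary partitioning. The function name deaggregate is
mine; this is only a sketch of what an auth server would have to
compute, not a description of any existing server.

```python
import ipaddress


def deaggregate(covering, specific):
    """Split `covering` into the networks that remain once `specific`
    is carved out, ordered from shortest to longest prefix."""
    covering = ipaddress.ip_network(covering)
    specific = ipaddress.ip_network(specific)
    return sorted(covering.address_exclude(specific),
                  key=lambda net: net.prefixlen)


pieces = deaggregate("0.0.0.0/0", "192.0.2.0/24")
print(len(pieces))                    # 24
print(pieces[0], pieces[-1])          # 0.0.0.0/1 192.0.3.0/24
print([p.prefixlen for p in pieces] == list(range(1, 25)))  # True

# Number of /24 answers a rogue auth server could hand out instead.
print(1 << 24)                        # 16777216
```

Each of the 24 pieces would carry the same global answer in the
resolver's cache.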

Let's consider this with a different example that'll produce a small
diagram. Let the auth server have an answer A1 for 224.0.0.0/3
(224=0b11100000) and a different answer A2 for 0.0.0.0/0 (global
answer).

With an auth that deaggregates, at the resolver, the address tree under
that node in the domain namespace in cache can look like this:

                       IPv4 root [A2(/0)**]
                         /  \
                  0b0.. /    \ 0b1..
                       /      \
                     A2(/1)    X
                              / \
                             /   \
                     0b10.. /     \  0b11..
                           /       \
                        A2(/2)      X
                                   / \
                                  /   \
                       0b110..   /     \ 0b111..
                                /       \
                             A2(/3)    A1(/3)


** This is the global answer, but the auth server shouldn't send it.

Note the duplication of answer A2 above, even in this simple
example. [P3] Subnet-specific answers are going to be sparse compared to
the whole IP address space. The scheme in the draft can severely grow
cache usage on a public resolver. The tree structure above is typical,
and suggests that this case can be improved by changing the option
syntax so that the auth server can add hints that let the resolver
answer without needing such copies.

I suggest adding two more fields: DEPTH (1 octet) and DIRECTION (1
octet). DEPTH=0 and DIRECTION=0xff will match current behavior without
these fields. When DEPTH is 0, DIRECTION must be 0xff and vice versa.

DIRECTION=0 means left, and DIRECTION=1 means right.

With DIRECTION=0, DEPTH is the number of consecutive left children in
the chain descending from the scope subnet whose respective parent's
right child can also use this answer.

With DIRECTION=1, DEPTH is the number of consecutive right children in
the chain descending from the scope subnet whose respective parent's
left child can also use this answer.

In all cases, the answer would be valid for a client-subnet that exactly
prefix-matches the scope subnet in the cache and, if DEPTH>0, its
DEPTH-1 consecutive children in the direction specified in DIRECTION.

1. With the last example, a query with option=192.0.2.0/24/0 would be
replied to by the authoritative server using these new fields with:

  ANSWER SECTION = A2
  FAMILY = 1
  SOURCE PREFIX-LENGTH = 24
  SCOPE PREFIX-LENGTH = 0
  DIRECTION = 1
  DEPTH = 3
  ADDRESS = 192.0.2.0

In the last example, this *single* A2 answer saved in cache would be
valid for client-subnets:

127.0.0.0/1 (i.e., 0.0.0.0/1)
191.0.0.0/2 (i.e., 128.0.0.0/2)
223.0.0.0/3 (i.e., 192.0.0.0/3)

and also be valid for:

0.0.0.0/0 (exact match)
128.0.0.0/1 (exact match)
192.0.0.0/2 (exact match)

It will not be valid for:

224.0.0.0/3

2. For a query with option=224.0.0.0/24/0, the auth server would reply
with:

  ANSWER SECTION = A1
  FAMILY = 1
  SOURCE PREFIX-LENGTH = 24
  SCOPE PREFIX-LENGTH = 3
  DIRECTION = 0xff
  DEPTH = 0
  ADDRESS = 224.0.0.0
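
Here is a sketch of how a resolver might expand the proposed DEPTH and
DIRECTION fields into the set of subnets a cached answer covers. The
function names are hypothetical, and the lookup rule (a client-subnet
anywhere inside one of the sibling subtrees may use the answer) is my
reading of the definitions above rather than settled wire semantics.
Subnets are printed in canonical form, with host bits zeroed.

```python
import ipaddress

LEFT, RIGHT, UNUSED = 0, 1, 0xFF


def covered_subnets(address, scope_prefix_length, direction, depth):
    """Return (exact, siblings): subnets matched exactly along the
    chain, and sibling subtrees that may also use the answer."""
    node = ipaddress.ip_network(f"{address}/{scope_prefix_length}",
                                strict=False)
    exact, siblings = [node], []
    if depth == 0:
        if direction != UNUSED:
            raise ValueError("DEPTH=0 requires DIRECTION=0xff")
        return exact, siblings
    if direction not in (LEFT, RIGHT):
        raise ValueError("DEPTH>0 requires DIRECTION 0 or 1")
    for step in range(depth):
        # subnets() yields the lower (left) half, then the upper (right).
        left, right = node.subnets(prefixlen_diff=1)
        child, sibling = (right, left) if direction == RIGHT else (left, right)
        siblings.append(sibling)
        if step < depth - 1:       # only DEPTH-1 children match exactly
            exact.append(child)
        node = child
    return exact, siblings


def answer_valid_for(client_subnet, exact, siblings):
    net = ipaddress.ip_network(client_subnet, strict=False)
    return net in exact or any(net.subnet_of(s) for s in siblings)


# Reply 1 above: scope 0, DIRECTION=1, DEPTH=3.
exact, siblings = covered_subnets("192.0.2.0", 0, RIGHT, 3)
print([str(n) for n in exact])     # ['0.0.0.0/0', '128.0.0.0/1', '192.0.0.0/2']
print([str(n) for n in siblings])  # ['0.0.0.0/1', '128.0.0.0/2', '192.0.0.0/3']
print(answer_valid_for("192.0.2.0/24", exact, siblings))  # True
print(answer_valid_for("224.0.0.0/24", exact, siblings))  # False
```

With reply 2 (scope 3, DIRECTION=0xff, DEPTH=0), the same function
returns only 224.0.0.0/3 as an exact match and no siblings, which is
the current behavior.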


I'd add that it is still unclear what happens with the scheme in the
current draft if the resolver queries with client-subnet=128.0.0.0/1.
The authoritative server has two answers in this network. Does it send
A1 or A2? If it sends A2 with scope prefix-length=1, that answer will be
used from cache for future resolver queries where
client-subnet=223.0.0.0/3.

A1 may not be correct as the client behind the resolver (that is hidden
by policy) may be in 224.0.0.0/3. The draft says:

>   If an Intermediate Nameserver receives a response which has a longer
>   SCOPE PREFIX-LENGTH than the SOURCE PREFIX-LENGTH that it provided in
>   its query, it SHOULD still provide the result as the answer to the
>   triggering client request even if the client is in a different
>   address range.

Operators who use this feature ought to be careful in designing their
network. For example, there is a possibility that an address record may
be sent to a client which has no route to it if the address is local to
some other network.

                Mukund

