Hi all,

These comments are for:
https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-06

One of the main concerns when implementing EDNS client-subnet is keeping the size of the cache small and in check. It seems that cache handling for EDNS client-subnet can be improved by changes to the option syntax. While the draft may be describing a scheme already used in some existing implementations, it needs changes before it goes any further; otherwise it will lead to more duplication in the cache than necessary.

First, while the draft describes the general scheme and fields reasonably, its behavioral descriptions for implementors are confusing and inadequate. For example, one cannot work out the prefix under which the resolver caches an answer, or when it may reuse that cached answer for subsequent queries from a different prefix, without some reading between the lines and making assumptions. The draft needs more detail.

As I understand it, the following could happen if left unchecked:

1. Assume an authoritative server has separate answers for example.org/A for the networks 0.0.0.0/0 (global) and 192.0.2.0/24, i.e., it returns a particular answer for 192.0.2.0/24 and a different answer for every address outside it.

2. The following sequence of events occurs, starting with an empty cache:

   a. client(203.0.113.1) --example.org/A--> resolver
   b. resolver --Q1[option=203.0.113.0/24/0]--> auth
   c. auth --A1[option=203.0.113.0/24/0]--> resolver   # (scope prefix is 0)
   d. resolver --A1--> client(203.0.113.1)
   e. client(192.0.2.1) --example.org/A--> resolver
   f. resolver --A1--> client(192.0.2.1)

You can see that in step f. the resolver returns the global answer instead of the network-specific answer, because it already had the global answer for /0 in its cache. This is a problem that the draft's option permits in weak implementations.
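The failure in steps a. to f. can be reproduced with a small simulation. This is only a sketch under stated assumptions, not any real resolver's code: it uses Python's standard `ipaddress` module, a hypothetical auth server that answers with the longest configured prefix covering the query's source subnet and echoes that prefix as the SCOPE PREFIX-LENGTH, and a hypothetical resolver cache that stores each answer under ADDRESS/SCOPE PREFIX-LENGTH and serves any client whose /24 falls inside a cached scope subnet. The label `specific` is a made-up stand-in for the unnamed 192.0.2.0/24 answer.

```python
from ipaddress import IPv4Network

# Hypothetical authoritative data for example.org/A: a global answer (A1)
# and a different, unnamed answer ("specific") for 192.0.2.0/24.
AUTH_DATA = {
    IPv4Network("0.0.0.0/0"): "A1",
    IPv4Network("192.0.2.0/24"): "specific",
}


def auth_answer(source: IPv4Network) -> tuple[str, int]:
    """Hypothetical auth server: pick the longest configured prefix covering
    the query's source subnet; return (answer, scope prefix length)."""
    best = max((net for net in AUTH_DATA if source.subnet_of(net)),
               key=lambda net: net.prefixlen)
    return AUTH_DATA[best], best.prefixlen


class NaiveEcsCache:
    """Hypothetical resolver cache keyed by scope subnet (ADDRESS/SCOPE)."""

    def __init__(self) -> None:
        self.entries: dict[IPv4Network, str] = {}

    def resolve(self, client_ip: str, source_len: int = 24) -> str:
        # The resolver sends the client's address truncated to /source_len.
        source = IPv4Network(f"{client_ip}/{source_len}", strict=False)
        # Serve from cache if any cached scope subnet covers the source subnet.
        hits = [net for net in self.entries if source.subnet_of(net)]
        if hits:
            return self.entries[max(hits, key=lambda net: net.prefixlen)]
        answer, scope_len = auth_answer(source)
        # Cache under ADDRESS/SCOPE PREFIX-LENGTH: 203.0.113.0/24 -> 0.0.0.0/0.
        self.entries[source.supernet(new_prefix=scope_len)] = answer
        return answer


cache = NaiveEcsCache()
print(cache.resolve("203.0.113.1"))  # steps a-d: A1, cached as 0.0.0.0/0
print(cache.resolve("192.0.2.1"))    # steps e-f: A1 again, served from cache
print(auth_answer(IPv4Network("192.0.2.0/24"))[0])  # what the auth would say
```

The last line shows that the auth server would have returned `specific` for 192.0.2.0/24; the cached /0 entry hides it.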
[P1] The draft includes the following suggestion on how to avoid this problem:

> If the Authoritative Nameserver operator configures a more specific
> (longer prefix length) Tailored Response within a configured less
> specific (shorter prefix length) Tailored Response, then
> implementations can either:
>
> 1. Deaggregate the shorter prefix response into multiple longer
> prefix responses, or,
>
> 2. Alert the operator that the order of queries will determine which
> answers get cached, and either warn and continue or treat this as
> an error and refuse to load the configuration.

As an alternative approach, a resolver could use:

   to_cache_prefix_length = MAX(query_source_prefix_length, answer_scope_prefix_length);

i.e., in the example above, it would cache the answer for 203.0.113.0/24/0 as 203.0.113.0/24/24. This would fix the problem in the example above, and it would require no special handling at the auth server such as that suggested above. However, it would significantly increase cache usage, so it is a bad idea (included here only as a consideration).

Let's look at #1 above (deaggregate the shorter prefix), which is likely to be what general implementations support. What this basically means is that the authoritative server discovers that it has answers for both 0.0.0.0/0 (global) and 192.0.2.0/24, and that the problem in the example above could occur. So, for addresses outside 192.0.2.0/24, it conjures a suitable SCOPE PREFIX-LENGTH from the request's ADDRESS and SOURCE PREFIX-LENGTH fields, so that the resolver never answers a query from a client address in 192.0.2.0/24 using a cached answer for 0.0.0.0/0.

While this also solves the problem in the example above, make note of the following:

1. The auth server is responsible for deaggregating the prefixes. The efficient way to do this is binary partitioning.
   So, in the example above, the auth server would bisect the 0.0.0.0/0 address space into two /1s, then the /1 containing 192.0.2.0/24 into two /2s, and so on, until the /23 is bisected into two /24s. This provides 24 different subnet answers for networks outside 192.0.2.0/24 that can be cached at the resolver for this question. We'll examine this further in the next point.

   In the example above, a rogue auth server could force duplication and cache bloat at the resolver without performing the partitioning, by answering for every network at /24. In this case, 1<<24 subnet answers are possible, all but one of which are duplicates. For cache control, some implementations seem to cap the number of subnets per name at 100 or some similar number. In a public resolver, this can permit cache thrashing and continual fetching. The point is that this is possible due to a rogue auth server.

[P2] 2. Even with the binary partitioning from the previous point (which is about as good as it can get), the example above can have up to 24 duplicate copies of the 0.0.0.0/0 answer in the cache, plus one other answer for 192.0.2.0/24. Resolver implementations already worry about cache bloat with ECS, and this level of duplication of the same answer should not be required. There doesn't seem to be any trivial way to avoid the duplication at the resolver side using the current scheme (de-duplicating answers is neither simple nor efficient).

   Let's consider this with a different example that produces a small diagram. Let the auth server have an answer A1 for 224.0.0.0/3 (224 = 0b11100000) and a different answer A2 for 0.0.0.0/0 (the global answer). With an auth server that deaggregates, the address tree stored in the resolver's cache under that node of the domain namespace can look like this:

              IPv4 root [A2(/0)**]
                  /          \
           0b0.. /            \ 0b1..
                /              \
          A2(/1)                X
                               / \
                       0b10.. /   \ 0b11..
                             /     \
                       A2(/2)       X
                                   / \
                          0b110.. /   \ 0b111..
                                 /     \
                           A2(/3)     A1(/3)

   ** This is the global answer, but the auth server shouldn't send it.
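The binary partitioning described in these two points can be sketched with Python's standard `ipaddress` module. The helper `deaggregate` is hypothetical and assumes a single more-specific prefix inside the global space: it bisects the outer network toward the more-specific prefix, and every half it sets aside is a subnet for which the auth server would return, and the resolver would cache, another copy of the global answer.

```python
from ipaddress import IPv4Network


def deaggregate(outer: IPv4Network, inner: IPv4Network) -> list[IPv4Network]:
    """Binary partitioning: bisect `outer` until reaching `inner`, collecting
    the half that does not contain `inner` at each step."""
    if not inner.subnet_of(outer):
        raise ValueError(f"{inner} is not inside {outer}")
    siblings = []
    current = outer
    while current.prefixlen < inner.prefixlen:
        left, right = current.subnets(prefixlen_diff=1)
        if inner.subnet_of(left):
            siblings.append(right)  # right half gets a copy of the outer answer
            current = left
        else:
            siblings.append(left)   # left half gets a copy of the outer answer
            current = right
    return siblings


global_net = IPv4Network("0.0.0.0/0")

# First example: one copy of the global answer per prefix length /1../24.
print(len(deaggregate(global_net, IPv4Network("192.0.2.0/24"))))  # 24

# Diagram example: A2 is copied for /1, /2 and /3; A1 covers 224.0.0.0/3.
print([str(n) for n in deaggregate(global_net, IPv4Network("224.0.0.0/3"))])
# ['0.0.0.0/1', '128.0.0.0/2', '192.0.0.0/3']
```

Each returned subnet becomes a separate cache entry holding the same global answer, which is exactly the duplication in the tree.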
   Note the duplication of answer A2 above, even in this simple example.

[P3] Subnet-specific answers are going to be sparse compared to the whole IP address space, and the scheme in the draft can severely grow cache usage on a public resolver. The tree structure above is typical, and suggests that this case can be improved by changing the option syntax to add hints from the auth server that let the resolver answer without needing such copies.

I suggest adding two more fields: DEPTH (1 octet) and DIRECTION (1 octet).

DEPTH=0 and DIRECTION=0xff match the current behavior without these fields. When DEPTH is 0, DIRECTION must be 0xff, and vice versa. DIRECTION=0 means left and DIRECTION=1 means right.

With DIRECTION=0, DEPTH is the number of consecutive left children, in a chain starting from the scope subnet, whose respective parents' right children can use this answer. With DIRECTION=1, DEPTH is the number of consecutive right children, in a chain starting from the scope subnet, whose respective parents' left children can use this answer.

In all cases, the answer is valid for a client-subnet that exactly prefix-matches the scope subnet in the cache and, if DEPTH>0, for its DEPTH-1 consecutive children in the direction given by DIRECTION.

Applying this to the last example, a query with option=192.0.2.0/24/0 would be answered by the authoritative server using these new fields with:

   ANSWER SECTION = A2
   FAMILY = 1
   SOURCE PREFIX-LENGTH = 24
   SCOPE PREFIX-LENGTH = 0
   DIRECTION = 1
   DEPTH = 3
   ADDRESS = 192.0.2.0

In the last example, this *single* A2 answer saved in the cache would be valid for the client-subnets:

   0.0.0.0/1     (0.0.0.0 - 127.255.255.255)
   128.0.0.0/2   (128.0.0.0 - 191.255.255.255)
   192.0.0.0/3   (192.0.0.0 - 223.255.255.255)

and also be valid for:

   0.0.0.0/0     (exact match)
   128.0.0.0/1   (exact match)
   192.0.0.0/2   (exact match)

It will not be valid for:

   224.0.0.0/3
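One reading of the proposed matching rule can be sketched as follows; it is an interpretation, not a specification. It assumes IPv4 only and uses Python's standard `ipaddress` module; the names `CachedAnswer` and `is_valid_for` are hypothetical. A client subnet inside the scope subnet is walked bit by bit in the direction given by DIRECTION for up to DEPTH steps: if it branches off into a sibling subtree, or is itself one of the chain nodes (the exact matches listed above), the answer applies; if it follows the chain all the way down, it is excluded.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network


@dataclass(frozen=True)
class CachedAnswer:
    """Hypothetical cache entry carrying the proposed DIRECTION/DEPTH hints."""
    rdata: str
    scope: IPv4Network  # ADDRESS masked to SCOPE PREFIX-LENGTH
    direction: int      # 0 = left, 1 = right, 0xff = none
    depth: int          # 0 = current draft behavior

    def __post_init__(self) -> None:
        # The proposal requires DEPTH == 0 exactly when DIRECTION == 0xff.
        if (self.depth == 0) != (self.direction == 0xFF):
            raise ValueError("DEPTH=0 must pair with DIRECTION=0xff")


def bit(addr: int, pos: int) -> int:
    """Bit `pos` of a 32-bit address, counting from the most significant bit."""
    return (addr >> (31 - pos)) & 1


def is_valid_for(entry: CachedAnswer, client: str) -> bool:
    net = IPv4Network(client, strict=False)
    # As in the current draft, the client subnet must lie inside the scope.
    if not net.subnet_of(entry.scope):
        return False
    if entry.depth == 0:
        return True
    addr = int(net.network_address)
    for i in range(entry.depth):
        pos = entry.scope.prefixlen + i
        if net.prefixlen <= pos:
            return True  # the client subnet is a chain node itself (exact match)
        if bit(addr, pos) != entry.direction:
            return True  # branched off into a sibling subtree sharing the answer
    return False  # under the DEPTH-th child in DIRECTION: excluded


a2 = CachedAnswer("A2", IPv4Network("0.0.0.0/0"), direction=1, depth=3)
for subnet in ["0.0.0.0/1", "128.0.0.0/2", "192.0.0.0/3",
               "0.0.0.0/0", "128.0.0.0/1", "192.0.0.0/2", "224.0.0.0/3"]:
    print(subnet, is_valid_for(a2, subnet))  # True for all but 224.0.0.0/3
```

An entry with DEPTH=0 and DIRECTION=0xff falls through to the plain containment check, so it keeps the draft's current behavior.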
Similarly, for a query with option=224.0.0.0/24/0, the auth server would reply with:

   ANSWER SECTION = A1
   FAMILY = 1
   SOURCE PREFIX-LENGTH = 24
   SCOPE PREFIX-LENGTH = 3
   DIRECTION = 0xff
   DEPTH = 0
   ADDRESS = 224.0.0.0

I should also add that there is still confusion about what happens with the scheme in the current draft if the resolver queries with client-subnet=128.0.0.0/1. The authoritative server has two answers within this network. Does it send A1 or A2? If it sends A2 with scope prefix-length=1, that answer will be used from the cache for future resolver queries where client-subnet=224.0.0.0/3, for which A2 is wrong. A1 may not be correct either, as the client behind the resolver (which is hidden by policy) may be outside 224.0.0.0/3, e.g., in 192.0.0.0/3. The draft says:

> If an Intermediate Nameserver receives a response which has a longer
> SCOPE PREFIX-LENGTH than the SOURCE PREFIX-LENGTH that it provided in
> its query, it SHOULD still provide the result as the answer to the
> triggering client request even if the client is in a different
> address range.

Operators who use this feature ought to be careful in designing their network. For example, an address record may be sent to a client that has no route to it, if the address is local to some other network.

Mukund
_______________________________________________
DNSOP mailing list
https://www.ietf.org/mailman/listinfo/dnsop
