On Tue, Sep 29, 2015 at 04:26:38PM -0400, Dave Lawrence wrote:
> David Dagon writes:
> > I have some concerns, which I describe below. [...]
>
> David,
>
> Thank you very much for your thoughtful comments. Broadly speaking, I
> very much agree with the bulk of them. Yet my current reaction is not
> to make any more alterations to the existing document. It describes
> the deployed protocol as-is, and your comments are appropriate for
> consideration for the revised protocol, where I can assure you they
> will definitely be integrated.
>
> Is there something specific about documenting (yet not endorsing) the
> in-use protocol that you think is important to get into the document
> before publication?
I'm preparing more notes, but wanted to offer a few more observations:
1) Testing Sundown?
-- Many authorities still answer edns-client-subnet queries during
iteration that use the draft/testing option code (0x50FA) instead
of the assigned 0x0008.
-- Some return an appropriate RFC 1035 RCODE error for
0x50FA-encoded queries.
-- Some answer 0x50FA-typed queries with 0x0008 answers. (This was
a surprise).
I wonder whether the document you're working on should comment
on these practices. Some response patterns to 0x50FA option-coded
queries seem logical (e.g., RCODE=1 Format Error under RFC 1035
s4.1.1). Some are merely helpful (e.g., still answering test
option-coded queries even after an IANA-assigned code exists).
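One way to tabulate these patterns when surveying authorities is a
small classifier over each reply's RCODE and the EDNS0 option codes it
carries. This is a hypothetical Python sketch: the category names and
`classify_draft_probe` are illustrative, not from the draft, and
parsing RCODE and option codes off the wire is assumed to happen
elsewhere.

```python
from enum import Enum
from typing import Sequence

ECS_ASSIGNED = 0x0008  # IANA-assigned edns-client-subnet option code
ECS_DRAFT = 0x50FA     # draft/testing option code still seen in the wild
RCODE_FORMERR = 1      # RFC 1035 s4.1.1: Format error


class DraftProbeResult(Enum):
    # Illustrative categories for a reply to a 0x50FA-coded query.
    FORMERR = "rejects the draft code with FORMERR"
    OTHER_ERROR = "answers with some other non-zero RCODE"
    SWITCHES_CODE = "answers a 0x50FA query with 0x0008"
    ECHOES_DRAFT = "still answers with the draft code 0x50FA"
    NO_ECS = "answers without any ECS option"


def classify_draft_probe(rcode: int, option_codes: Sequence[int]) -> DraftProbeResult:
    """Classify an authority's reply to a query that carried ECS as 0x50FA.

    option_codes: EDNS0 option codes present in the reply's OPT RR
    (extraction from the wire format is assumed, not shown).
    """
    if rcode == RCODE_FORMERR:
        return DraftProbeResult.FORMERR
    if rcode != 0:
        return DraftProbeResult.OTHER_ERROR
    # Checked first: the surprising case of answering 0x50FA with 0x0008.
    if ECS_ASSIGNED in option_codes:
        return DraftProbeResult.SWITCHES_CODE
    if ECS_DRAFT in option_codes:
        return DraftProbeResult.ECHOES_DRAFT
    return DraftProbeResult.NO_ECS
```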
Other behaviors seem helpful for very early testing, but are
perhaps not a useful status quo and might be discouraged: e.g.,
returning 0x0008 in response to queries with option 0x50FA, since
this raises anti-poisoning questions at the recursive. (Does query
tuple matching at the recursive additionally include the option
code? If not, that doubles an attacker's probability of
success.)
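A sketch of what including the option code in response matching could
look like. `PendingQuery` and `response_matches` are hypothetical
names. The sketch assumes a reply with no ECS option is still
acceptable (the authority simply ignored ECS), while a reply carrying
a different ECS option code than the one sent is dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PendingQuery:
    """Hypothetical record of an outstanding upstream query at the recursive."""
    qname: str
    qtype: int
    qclass: int
    msg_id: int
    src_port: int
    ecs_code: Optional[int]  # option code sent (0x0008 or 0x50FA), or None


def response_matches(pending: PendingQuery, qname: str, qtype: int, qclass: int,
                     msg_id: int, dst_port: int, resp_ecs_code: Optional[int]) -> bool:
    """Classic tuple match, extended so the ECS option code must also agree."""
    if (pending.qname.lower(), pending.qtype, pending.qclass,
            pending.msg_id, pending.src_port) != (qname.lower(), qtype, qclass,
                                                  msg_id, dst_port):
        return False
    if resp_ecs_code is None:
        # Authority ignored ECS: accept, treat as a non-localized answer.
        return True
    # Reject replies whose ECS option code differs from the one we sent.
    return resp_ecs_code == pending.ecs_code
```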
Perhaps authority implementors on the list can clarify the
thinking here? (I'd be particularly interested in zones that
formerly answered 0x50FA and now return FORMERR or similar
responses. That change denotes some re-evaluation, or maybe a new
tool.)
I'll have some stats on this shortly, if there's interest.
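For anyone reproducing these measurements, a probe can send the same
ECS payload under either option code. This minimal sketch assumes the
draft code 0x50FA carried the same payload layout as 0x0008 (FAMILY,
SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH, then only the significant
octets of ADDRESS). `build_ecs_option` is a hypothetical helper; it
builds only the option bytes, not the surrounding OPT RR.

```python
import ipaddress
import struct

ECS_CODE_ASSIGNED = 0x0008  # IANA-assigned edns-client-subnet code
ECS_CODE_DRAFT = 0x50FA     # draft/testing code


def build_ecs_option(subnet: str, code: int = ECS_CODE_ASSIGNED) -> bytes:
    """Encode OPTION-CODE, OPTION-LENGTH and the ECS payload for `subnet`."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    source_len = net.prefixlen
    # Send only ceil(source_len / 8) address octets; host bits are already zero.
    addr = net.network_address.packed[: (source_len + 7) // 8]
    payload = struct.pack("!HBB", family, source_len, 0) + addr  # scope = 0 in queries
    return struct.pack("!HH", code, len(payload)) + payload
```

The same subnet encoded twice, once with each code, gives a direct
comparison of how an authority treats 0x50FA versus 0x0008.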
2) Probe Delay for Authority Behavior?
I either don't understand, or am not convinced by, the draft's
discussion of a possible probe delay for testing ECS behavior in
authorities. Here's my current thinking: a naive in-line
implementation of probes would of course incur delay when
iterating to an authority for which a recursive has no cached
evidence of ECS support. But surely all recursive implementations
have instead done out-of-query-band testing of authorities for ECS
behavior, at least from what I can determine from my logs.
(Indeed, some of that testing is still manual.)
Section 12.1 does note the need for periodic probing. I'm not
clear why section 12.2 notes a "possible query loss/delay" for
such probes. My speculation: in the worst case, wouldn't a busy
recursive just provide a stock zone answer, without subnet
localization? That is, the first query for a novel zone results
in a non-localized answer (sorry; no ECS for novel NS/novel zones;
just plain vanilla RFC 1034), but after the recursive validates
ECS awareness (either out-of-band, or through manual
whitelisting), subsequent queries become subnet aware.
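The worst case described above can be sketched as a per-authority
status cache: known-capable authorities get ECS, unknown ones get a
plain query now plus a background probe, so no client waits on the
probe. `EcsStatus` and `plan_query` are hypothetical names, not taken
from any existing resolver.

```python
from enum import Enum
from typing import Dict, Tuple


class EcsStatus(Enum):
    UNKNOWN = 0
    CAPABLE = 1
    NOT_CAPABLE = 2


def plan_query(status_cache: Dict[str, EcsStatus], ns: str) -> Tuple[bool, bool]:
    """Return (send_ecs_now, schedule_background_probe) for authority `ns`."""
    status = status_cache.get(ns, EcsStatus.UNKNOWN)
    if status is EcsStatus.CAPABLE:
        return True, False
    if status is EcsStatus.NOT_CAPABLE:
        return False, False
    # Novel authority: send a plain RFC 1034-style query (no added latency)
    # and learn its ECS behavior out of band for later queries.
    return False, True
```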
Worst case, if the whitelisting and/or periodic probing
contemplated by section 12.2 were scheduled as a linear multiple of
the NS record's TTL (or of the zone's default TTL), then even
naive, in-line querying for ECS would limit "loss/delay" to once
per TTL expiration. And again, the recursive could avoid even this
by simply not returning an ECS-endowed message, falling back to
stock RFC 1034 behavior instead of failing.
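Under that linear-TTL assumption, the cost of even in-line probing is
bounded by how often a probe is permitted. A hypothetical scheduler;
the class name and the `multiple` factor are illustrative assumptions,
not anything the draft specifies.

```python
from typing import Dict


class ProbeScheduler:
    """Allow at most one ECS probe per authority per (multiple * NS TTL) seconds."""

    def __init__(self, multiple: float = 1.0) -> None:
        self.multiple = multiple
        self.last_probe: Dict[str, float] = {}

    def should_probe(self, ns: str, ns_ttl: int, now: float) -> bool:
        last = self.last_probe.get(ns)
        if last is None or now - last >= self.multiple * ns_ttl:
            self.last_probe[ns] = now  # record the probe we are about to send
            return True
        return False
```

With a query every 10 seconds for an hour and a 900-second NS TTL,
`should_probe` returns True four times (t = 0, 900, 1800, 2700), so
at most four queries could see any probe-related delay.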
So I'm afraid I do not understand the "loss/delay" discussion in
the document. Granted, it's probably there to motivate the need
for whitelisting. But I focus on it because I'd like to understand
(and hopefully avoid) any language that diminishes the operational
value of, or potential for, adding probe records such as this to
ECS-aware zones:
_edns-client-subnet.${HOST}.in-addr.arpa IN TXT "v=ecs1 optin"
This is not done operationally, AFAIK. But if it were (and the
record were only honored in response to 0x0008-typed queries from
the recursive), or in some similar form, it would become evident
to the stubs: the first evidence they'd have of both recursive and
authority treatment of ECS. If there are more complexities in
maintaining per-NS ECS status, I'd like to better understand them.
There are only two implementors of the protocol, AFAIK, so perhaps
someone can help?
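The hypothetical opt-in record described above could be checked
roughly as follows. The record format is only the illustrative one in
this message; `probe_name`, `parse_ecs_optin`, and `authority_opts_in`
are hypothetical helpers. The sketch substitutes ${HOST} literally
without deciding what it should expand to, and leaves out the TXT
lookup itself.

```python
from typing import Iterable

ECS_ASSIGNED = 0x0008


def probe_name(host: str) -> str:
    # Follows the template in the message literally; ${HOST} expansion is unspecified.
    return f"_edns-client-subnet.{host}.in-addr.arpa"


def parse_ecs_optin(txt: str) -> bool:
    """True if TXT data looks like the hypothetical 'v=ecs1 optin' record."""
    tokens = txt.strip().strip('"').split()
    return bool(tokens) and tokens[0] == "v=ecs1" and "optin" in tokens[1:]


def authority_opts_in(txt_records: Iterable[str], query_ecs_code: int) -> bool:
    # Per the proposal, the record only counts for queries sent with 0x0008.
    if query_ecs_code != ECS_ASSIGNED:
        return False
    return any(parse_ecs_optin(t) for t in txt_records)
```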
I'm still digesting the rest of the document, and running tests. It's
well written, and helpfully annotated. I'm just a bit slow in this
process.
I will endeavor, in the time that remains for this IETF review, to
identify more comments on the draft, which documents current
practice.
My general sense, summarized in my earlier post, is that this protocol
is a significant change due to the re-injection of user metadata,
has caused (and will cause) user surprise (I use that word
descriptively, based on experience), has affected proxies/VPNs and
hidden services, and could be better detailed in some parts (e.g.,
no encoding for PTR? MX?, discussion of FORMERR behavior for
0x50FA-typed queries, etc.).
But I'm also aware that global recursive operators can point to a
competitive need for mirror localization. In short, "interesting
times".
--
David Dagon
[email protected]
D970 6D9E E500 E877 B1E3 D3F8 5937 48DC 0FDC E717
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop