[DNSOP] WGLC Updates to draft-ietf-dnsop-session-signal

Ted Lemon Thu, 08 Mar 2018 21:21:03 -0800

We have updated draft-ietf-dnsop-session-signal to address the points raised 
during the working group last call, which ended on 2/23.  We've already 
received comments on the new version from Paul Hoffman (thanks for that) and 
will address those as well in a later update.


I've included links to the messages we received during last call and a log of 
the changes we made based on the points raised in those messages below.   
Thanks to everyone for the thorough reviews—we got lots of helpful comments!

---

Jan Komissar: 
https://mailarchive.ietf.org/arch/msg/dnssd/v48k8QbOSN-HtDRFRf20csoq1ZY

1: Should we require TLS?
 
We entertained this idea, and it would simplify the protocol work, but it seems 
like an overreach for this document.   If we require TLS, we are effectively 
making that a requirement for any DNS-over-TCP connection for which session 
signaling would be useful.   We do not believe that we have working group 
consensus on this point, and therefore can't make this move in this document.   
If the working group wants to require TLS for DNS TCP connections, that should 
be done in a separate document.

Ed Lewis
 https://mailarchive.ietf.org/arch/msg/dnsop/CQ_qRzpm_omXivdY7eELQq-fX5I
 https://mailarchive.ietf.org/arch/msg/dnsop/Iig2nnIvBFsjG5FafKXgpdtBDTw

2: Is the term "session" inappropriate here?

The consensus appears to be that using a different term in order to accommodate 
the OSI layering model, which isn't actually in use, seems unnecessary.   We 
will add text to the document to explicitly state that this isn't session in 
the OSI sense.  Numerous other comments were made that asked for clarification 
on the basis of the assumption that we were talking about an OSI-model session; 
these are all addressed by clarifying that we are not.

3: The endpoints - is that the stub resolver and authoritative server (on in  
DNSSEC terms) the signer and the validator?

The text here explicitly refers to a connection between client and server; to 
clarify it to "stub resolver and authoritative server" would not be correct, 
since there are many other "clients" and "servers."   For example a 
full-service cache is a "server" to the stub resolver, and a "client" to an 
authoritative server or upstream forwarder.  RFC 7766 currently uses the terms 
"client" and "server" in the same way, and it's not clear how we could further 
clarify this.

4: This section (4.1) intermingles text on whether or not each DSO request 
elicits a response or not and the process of DSO "session" establishment.  With 
proper editing, these should be separated to lessen confusion.

This has been clarified

5: (Section 4.1.1) Requirements (first MUST) and recommendations to operate in 
a certain way tend to become dated quickly.  Instead of placing requirements on 
clients to act good, let the server refuse workload.  I am thinking of the 
issue surrounding the iterations in NSEC3 and recent surveys of operators. 
Despite the documents saying a low value is better, operators use high values.

This text is quote from RFC 7766, so updating it would require an update to 
RFC7766.

6: This phrase is confusing and unnecessary "this is a fatal error".  The logic 
to that point is clear that the situation doesn't happen and the  prescribed 
behavior ("close the connection") makes sense.

The term "fatal error" is used throughout the document to mean an error that 
precludes further communication on the connection.   The terminology section 
has been updated to clarify.   

7: This is clumsy, when describing the RCODE: "generally set to zero on 
transmission, and silently ignored on reception, except".

This text has been clarified.

8: Perhaps these are not "unacknowledged requests" but "subsequent responses".

We had a debate about what the right phrase was to use and settled on 
"unacknowledged messages."   The document has been updated to use this phrase 
consistently, which should lessen the confusion.   "Subsequent responses" 
doesn't really work because an unacknowledged message isn't really a response.

9: Do not do name compression!  No No No No.

The reason we don't say this is that TLVs can tunnel DNS messages, and this 
text would require us to uncompress the message before encoding it in a TLV.

We have changed the text to be slightly more generic: the content of a TLV is 
opaque and subject to definition of that TLV.   We no longer mention DNS name 
compression at all—since TLVs are opaque blobs, no requirements can be placed 
on what they look like.

10: Unrecognized primary: "I would have thought this would warrant tearing down 
the connection given the words earlier that this ought never happen."

We agree, and the document has been updated accordingly.

11:   "The namespaces of 16-bit MESSAGE IDs are disjoint in each direction.   
For example, it is *not* an error for both client and server to send a request 
message with the same ID."   This will, someday, confuse a young and 
inexperienced DNS hosting engineer.

The reason we wrote this in the spec is so that a young engineer won't make 
that mistake.

We have added some text to clarify that the directionality of the message can 
be considered an extra bit.   There is no way for the server and client to know 
that the other isn't using the same message ID as they are composing the 
message, and the tracking required to prevent clashes that theoretically could 
be prevented would impose a huge amount of complexity—this really isn't at all 
analogous to the key_id example.

12: There will be a need to fight cruft, or garbage collect.  Inactive objects 
tend to be forgotten while still using up resources.

If you've got buggy client software that forgets to cancel some operation it's 
going to forget to stop sending renewals as well.  This is one reason for 
having keepalives.

13: I thought that if a connection ends, the DSO session ends.

Correct.

We have changed "reconnect" to "make a new session" which means a new 
connection as well.

14: Sometimes a client can't distinguish this: "If reconnecting to the same 
server," as some server processes have multiple addresses and names."

We have stolen the following text from DNS Push and flipped it around to change 
the emphasis:

DNS Updates and DNS Push Notifications may be handled on different ports on the 
same target host, in which case they are not considered to be the "same server" 
for the purposes of this specification, and communications with these two ports 
are handled independently.

The issue is that it's actually a hard problem for the client to know that two 
servers are the same server.   This definition is sufficiently constrained that 
we can imagine a client successfully implementing.   Matching IP addresses is 
too hard.   As a server operator, if you want servers to share connections it's 
your job to set up the server with the same host and ports.   The opposite is 
also true.

15: "The RECOMMENDED value is 10 seconds." Probably a bad idea to codify this 
because implementations will set it to 10 and not scatter it when it should be. 
 (Like closing out many connections in a load-shedding panic.)

Instead could say that it's recommended that servers set this value to ten 
seconds when they want the client to reconnect, and refer back to the session 
that has the more general text about this (5.6.1.1).

The text has been updated as suggested—thanks!

Paul Hoffman:
 https://mailarchive.ietf.org/arch/msg/dnsop/PUlnpSfGvZo6cZge476SEWUR7tM

16: The document should explicitly list which protocols are currently 
acceptable, and say that the list can change in the future based on 
standards-track documents.

We do not agree that this should be restricted to standards-track documents.   
That said, we took the text that Paul kindly provided and tweaked it.  Updates 
to DSO obviously still have to be standards-track.

Paul also sent us comments on the post-WGLC -06 update, which have not yet been 
addressed as of now.  Thanks very much for the quick response, Paul!

Stephane Bortzmeyer:
 https://mailarchive.ietf.org/arch/msg/dnssd/NFtd5RGIJhbOnb4Q_dWLwfZmkYg

17: My personal feeling is that it is complicated, with a lot of details. May 
be separating in two documents, one for the base DSO concept and one for the 
standards TLV (with their detailed behavior) would have been better.

We don't see any working group consensus on this point, and it seems 
unnecessary.

18: There is a discussion about a possible Privacy section 
<https://github.com/raybellis/draft-bellis-dnsop-session-signal/pull/36> for 
which I suggest the following text: [...]

We started with Stephane's text, and wound up with this:

   When designing new DSO TLVs, the potential for data in the TLV to be
   used as a tracking identifier should be taken into consideration, and
   should be avoided when not required.

   When used without TLS or similar cryptographic protection, a
   malicious entity maybe able to inject a malicious Retry Delay
   Unacknowledged Message into the data stream, specifying an
   unreasonably large RETRY DELAY, causing a denial-of-service attack
   against the client.

This is now in the Security Considerations section, rather than in a separate 
Privacy Considerations section.   We could make it a separate section if there 
is a strong desire to do so, but it seemed unnecessary.

19:  "There are a myriad of other potential use cases for DSO given the 
versatility and extensibility of this specification."   I don't really like 
this sort of sentence. Either we have ideas about these potential use cases and 
we should write them down, or we don't and we should avoid this sort of very 
general words (after all, human imagination being what it is, we can be sure 
surprising use cases will be found.)

We agree and have removed this text.

20:  "If the RCODE is set to any value other than NOERROR (0) or DSONOTIMP  
(tentatively 11), then the client should assume that the server does  not 
support DSO."  (Why "should" in lower case?) RFC 1035 being very clear that the 
rcode from a non-DSO server must be NOTIMP (this is also said in section 4.2.1 
of the draft), I suggest to change that to: [...]

We actually think it should be MUST.

We believe the text otherwise says essentially the same thing as Stephane's 
text, except that Stephane's text actually places new requirements on 
non-conforming servers.   We therefore prefer the current text.

21: It seems to me that the draft uses "response-requiring messages" as a 
synonym of "acknowledged request messages" (and "non-response-requiring 
messages" as a synonym of "unacknowledged request messages"). If I'm correct, 
it would be better to state it clearly in the terminology section.

There's a linquistic ambiguity here.  We wrote "response-requiring" rather than 
"acknowledged" because "acknowledged" could mean "has been acknowledged" or 
"requires acknowledgment," whereas "response-requiring" is unambiguous.

So we prefer to keep the text as is.

22: Since there are only four combinations, I do not find this table useful.

The idea is that this table is a template for future modifications, so we want 
to have it there as a way of priming that.

23: Last, RFC 5226 (IANA considerations section) is now replaced by RFC 8126

Fixed, thanks!

Bernie Volz
 https://mailarchive.ietf.org/arch/msg/dnsop/56NcwNBSY1K26tD424Z1sJtQqWU

24: Section 1: Not sure why Stateful is capitalized in the following line: 
"transport protocol.  Each Stateful operation is communicated in its"

Stuart coincidentally had already changed this text to say "each DSO operation" 
rather than "each stateful operation," so problem solved. :)

25: Section 2: You probably should update this to the text in 
https://tools.ietf.org/html/rfc8174 and update the references accordingly.

The text now references both RFC 2119 and RFC 8174.

26: Section 3: Do you want to add anything about Section 6 (and perhaps later 
sections)?

We concluded that this paragraph was actually unnecessary, so rather than 
adding a mention of section 6, we deleted the paragraph!

27: Section 4.2.2.3 (and perhaps other places similar text exists): When a 
connection is aborted because of an invalid message, is there any 
recommendation to be added about retrying? If the client terminates the 
connection, it would likely not be wise for it to immediately retry and repeat 
the operation as that can lead to an endless loop? Should some recommended 
backoff technique be provided? Or some other connection rate limiting warning?

Good point, client should not reconnect immediately after a fatal error.   
There is no good answer here.

We have added some text in the terminology section that talks about the 
implications of fatal errors.   A fatal error means you are not following the 
spec.   It is still possible that a non-DSO connection would work.   How the 
client navigates this is beyond our ability to specify.   The new text says 
that the client shouldn't try to reconnect to the same server instance for at 
least an hour

28: Section 4.2.3, last paragraph: What should happen if a EDNS(0) TCP 
Keepalive option does appear? Should the connection be terminated? The message 
ignored?

This has already been specified.   Document already says ARCOUNT must be zero 
for DSO messages.   We have added text saying that if an EDNS(0) keepalive 
option appears within a DSO session, it's a fatal error and is treated 
accordingly.

29: Section 4.3: first paragraph: This text kind of conflicts with the text in 
4.2.1 which says that whether a message is acknowledged or unacknowledged is 
determined only by the specification for the Primary TLV. I understand this may 
not be worth addressing, but perhaps a reference to 4.2.1 is worth considering?

We have updated and significantly clarified this section, and we believe that 
this point has been addressed.

30: Section 6.1: INACTIVITY TIMEOUT and KEEPALIVE INTERVAL. Is mention 
0xffffffff (infinite) worth adding to this text?

We have added a separate paragraph about 0xfffffffe in section 5.4.2.  Would 
probably be redundant to add it here as well.   We also deleted sections 7.3 
and 7.4, which were redundant.

31: Section 6.2.1: If a client were to receive a new RCODE but does not 
understand it (older version), should there be a statement as to how the client 
should react? Should it treat the unknown error code as if NOERROR were sent?

We have updated this to say that all nonzero errors should be taken as if 
NOERROR were sent and treat them as routine shutdown.

32: Section 9: Seems a bit light, but OK if it ends up being acceptable. For 
example, while it probably means you have bigger problems, but large timer 
values (such as in the Retry Delay TLV) could be a denial of service vector. 
Though if the server does that, it probably isn’t who you wanted to be talking 
to anyway and you should have used TLS.   Perhaps also saying that if DNS over 
TLS isn’t used (just plan TCP), then it may be possible for a man-in-the-middle 
to inject messages (such as with a large Retry Delay TLV)?

We do not see an action here that doesn't already apply to TCP and TLS.

33: While there seems not be a solid definition of what the “Updates” means in 
the RFC header, it always seems to me that “Updates” means you are changing 
something in those documents, rather than just extending them with new 
capabilities within the original documents framework.

The point of [updates] here is to flag to readers of the earlier specifications 
that they should read this specification as well.   There has been some 
agreement in the working group that this should be here.  In a future world 
where DSO is pretty common, then we have changed RFC 7766, and it would be 
problematic for an implementer to not implement this specification as well.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop

[DNSOP] WGLC Updates to draft-ietf-dnsop-session-signal

Reply via email to