Dear authors:
I just finished reading this document. Thank you for your work on it.
Why was it decided to create a bis document and not just an Update to
rfc5306? Either way works for me, I'm just curious.
Knowing that the change in this document, with respect to rfc5306, is new
functionality, I tried very hard to not make comments about the unchanged
text, I really did. However, there are a couple of significant items that
I *have* to comment on (please see in-line)...mostly related to lack of
clarity, or opportunities that we should take to improve. In general, I
think the text could use a bit more clarity, but I won't insist on it to
avoid further changes.
The result of the elimination of ยง2 (Conventions Used in This Document) is
that explanations of what restart/restarting and start/starting mean were
removed. I think those helped frame the expectations of the operation. In
the current document, there is no formal definition of those terms. I
strongly believe that the text should be restored, perhaps at the end of
the Overview.
I will wait until the major items (in-line) are addressed before I start
the IETF Last Call.
Thanks!!
Alvaro.
[Line numbers from idnits.]
....
11 Abstract
....
18 This document additionally describes a mechansim for a router to
19 signal its neighbors that it is preparing to initiate a restart while
20 maintaining forwarding plane state. This allows the neighbors to
21 maintain their adjacencies until the router has restarted, but also
22 allows the neighbors to bring the adjacencies down in the event of
23 other topology changes.
[nit] s/mechansim/mechanism
....
100 1. Overview
....
158 It is assumed that the three-way handshake [RFC5303] is being used on
159 Point-to-Point circuits.
[major] If only assumed, will the mechanism work if 3-way handshake is not
used? Why it is not mandated?
161 2. Approach
163 2.1. Timers
....
171 An instance of the timer T1 is maintained per interface, and
172 indicates the time after which an unacknowledged (re)start attempt
173 will be repeated. A typical value might be 3 seconds.
[minor] s/A typical value might be 3 seconds./A typical value is 3 seconds.
175 An instance of the timer T2 is maintained for each LSP database
176 (LSPDB) present in the system, i.e., for a Level 1/2 system, there
177 will be an instance of the timer T2 for Level 1 and an instance for
178 Level 2. This is the maximum time that the system will wait for
179 LSPDB synchronization. A typical value might be 60 seconds.
[minor] s/A typical value might be 60 seconds./A typical value is 60
seconds.
181 A single instance of the timer T3 is maintained for the entire
182 system. It indicates the time after which the router will declare
183 that it has failed to achieve database synchronization (by setting
184 the overload bit in its own LSP). This is initialized to 65535
185 seconds, but is set to the minimum of the remaining times of received
186 IS-IS Hellos (IIHs) containing a restart TLV with the Restart
187 Acknowledgement (RA) set and an indication that the neighbor has an
188 adjacency in the "UP" state to the restarting router.
[nit] I think a comma would help in differentiating the two things in the
last sentence: s/and/, and
[major] Maybe it's just me, but I don't know what the "the remaining times
of...an indication that the neighbor has an adjacency in the "UP" state to
the restarting router" is. Please clarify.
190 NOTE: The timer T3 is only used by a restarting router.
192 2.2. Restart TLV
....
200 Type 211
202 Length: Number of octets in the Value field (1 to (3 + ID Length))
203 Value
[minor] s/Length:...(1 to (3 + ID Length))/Length:...(1 or (3 + ID Length))
[nit] Add a line before "Value".
205 No. of octets
206 +-----------------------+
207 | Flags | 1
208 +-----------------------+
209 | Remaining Time | 2
210 +-----------------------+
211 | Restarting Neighbor ID| ID Length
212 +-----------------------+
214 Flags (1 octet)
216 0 1 2 3 4 5 6 7
217 +--+--+--+--+--+--+--+--+
218 |Reserved|PA|PR|SA|RA|RR|
219 +--+--+--+--+--+--+--+--+
[minor] Do you intend for IANA to assign the remaining bits?
221 RR - Restart Request
222 RA - Restart Acknowledgement
223 SA - Suppress adjacency advertisement
224 PR - Restart is planned
225 PA - Planned restart acknowledgement
[major] The functionality in this document (old and new) depends on a
single bit being set. What should a receiver do if more than one bit is
set? Are there valid combinations of multiple bits?
227 (Note: Remaining fields are )
[nit] Leftover text...
....
245 Note: Implementations based on earlier drafts of RFC 5306
246 may not include this field in the TLV when the RA bit is set.
247 In this case, a router that is expecting an RA on a LAN circuit
248 SHOULD assume that the acknowledgement is directed at the local
249 system.
[major] I think it is time we stop catering to old/non-compliant
implementations. The last time that a draft didn't include the System ID
was draft-ietf-isis-restart-03 in March 2003: more than 15 years ago!!
....
360 2.2.3. Use of PR and PA Bits
....
376 a. if this is the first IIH with the PR bit set that this system has
377 received associated with this adjacency, then the adjacency is
378 marked as being in "Planned Restart state" and the adjacency
379 holding time is refreshed -- otherwise, the holding time is not
380 refreshed. The holding time SHOULD be set to the "remaining
381 time" specified in the received IIH with PR set. The "remaining
382 time" transmitted according to (b) below MUST reflect the actual
383 time after which the adjacency will now expire. Receipt of a
384 normal IIH with the PR bit reset will clear the "Planned Restart
385 mode" state and cause the receiving router to set the adjacency
386 hold time to the locally configured value. This procedure allows
387 the router planning a restart to cause the neighbor to maintain
388 the adjacency long enough for restart to successfully complete.
389 Whether or not an adjacency is marked as being in "Planned
390 Restart mode" has no effect on adjacency state transitions.
[minor] Is it "Planned Restart state" or "Planned Restart mode" state?
392 b. immediately (i.e., without waiting for any currently running
393 timer interval to expire, but with a small random delay of a few
394 tens of milliseconds on LANs to avoid "storms") transmit over the
395 corresponding interface an IIH including the restart TLV with the
396 PR bit clear and the PA bit set. The "Remaining Time" MUST be
397 set to the current time (in seconds) before the holding timer on
398 this adjacency is due to expire. If the corresponding interface
399 is a LAN interface, then the Restarting Neighbor System ID SHOULD
400 be set to the System ID of the router from which the IIH with the
401 PR bit set was received. This is required to correctly associate
402 the acknowledgement and holding time in the case where multiple
403 systems on a LAN are planning a restart at approximately the same
404 time.
[major] "System ID SHOULD be set to the System ID of the router from which
the...PR bit set was received. This is required to correctly associate the
acknowledgement...where multiple systems on a LAN are planning a restart at
approximately the same time." IOW, if the System ID is not set correctly,
then the sender of the IIH+PR won't necessarily know the reply is to it...
When is it ok to not set the System ID that way? IOW, why is MUST not
used? [BTW, I know this is the same text as in 2.2.1...this comment
applies there too.]
....
428 While the adjacency is in planned restart state the following actions
429 MAY be taken:
[major] The steps below are optional (MAY)...that is ok. (a) talks about a
compromised ability to flood updates on a LAN is adjacencies are not reset,
which sound very serious to me. What are the considerations for an
implementation/deployment to decide using these optional actions? Should
they be used on LANs, but not necessarily in other link types (just as an
example)?
[minor] The actions below are independent of each other, right? It may be
worth mentioning that.
431 a. If additional topology changes occur, the adjacency which is in
432 planned restart state MAY be brought down even though the hold
433 time has not yet expired. Given that the neighbor which has
434 signaled a planned restart is not expected to update its
435 forwarding plane in response to signaling of the topology changes
436 (since it is restarting) traffic which transits that node is at
437 risk of being improperly forwarded. On a LAN circuit, if the
438 router in planned restart state is the DIS at any supported
439 level, the adjacency(ies) SHOULD be brought down whenever any LSP
440 update is either generated or received so as to trigger a new DIS
441 election. Failure to do so will compromise the reliability of
442 the Update Process on that circuit. What other criteria are used
443 to determine what topology changes will trigger bringing the
444 adjacency down is a local implementation decision.
[nit] s/received so as to/received, so as to
[major] "adjacency(ies) SHOULD be brought down...Failure to do so will
compromise the reliability of the Update Process on that circuit." When is
it ok to not bring these adjacencies down? If the operator already decided
to deploy these optional steps, why wouldn't the adjacencies be always
brought down? IOW, why is MUST not used?
446 b. If a BFD session to the neighbor which signals a planned restart
447 is in the UP state and subsequently goes DOWN, the event MAY be
448 ignored since it is possible this is an expected side effect of
449 the restart. Use of the Control Plane Independent state as
450 signalled in BFD control packets [RFC5880] SHOULD be considered
451 in the decision to ignore a BFD Session DOWN event
[nit] Move the reference to rfc5880 to the first mention of BFD.
[nit] s/Session DOWN event/Session DOWN event.
....
467 2.3.1. Adjacency Reacquisition during Restart
469 The restarting router explicitly notifies its neighbor that the
470 adjacency is being reacquired, and hence that it SHOULD NOT
471 reinitialize the adjacency. This is achieved by setting the RR bit
472 in the restart TLV. When the neighbor of a restarting router
473 receives an IIH with the restart TLV having the RR bit set, if there
474 exists on this interface an adjacency in state "UP" with the same
475 System ID, and in the case of a LAN circuit, with the same source LAN
476 address, then the procedures described in Section 3.2.1 are followed.
[minor] s/3.2.1/2.2.1
....
492 On a LAN circuit, the LAN-ID assigned to the circuit SHOULD be the
493 same as that used prior to the restart. In particular, for any
494 circuits for which the restarting router was previously DIS, the use
495 of a different LAN-ID would necessitate the generation of a new set
496 of pseudonode LSPs, and corresponding changes in all the LSPs
497 referencing them from other routers on the LAN. By preserving the
498 LAN-ID across the restart, this churn can be prevented. To enable a
499 restarting router to learn the LAN-ID used prior to restart, the LAN-
500 ID specified in an IIH with RR set MUST be ignored.
[major] "the LAN-ID assigned to the circuit SHOULD be the same as that used
prior to the restart...churn can be prevented." Are there cases when
(without knowledge of other changes) the LAN-ID would be better off being
something else? IOW, why would a MUST not fit here?
....
516 On a Point-to-Point link, receipt of an IIH not containing the
517 restart TLV is also treated as an acknowledgement, since it indicates
518 that the neighbor is not restart capable. However, since no CSNP is
519 guaranteed to be received over this interface, the timer T1 is
520 cancelled immediately without waiting for a complete set of CSNPs.
521 Synchronization may therefore be deemed complete even though there
522 are some LSPs which are held (only) by this neighbor (see
523 Section 3.4). In this case, we also want to be certain that the
524 neighbor will reinitialize the adjacency in order to guarantee that
525 the SRMflags have been set on its database, thus ensuring eventual
526 LSPDB synchronization. This is guaranteed to happen except in the
527 case where the Adjacency Three-Way State in the received IIH is "UP"
528 and the Neighbor Extended Local Circuit ID matches the extended local
529 circuit ID assigned by the restarting router. In this case, the
530 restarting router MUST force the adjacency to reinitialize by setting
531 the local Adjacency Three-Way State to "DOWN" and sending a normal
532 IIH.
[minor] s/3.4/2.4
....
567 If an interface is active, but does not have any neighboring router
568 reachable over that interface, the timer T1 would never be cancelled,
569 and according to Section 3.4.1.1, the SPF would never be run.
570 Therefore, timer T1 is cancelled after some predetermined number of
571 expirations (which MAY be 1).
[minor] s/3.4.1.1/2.4.1.1
573 2.3.2. Adjacency Acquisition during Start
....
592 Upon receipt of an IIH with the SA bit set, the behavior described in
593 Section 3.2.2 is followed.
[minor] s/3.2.2/2.2.2
....
642 When the T2 timer(s) are cancelled or expire, transmission of
643 "normal" IIHs (with RR, RA, and SA bits clear) will begin.
[minor] A "normal" IIH is mentioned before, but this is the first time that
it is described ("with RR, RA, and SA bits clear"). Shouldn't this
definition be expanded to include the PR/PA bits? Also, it would be nice
if the term was defined earlier in the text: move the parenthesis to the
first mention.
....
659 2.4. Database Synchronization
....
670 When (re)starting, a router starts an instance of timer T2 for each
671 LSPDB as described in Section 3.3.1 or Section 3.3.2. In addition to
672 normal processing of the CSNPs, the set of LSPIDs contained in the
673 first complete set of CSNPs received over each interface is recorded,
674 together with their remaining lifetime. In the case of a LAN
675 interface, a complete set of CSNPs MUST consist of CSNPs received
676 from neighbors that are not restarting. If there are multiple
677 interfaces on the (re)starting router, the recorded set of LSPIDs is
678 the union of those received over each interface. LSPs with a
679 remaining lifetime of zero are NOT so recorded.
[minor] s/Section 3.3.1 or Section 3.3.2/Section 2.3.1 or Section 2.3.2
....
710 2.4.1.1. Restarting
....
766 Once the other level's SPF has run and any inter-level propagation
767 has been resolved, the router's own LSPs can be generated and
768 flooded. Any own LSPs that were previously ignored, but that are not
769 part of the current set of own LSPs (including pseudonodes), MUST
770 then be purged. Note that it is possible that a Designated Router
771 change may have taken place, and consequently the router SHOULD purge
772 those pseudonode LSPs that it previously owned, but that are now no
773 longer part of its set of pseudonode LSPs.
[major] This is a Normative conflict: "...(including pseudonodes), MUST
then be purged...the router SHOULD purge those pseudonode LSPs" How is it
possible to always (MUST) purge the LSPs, but not always (SHOULD)? Perhaps
s/SHOULD/MUST
....
793 2.4.1.2. Starting
....
815 To avoid the possible formation of temporary blackholes, the starting
816 router sets the SA bit in the restart TLV (as described in
817 Section 3.3.2) in all IIHs that it sends.
[minor] s/3.3.2/2.3.2
....
967 4. IANA Considerations
969 This document defines the following IS-IS TLV that is listed in the
970 IS-IS TLV codepoint registry:
972 Type Description IIH LSP SNP
973 ---- ----------------------------------- --- --- ---
974 211 Restart TLV y n n
[major] You're missing the Purge column.
[major] This section should also ask IANA to change the reference from
rfc5306 to this document.
976 5. Security Considerations
....
1000 If an SA bit is set in a false IIH, this could cause suppression of
1001 the advertisement of an IS neighbor, which could either continue for
1002 an indefinite period or occur intermittently with the result being a
1003 possible loss of reachability to some destinations in the network
1004 and/or increased frequency of LSP flooding and SPF calculation.
[major] Please add similar considerations as above for the new bits.
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr