Re: [Lsr] A review of draft-ietf-lsr-isis-ttz

Acee Lindem (acee) Mon, 15 Feb 2021 09:43:03 -0800

Hi Adrian,
Thanks much - I think these are all good comments, and addressing them would 
greatly improve both the completeness and readability of the draft.


Hi Authors, 

This is really a good review and I believe all the comments should be 
incorporated. While many of these are sins of omission that will take some 
time to address, it would be good to at least respond with your intent to 
cover them in future versions of the draft. 

Thanks,
Acee

On 2/13/21, 3:35 PM, "Lsr on behalf of Adrian Farrel" <[email protected] on 
behalf of [email protected]> wrote:

    Hi all,

    Acee leant on me to do a review of this work (so blame him :-)

    It's good to see this document adopted and progressing. Particularly
    good to see the realistic compromise of making this Experimental.

    I have a few comments, below.

    Best,
    Adrian

    ===

    I have a largish issue with the fact that the document offers a choice
    of how to aggregate the zone: virtual node or full mesh. Firstly, it is
    not helpful to offer options without guidance about which option to pick
    if you're an implementer or a deployer. You also need to specify whether 
    the choice MUST be a configuration option, and how to handle the case 
    where some nodes in the zone are using one option and the others are 
    using the other option.

    Possibly you can make this part of the experiment (see below for notes
    on the experiment).

    I have some pretty strong opinions on the idea of a single node
    abstraction. The main challenge comes when there is a partial failure in
    the zone such that the zone is partitioned (or the path between two
    zone neighbors across the zone is severely degraded). It is not possible
    to represent this in the node model since your only options are:
    - drop the connection to a neighbor
    - move to represent the zone as two nodes

    In fact, both models (node and mesh) are subject to disruption when
    there is a connectivity failure within the zone, but if we think about
    the mesh model, it doesn't actually need to be advertised as a full
    mesh: partial mesh is easily handled. Nevertheless, the use of a single
    zone leader to perform the aggregation has problems if the zone is 
    partitioned in some way - perhaps this is addressed by the partitioned
    zone simply electing two distinct leaders and declaring itself as two
    zones.
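
    To illustrate the partition case, here is a minimal sketch, not taken
    from the draft. The zone membership, the link list, and the rule that
    the highest identifier wins the leader election are all assumptions
    made for the example.

```python
def zone_partitions(zone_nodes, zone_links):
    """Return the connected components of the zone as sorted lists."""
    adj = {n: set() for n in zone_nodes}
    for a, b in zone_links:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in sorted(zone_nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts


def leaders(parts):
    # Assumption: the node with the highest identifier leads its partition.
    return [max(p) for p in parts]


# Hypothetical zone: node names follow the review, the links are invented.
nodes = ["R61", "R65", "R67", "R71"]
healthy = [("R61", "R65"), ("R61", "R71"), ("R71", "R67")]
# The same zone after the R61-R71 link has failed.
failed = [("R61", "R65"), ("R71", "R67")]

parts_healthy = zone_partitions(nodes, healthy)
parts_failed = zone_partitions(nodes, failed)
leaders_failed = leaders(parts_failed)
```

    With one component the zone keeps a single leader. With two components
    each elects its own leader, which matches the "two zones" outcome
    suggested above.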

    This discussion of faults within the zone seems (to me) to be pretty
    important.

    I am also struggling with metrics and route computation when the zone is
    viewed from outside the zone.  4.1.5 tells us about route computation, 
    but it is not until 4.3.1 that we discover:
       The
       metric to the neighbor is the metric of the shortest path to the edge
       node within the zone.
    This text applies to the full mesh case, and we don't have anything 
    about the node model, so we might assume that the metrics on the edge
    circuits are unchanged. 

    Obviously, this is important, and it feels that something is broken for
    the virtual node case. Consider Figure 1.

    Without the zone (and assuming link metrics of 1), the cost of the path
    R15-R61-R71-R67-R31 is 4, and this route might not be preferred if some
    other route R15-x-y-R31 exists with cost 3. However, once we have 
    introduced the zone using the virtual node approach, there is an 
    available route R15-Rz-R31 that appears to have a preferable metric of
    2. I would say that the route R15-x-y-R31 should still be preferred.
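
    The arithmetic can be checked with a small shortest-path sketch. The
    topology is a hypothetical fragment containing only the links named
    above, and it assumes that the virtual node view leaves the edge
    circuit metrics unchanged and adds no cost for crossing Rz.

```python
import heapq


def shortest_cost(links, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): metric}."""
    adj = {}
    for (a, b), metric in links.items():
        adj.setdefault(a, []).append((b, metric))
        adj.setdefault(b, []).append((a, metric))
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, metric in adj.get(node, []):
            new_cost = cost + metric
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None  # dst not reachable


# Route around the zone, all link metrics 1.
outside = {("R15", "x"): 1, ("x", "y"): 1, ("y", "R31"): 1}
# Real topology through the zone.
through_zone = {("R15", "R61"): 1, ("R61", "R71"): 1,
                ("R71", "R67"): 1, ("R67", "R31"): 1}
# Virtual node view: R61, R71 and R67 collapse into Rz.
virtual_node = {("R15", "Rz"): 1, ("Rz", "R31"): 1}

cost_zone_only = shortest_cost(through_zone, "R15", "R31")
cost_real = shortest_cost({**outside, **through_zone}, "R15", "R31")
cost_abstract = shortest_cost({**outside, **virtual_node}, "R15", "R31")
```

    Here cost_zone_only is 4 and cost_real is 3, so the outside route wins
    in the real topology. cost_abstract is 2, so the abstraction flips the
    preference unless the advertised metrics account for the cost of
    crossing the zone.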

    This point certainly needs to be called out in the text, and maybe this
    gives some input to the choice between models. Perhaps the metrics in
    the ISN and ESN TLVs are related to this point, but section 4.2.1 gives
    no hint about how to set these values. Actually, I suspect that what is
    going on here is that all of the metrics advertised to outside the zone 
    are controlled by the zone leader and advertised in the ISN/ESN - but I
    don't find that actually stated anywhere.

    All this said, I find it notable that this document focusses almost 
    completely (sections 4 and 5 - section 4.3 is a very small section) on
    the virtual node model. It would be good to provide an example like 
    Figure 2, but for the mesh model.

    Perhaps rather than deferring this to be an outcome of the experiment,
    this document should spend some time comparing the two models *or* it
    might even be time to abandon one of the models. 

    ---

    Obviously, at some point before this goes forward for publication,
    you'll need to reduce to no more than five front-page authors.

    ---

    I think the Abstract might usefully mention IS-IS. Probably the first
    sentence could read:

       This document specifies a topology-transparent zone in an IS-IS area.

    ---

    The document really needs a section to scope the Experiment.

    - How is the experiment kept separate and safe from the Internet or
      indeed from any non-participating routers?
    - What happens if the boundary of the experiment is breached?
      (To expand on this, what happens if there is a misconfiguration so
       that a Zone Internal Node thinks its neighbor is also in the Zone
       when it is actually unaware of these extensions and should be
       treated as a Zone External Node? This misconfiguration has a node
       that should be a Zone Edge/Border Node acting as a Zone Internal
       Node.)
    - How is the success (or failure!) of the experiment assessed?
    - Are there plans to bring this back for consideration on the standards
      track if certain criteria are satisfied?
    - Is evaluation of the relative merits of node and mesh abstraction part
      of the experiment?

    ---

    Section 1

    The WG may have established a different practice, but it used to be
    normal to reference RFC 1195 alongside ISO 10589.  (You do have 1195
    listed in the references section, but you don't actually reference it).

    ---

    Section 1

       There are scalability issues in using areas as the number
       of routers in a network becomes larger and larger.

    Maybe what you're trying to say in this section (and it is important
    because it gives the whole motivation for this work) is that there are
    scalability issues with a single IS-IS area as the number of routers in
    the area grows. (You might explain what those issues are.)

    Then you can go on to say how splitting into multiple levels and having
    multiple L1 areas mitigates the scaling issues. And then you can 
    continue with your text about why splitting an IS-IS system as it grows
    can be hard.

    ---

    Section 2

       A Topology-Transparent Zone (TTZ) may be deployed to resolve some
       critical issues such as scalability in existing networks and future
       networks.

    This sounds like you have a number of critical issues in mind, but you
    only mention scalability. Are there others you can list, or should you
    reduce this text to just...

       A Topology-Transparent Zone (TTZ) may be deployed to resolve the
       critical issue of scalability in existing networks and future
       networks.

    ---

    Section 2

       o  Abstracting a zone as a TTZ virtual entity, which is a single
          virtual node or zone edges' mesh, SHOULD be smooth with minimum
          service interruption.

    I *think* you are talking about the transition from not using TTZ to
    using TTZ, but it could be a lot clearer.

    A forward pointer to 4.1.4 might be useful. And 4.1.4 really should
    describe some of the processing governed by the OPS bits in 4.2.1.

    ---

    Section 2

       o  De-abstracting (or say rolling back) a TTZ virtual entity to a
          zone SHOULD be smooth with minimum service interruption.

    This is similarly unclear, and it sounds like you might be talking 
    about turning off a zone (i.e., moving all of the Zone Nodes into the
    surrounding area and removing the zone), or you could be talking about
    moving a single node from inside to outside the zone.

    ---

    Section 2

       o  Users SHOULD be able to easily set up an end-to-end service
          crossing TTZs.

    I am not clear what a "service" is in this context. Assuming we're not
    talking about TE extensions, isn't the service simply that the user 
    sends packets and they are routed by the network?

    ---

    Section 4

    I think the start of this section needs to add a little about the limits
    of a TTZ. In particular:
    - Is a TTZ restricted to reside within a single level?
    - Is a TTZ restricted to lie within a single area?
    - What happens if one of the zone nodes is an L1/L2 router?
      - Presumably, depending on the answer to the first question, this 
        could only happen if the node in question is a zone edge/border
        node. But, even then it is complicated: does the abstracted node
        become an L1/L2 router?

    ---

    4.1
    OLD
      Each of these links connects a zone neighbor.
    NEW
      Each of these links connects to a zone neighbor.
    END

    ---

    4.1
       The virtual node ID may be derived from the zone ID.

    Maybe say how else it could be specified and how the implementer or
    deployer makes this choice.

    ---

    A useful modification to Figures 1 and 2 would be to add a circuit from
    R15 to R65 in Figure 1 and show how this becomes a second 'parallel'
    circuit from R15 to Rz in Figure 2.

    ---

    4.1.1

       A TTZ MUST hide the information inside the TTZ from the outside.  It
       MUST NOT directly distribute any internal information about the TTZ
       to a router outside of the TTZ.

       For instance, the TTZ in the figure above MUST NOT send the
       information about TTZ internal router R71 to any router outside of
       the TTZ in the routing domain; it MUST NOT send the information about
       the circuit between TTZ router R61 and R65 to any router outside of
       the TTZ.

    These "for instance" examples are good in that they are true. But they
    imply some things by omission, and I don't think you mean to make those
    implications.

    That is, the first paragraph is much clearer and definitive. But your
    second paragraph, by calling out some special cases of "internal 
    information" makes it ambiguous whether, for example, the router R61 is
    advertised outside the TTZ. (Of course, it isn't.)

    It may be better to delete the second paragraph, and go straight to the
    following paragraph that describes what is seen outside the TTZ by
    directly describing what *is* advertised rather than providing a partial
    list of what is not advertised.

    ---

    I think that the subsections of 4.1 cover all of the necessary 
    information. My list of things to cover is:
    - zone edge/border nodes form adjacencies with zone neighbor nodes using
      the identity of the aggregate zone node and not their own identities
    - zone nodes continue to operate IS-IS as normal to advertise zone nodes
      and zone links within the zone
    - zone edge/border nodes do not advertise or readvertise LSPs that
      originated within the zone to neighbors outside the zone
    - zone nodes continue to operate IS-IS as normal to re-advertise LSPs 
      that originated outside the zone
    - the zone leader is responsible for deriving the aggregate node 
      information that represents the node and for originating LSPs for this
      aggregate node
    - zone nodes re-advertise LSPs originated by the zone leader on behalf
      of the aggregate zone node on all circuits including those that 
      connect to zone neighbor nodes
    - when a zone edge/border node readvertises the LSPs for the aggregate
      zone node, it does so as if it had originated the LSP
    - when any zone edge/border node receives an LSP that reports itself as
      originating from the aggregate zone node, the edge/border node 
      suppresses the LSP
    - zone nodes do not install routing state resulting from advertisements 
      of LSPs describing the aggregate zone node

    As I say, I think you have all this in the subsections of 4.1, but I had
    to hunt around to find all of this. It might be helpful to give a clear
    summary of the behaviors.
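
    As one possible shape for such a summary, the re-advertisement rules in
    the list can be written as a single decision function. This is only a
    sketch of my reading of the list, not text from the draft: the names
    Origin and should_readvertise are invented, and the suppression and
    route-installation rules are deliberately left out.

```python
from enum import Enum


class Origin(Enum):
    ZONE_INTERNAL = "originated by a zone node"
    EXTERNAL = "originated outside the zone"
    AGGREGATE = "originated by the zone leader for the aggregate node"


def should_readvertise(origin, neighbor_in_zone):
    """Decide whether a zone node re-advertises an LSP on a circuit.

    neighbor_in_zone is True for a zone link and False for a circuit
    to a zone neighbor outside the zone.
    """
    if origin is Origin.ZONE_INTERNAL:
        # Zone-internal LSPs stay inside the zone.
        return neighbor_in_zone
    # External LSPs are flooded as normal, and aggregate-node LSPs are
    # re-advertised on all circuits, including those to zone neighbors.
    return True


leak_internal = should_readvertise(Origin.ZONE_INTERNAL, neighbor_in_zone=False)
flood_internal = should_readvertise(Origin.ZONE_INTERNAL, neighbor_in_zone=True)
flood_aggregate = should_readvertise(Origin.AGGREGATE, neighbor_in_zone=False)
flood_external = should_readvertise(Origin.EXTERNAL, neighbor_in_zone=True)
```

    Only the first case returns False. That one rule is what hides the
    zone topology from the outside.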

    ---

    4.1.2

       The leader election mechanism described in
       [I-D.ietf-lsr-dynamic-flooding] may be used to elect the leader for
       the zone.

    "may be used" or "is used"?

    ---

    4.1.2

       Somewhere you need to cover what happens if the zone leader fails
       but the zone remains otherwise fully connected. Does the new leader
       start from scratch, or does it try to retain the zone ID etc.?

    ---

    4.1.4 attempts to do two things:
    - describe the migration from not-a-zone to the use of a zone
    - describe the steady state zone behavior
    I think it would be helpful to split these out into separate sections.
    In particular, the migration from not-a-zone to zone is only needed in
    operational networks.

    ---

    4.2

       The following TLV is defined in IS-IS.

    I think...

       This document defines a new TLV for use in IS-IS as follows.

    ---

    4.2.1

       The format of IS-IS Zone ID TLV is illustrated below.  It may be
       added into an LSP for a zone node.

    s/may/MUST/

    ---

    4.2.1

       If every link of a zone edge node is a zone link

    Doesn't that mean that the zone edge node is not a zone edge node?

    ---

    4.2.1

    To be honest, I found the description of the processing governed by
    the OPS bits to be pretty complicated.

    I would recommend adding a new section (related to 4.1.4) that talks
    through the process in clear steps. Then this text can just list the
    meanings of the bits and point back to the process description.

    Maybe this is what sections 5, 6.2, and 6.3 are for, in which case cut
    down the explanation here and provide forward pointers.

    ---

    Figures 4 and 5. I think you have defined the types for these two 
    sub-TLVs (1 and 2).

    ---

    4.2.1

    I wonder how many neighbors a zone might have. It could be a fairly big
    number, I suspect, although obviously it depends on how the operator
    decides to chop up the area into zones (for which I don't find any 
    guidance).

    The size of the ISN and ESN would appear to be a function of the number
    of neighbors times (IDlength+3). Is there a practical constraint on the
    size of the TLVs which places a limit on the number of neighbors that a
    zone can have? This would be an important design consideration for the
    operator. Maybe it is another feature for experimentation.
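
    A rough bound can be sketched as follows. It relies on the IS-IS TLV
    length field being a single octet, so a value carries at most 255
    octets. The default ID length of 6 and the overhead_octets parameter,
    which stands in for any other fields sharing the same space, are
    assumptions and not values taken from the draft.

```python
MAX_TLV_VALUE_OCTETS = 255  # one-octet IS-IS TLV length field


def max_neighbors(id_length=6, overhead_octets=0, budget=MAX_TLV_VALUE_OCTETS):
    """Upper bound on neighbors if each one costs (id_length + 3) octets."""
    per_neighbor = id_length + 3
    return max(0, budget - overhead_octets) // per_neighbor


limit_no_overhead = max_neighbors()
limit_with_overhead = max_neighbors(overhead_octets=10)
```

    Under these assumptions the limit is 28 neighbors with no overhead and
    27 with 10 octets of overhead. The exact figure depends on the TLV and
    sub-TLV layout, which is why the draft should state it.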

    ---

    4.2.1

    The same neighbour may have two links to the zone and not necessarily 
    through the same edge/border node (see my previous point). In this case,
    might the different links have different metrics? I think so, but I
    don't see how that is encoded in the sub-TLVs.
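
    To show what is at stake, here is a sketch that assumes the sub-TLV
    carries exactly one entry per neighbor ID. That encoding is an
    assumption for illustration, as are the metrics and the choice of
    keeping the minimum.

```python
def one_entry_per_neighbor(links):
    """Collapse (edge_node, neighbor, metric) links to one metric per neighbor.

    Assumption: when a neighbor has several links, the lowest metric is kept.
    """
    entries = {}
    for _edge_node, neighbor, metric in links:
        entries[neighbor] = min(metric, entries.get(neighbor, metric))
    return entries


# R15 reaches the zone through two different edge nodes (hypothetical metrics).
links = [("R61", "R15", 1), ("R65", "R15", 5)]
entries = one_entry_per_neighbor(links)
```

    The result holds a single metric of 1 for R15, and the metric of 5 on
    the second link is lost. Representing both links would need either
    repeated entries for the same neighbor or some per-link identifier.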

    ---

    6.1

    There is probably something to be said about what happens if the 
    configuration of the zone ID is not consistent across the zone. Is it
    as simple as you ending up with two zones?

    What is the scope of uniqueness of the zone ID? I think it only has to be
    unique in the zone and with the neighbors. Obviously there are ways to
    make this safe (such as area or global uniqueness). What are the
    constraints?

    ---

    6.2

       When receiving
       the command, the node distributes it to every zone node.

    Is this in the management plane or in IS-IS? I can see how it could be
    in IS-IS if the configured node is the zone leader and it just starts
    sending the zone TLV and all of the edge nodes are identified in 
    sub-TLVs such that a receiving node is either an edge or an internal
    node. But I don't see how it works if the configured node is just some
    internal or edge node and the leader has to be elected.

    Similarly...
       If automatic transferring zone to node is enabled, the user does not
       need to issue the command.  A zone node, such as the zone leader,
       will distribute the "command" to every zone node after determining
       that the configuration of the zone has been finished.
    ...what is the command and how is it distributed?

    Same sort of issues in 6.3

    ---

    Section 7 is a bit suspect! What would happen if a zone TLV was sent by
    a compromised router or added to an LSP by a mid-wire attacker? I would
    be sympathetic to you saying that if an attacker can do either of these
    things then there are many far worse things they can do, but I think you
    should call out:
    - what sort of attacks are possible
    - what damage they might do
    - how these attacks might be detected
    - what protections are available (references would be enough)

    ---

    Section 8

       Under the registry name "IS-IS TLV Codepoints", IANA is requested to
       assign a new registry type for Zone ID as follows:

    I think...

       IANA is requested to make a new allocation in the "IS-IS TLV 
       Codepoint Registry" under the registry name "IS-IS TLV Codepoints" 
       as follows:

    ---

    Section 8

    I recommend you tell IANA whether you want the new TLV type to be less
    than or greater than 255.

    ---

    Section 8

       IANA is requested to create a new sub-registry "Adjacent Node ID Sub-
       TLVs" on the IANA IS-IS TLV Codepoints web page as follows:

    I recommend you call the new sub-registry "Sub-TLVs for TLV type TBD1 
    (Zone ID TLV)"

    _______________________________________________
    Lsr mailing list
    [email protected]
    https://www.ietf.org/mailman/listinfo/lsr
