Re: [Anima] Rtgdir telechat review of draft-ietf-anima-reference-model-07

Brian E Carpenter Sun, 26 Aug 2018 13:58:24 -0700

(Ccs trimmed)

Christian,


Thanks for this careful review. I'll comment here on the larger issues:

On 2018-08-27 04:03, Christian Hopps wrote:
....
> Minor Major Issues:
> 
> - Virtualization is mentioned once in "4.2 addressing" section. To quote:
> 
>   TEXT: "Support for virtualization: Autonomic Nodes may support Autonomic
>   Service Agents in different virtual machines or containers. The addressing
>   scheme should support this architecture."
> 
>   The special casing of VMs/containers here seems to indicate that virtual
>   devices are not "1st class citizens" in an autonomic network. In particular I
>   could easily imagine virtual machines being full-blown autonomic nodes
>   themselves. Assuming the intent is not to restrict virtual devices in this
>   manner, something needs to be said (somewhere) to make that clear.

I don't think that was the intention. We haven't really explored this in detail,
but I can certainly imagine a deployment (for example) where each tenant in
a data centre has its own virtual autonomic network, and the underlying physical
network is also autonomic. Since the ACP is expected to be implemented as
a VRF, you could even argue that every autonomic network is virtual.

So, yes, we can reword this.

> 
> - Robust programming techniques. I think the intention here is to say that the
>   design of ASAs must have robustness as a top design principle. I think in
>   doing that it should talk about what being robust means; however, it should
>   not be talking about how to accomplish that as there are multiple ways to
>   achieve this goal.
> 
>   In particular, I feel that saying restarting is the *last* thing an ASA should
>   do is way overreaching into engineering the solution rather than specifying
>   the requirement. Indeed, plenty of people think that overly complex recovery
>   mechanisms that try everything under the sun to *not* restart often have more
>   bugs and are less robust than KISS solutions that "fail" simply but recover
>   quickly with minimal or no disruption.
> 
>   I feel this section reads a bit more like someone's idea of how to design a
>   robust system instead of talking about what robust means, which I believe is
>   the intent.
> 
>   Perhaps better is just to focus on robust design ideas (some are already
>   stated in the text):
> 
>   - must deal with discovery and negotiation failure as routine.
>   - recovering from failures should be minimally disruptive.
>   - must not leak resources.
>   - must monitor for and deal with hung code.
>   - must include security analysis

OK. Since I drafted that text, I will leave the document editor to fix
it. (Some of the detail probably belongs in another draft specifically
about ASAs, which I am editing.)
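Purely as an illustration (not proposed draft text), the robustness properties listed above can be expressed without prescribing any particular recovery strategy. The following is a minimal Python sketch that assumes an asyncio-based ASA. The names robust_step and FlakyPeer are hypothetical, and FlakyPeer only stands in for a real discovery or negotiation exchange such as GRASP. In the sketch, failure is treated as routine and retried, a step that exceeds its timeout is treated as hung and cancelled, and the per-attempt resource is released on every path.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def robust_step(
    step: Callable[[], Awaitable[str]],
    *,
    attempts: int = 3,
    timeout_s: float = 0.5,
    backoff_s: float = 0.01,
) -> Optional[str]:
    """Run one (hypothetical) discovery/negotiation step robustly.

    - Failure is routine: a refused or unanswered exchange is retried
      after a short, doubling backoff instead of being treated as fatal.
    - Hung code is detected: asyncio.wait_for cancels a step that runs
      longer than timeout_s.
    - Returns None when every attempt fails, so the caller can simply
      try again later.
    """
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(step(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            await asyncio.sleep(backoff_s * (2 ** attempt))
    return None


class FlakyPeer:
    """Hypothetical peer: hangs on the 1st call, refuses the 2nd, agrees on the 3rd."""

    def __init__(self) -> None:
        self.calls = 0
        self.open_sessions = 0  # stands in for sockets, timers, etc.

    async def negotiate(self) -> str:
        self.calls += 1
        self.open_sessions += 1
        try:
            if self.calls == 1:
                await asyncio.sleep(10)  # simulated hang; cancelled by wait_for
            if self.calls == 2:
                raise ConnectionError("peer refused")
            return "agreed"
        finally:
            # Released on success, error and cancellation alike: no leaks.
            self.open_sessions -= 1


if __name__ == "__main__":
    peer = FlakyPeer()
    result = asyncio.run(robust_step(peer.negotiate))
    print(result, peer.calls, peer.open_sessions)  # agreed 3 0
```

Whether an ASA retries, gives up or restarts is still an implementation choice. The sketch only shows that properties like these can be stated and checked independently of that choice.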

> 
> - 7.4: When the text talks about the feedback loop, it mentions "allow the
>   intervention" of a human admin or control system; however, it then describes
>   the feedback loop as presenting default actions and allowing for override.
>   This is fine, but it seems to leave out the common case where something is
>   misbehaving and would not be presenting any choices to the administrator
>   (using the feedback loop), so the admin must forcefully intervene.

Yes. I think the word "feedback" is a bad choice. For engineers raised on
Nyquist diagrams it is part of a closed loop; for other people it means
feedback to humans. The text needs clarifying.

> 
> Minor Issues:
> 
> - 6.1 TEXT: "It must be possible to run ASAs as non-privileged (user space)
>   processes except for those (such as the infrastructure ASAs) that necessarily
>   require kernel privilege. Also, it is highly desirable that ASAs can be
>   dynamically loaded on a running node."
> 
>   ISSUE: Discussing implementation details like user space, kernel privilege and
>   dynamic loading seems unnecessary and outside the scope of this document. Does
>   this document care if I implement my ASA on a real-time architecture with no
>   "user space", etc.?

Fair enough. See my above comment re robustness.

I'll leave the rest of your comments to the document editor.

Regards
    Brian

> 
> - 4.6 Why call out global routing and overlay networks in particular? Is the
>   real intention to just say that the ACP implementation is not restricted to
>   any specific type of networking?
> 
> - TEXT: 6.3.1.2 "on a given LAN"
> 
>   NIT: Everyone knows what a LAN is; however, I wonder if the text should be
>   more generic and actually describe what it really requires here which is a
>   broadcast or multicast network?
> 
> Questions/Comments:
> 
> - QUESTION: IoT and node requirements. There are a couple of node ASA
>   requirements. I found myself wondering whether very simple IoT things like
>   thermostats might ever be ANs, and if so, whether they all really need to have
>   join assistant ASAs. It could be that the answer is "Yes, they do or they
>   can't be nodes". I was just curious.
> 
> - COMMENT: For the types of ASAs: simple (run anywhere), complex (resource
>   restricted), and infra (run everywhere), I was reminded of Kubernetes/cloud
>   orchestration, and the concept of DaemonSets (pods that run everywhere) and
>   Deployments (pods that can run anywhere, possibly be scaled replicated, and
>   may also have requirements that restrict where they can run). I imagine that
>   folks in Anima have also looked at this, but if not, it would be good to, as
>   they seem to be solving very similar problems.
> 
> Nits:
> 
> - TEXT: 3.2 "However, the information is tracked independently of the status of
>   the peer nodes; specifically, it contains information about non-enrolled
>   nodes, nodes of the same and other domains. "
> 
>   QUESTION: What are peer nodes? Is this another name for adjacent nodes? If so,
>   "s/peer/adjacent/".
> 
> - TEXT: 3.3.1 "enrols"
>   CHANGE: "enrolls"
> 
> - TEXT: 3.3.3 "In this state, the autonomic node has at least one ACP channel to
>   another device. It can participate in further autonomic transactions, such as
>   starting autonomic service agents. For example it must now enable the join
>   assistant ASA, to help other devices to join the domain."
> 
>   NIT: "For example foo" is not a sentence on its own; also, "It" is not a good
>   subject, as there are multiple nouns in the previous sentence that could
>   serve as antecedents.
> 
>   SUGGEST: 3.3.3 "In this state, the autonomic node has at least one ACP channel
>   to another device. The node can now participate in further autonomic
>   transactions, such as starting autonomic service agents (e.g., it must now
>   enable the join assistant ASA, to help other devices to join the domain)."
> 
> - TEXT: 4.1 "Names are typically assigned by a Registrar at bootstrap time and
>   persistent over the lifetime of the device."
> 
>   NIT: s/persistent/and persist/
> 
> - TEXT: "Out of scope are addressing approaches for the data plane of the
>   network, which may be configured and managed in the traditional way, or
>   negotiated as a service of an ASA. One use case for such an autonomic function
>   is described in [I-D.ietf-anima-prefix-management]."
> 
>   NIT: Sounds sort of Yoda-like, and the compounding makes things less clear.
> 
>   SUGGEST: "Addressing approaches for the data plane of the network are outside
>   the scope of this document. These addressing approaches may be configured and
>   managed in the traditional way, or negotiated as a service of an ASA. One use
>   case for such an autonomic function is described in
>   [I-D.ietf-anima-prefix-management]."
> 
> - TEXT: 6.1: "Following an initial discovery phase, the device properties and
>   those of its neighbors are the foundation of the behavior of a specific
>   device. A device and its ASAs have no pre-configuration for the particular
>   network in which they are installed."
> 
>   NIT: Why suddenly lose the "node" abstraction and start talking about devices
>   here? I think it continues to work well to say "node" (e.g., "node
>   properties", "specific node" and "A node and its ASAs...").
> 
> - TEXT: 6.2 "install ASA: copy the ASA code onto the host and start it,"
>   NIT: "s/host/node/"
> 
> 
> 

_______________________________________________
Anima mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/anima
