(Ccs trimmed)
Christian,
Thanks for this careful review. I'll comment here on the larger issues:
On 2018-08-27 04:03, Christian Hopps wrote:
....
> Minor Major Issues:
>
> - Virtualization is mentioned once in "4.2 addressing" section. To quote:
>
> TEXT: "Support for virtualization: Autonomic Nodes may support Autonomic
> Service Agents in different virtual machines or containers. The addressing
> scheme should support this architecture."
>
> The special casing of VM/containers here seems to indicate that virtual
> devices are not "1st class citizens" in an autonomic network. In particular I
> could easily imagine virtual machines being full blown autonomic nodes
> themselves. Assuming the intent is not to restrict virtual devices in this
> manner something needs to be said (somewhere) to make that clear.
I don't think that was the intention. We haven't really explored this in detail,
but I can certainly imagine a deployment (for example) where each tenant in
a data centre has its own virtual autonomic network, and the underlying physical
network is also autonomic. Since the ACP is expected to be implemented as
a VRF, you could even argue that every autonomic network is virtual.
So, yes, we can reword this.
>
> - Robust programming techniques. I think the intention here is to say that the
> design of ASAs must have robustness as a top design principle. I think in
> doing that it should talk about what being robust means; however, it should
> not be talking about how to accomplish that as there are multiple ways to
> achieve this goal.
>
> In particular I feel saying that restarting is the *last* thing an ASA should
> do is way overreaching into engineering the solution rather than specifying
> the requirement. Indeed plenty of people think that overly complex recovery
> mechanisms that try everything under the sun to *not* restart often have more
> bugs and are less robust than KISS solutions that "fail" simply but recover
> quickly with minimal or no disruption.
>
> I feel this section reads a bit more like someone's idea of how to design a
> robust system instead of talking about what robust means which is the intent I
> believe.
>
> Perhaps better is just to focus on robust design ideas (some are already
> stated in the text):
>
> - must deal with discovery and negotiation failure as routine.
> - recovering from failures should be minimally disruptive.
> - must not leak resources.
> - must monitor for and deal with hung code.
> - must include security analysis
OK. Since I drafted that text, I will leave the document editor to fix
it. (Some of the detail probably belongs in another draft specifically
about ASAs, which I am editing.)
>
> - 7.4: When text talks about feedback loop, it mentions "allow the intervention"
> of human admin or control system; however, it then describes the feedback loop
> as presenting default actions and allowing for override. This is fine, but it
> seems to leave out the common case where something is misbehaving and would
> not be presenting any choices to the administrator (using the feedback loop),
> so the admin must forcefully intervene.
Yes. I think the word "feedback" is a bad choice. For engineers raised on
Nyquist diagrams it is part of a closed loop; for other people it means
feedback to humans. The text needs clarifying.
>
> Minor Issues:
>
> - 6.1 TEXT: "It must be possible to run ASAs as non-privileged (user space)
> processes except for those (such as the infrastructure ASAs) that necessarily
> require kernel privilege. Also, it is highly desirable that ASAs can be
> dynamically loaded on a running node."
>
> ISSUE: Discussing implementation details like user-space, kernel privilege and
> dynamic loading seems unnecessary and outside the scope of this document. Does
> this document care if I implement my ASA on a real-time architecture with no
> "user space" etc.?
Fair enough. See my above comment re robustness.
I'll leave the rest of your comments to the document editor.
Regards
Brian
>
> - 4.6 Why call out global routing and overlay networks in particular? Is the
> real intention to just say that the ACP implementation is not restricted to
> any specific type of networking?
>
> - TEXT: 6.3.1.2 "on a given LAN"
>
> NIT: Everyone knows what a LAN is; however, I wonder if the text should be
> more generic and actually describe what it really requires here which is a
> broadcast or multicast network?
>
> Questions/Comments:
>
> - QUESTION: IoT and node requirements. There are a couple of node ASA
> requirements. I found myself wondering if very simple IoT things like
> thermostats might ever be an AN and if so did they all really need to have
> joining assistant ASAs? It could be that the answer is "Yes, they do or they
> can't be nodes". I was just curious.
>
> - COMMENT: For the types of ASAs: simple (run anywhere), complex (resource
> restricted), and infra (run everywhere), I was reminded of Kubernetes/cloud
> orchestration, and the concept of DaemonSets (pods that run everywhere) and
> Deployments (pods that can run anywhere, possibly be scaled or replicated, and
> may also have requirements that restrict where they can run). I imagine that
> folks in Anima have also looked at this, but if not it would be good to do so,
> as they seem to be solving very similar problems.
>
> Nits:
>
> - TEXT: 3.2 "However, the information is tracked independently of the status of
> the peer nodes; specifically, it contains information about non-enrolled
> nodes, nodes of the same and other domains."
>
> QUESTION: What are peer nodes? Is this another name for adjacent nodes? If so
> "s/peer/adjacent/".
>
> - TEXT: 3.3.1 "enrols"
> CHANGE: "enrolls"
>
> - TEXT: 3.3.3 "In this state, the autonomic node has at least one ACP channel to
> another device. It can participate in further autonomic transactions, such as
> starting autonomic service agents. For example it must now enable the join
> assistant ASA, to help other devices to join the domain."
>
> NIT: "For example foo" is not a sentence on its own, also "It" is not a good
> subject as there are multiple nouns in the previous sentence that could serve
> as antecedents.
>
> SUGGEST: 3.3.3 "In this state, the autonomic node has at least one ACP channel
> to another device. The node can now participate in further autonomic
> transactions, such as starting autonomic service agents (e.g., it must now
> enable the join assistant ASA, to help other devices to join the domain)."
>
> - TEXT: 4.1 "Names are typically assigned by a Registrar at bootstrap time and
> persistent over the lifetime of the device."
>
> NIT: s/persistent/persist/
>
> - TEXT: "Out of scope are addressing approaches for the data plane of the
> network, which may be configured and managed in the traditional way, or
> negotiated as a service of an ASA. One use case for such an autonomic function
> is described in [I-D.ietf-anima-prefix-management]."
>
> - NIT: Sounds sort of Yoda-like, and the compounding makes things less clear.
>
> SUGGEST: "Addressing approaches for the data plane of the network are outside
> the scope of this document. These addressing approaches may be configured and
> managed in the traditional way, or negotiated as a service of an ASA. One use
> case for such an autonomic function is described in
> [I-D.ietf-anima-prefix-management]."
>
> - TEXT: 6.1: "Following an initial discovery phase, the device properties and
> those of its neighbors are the foundation of the behavior of a specific
> device. A device and its ASAs have no pre-configuration for the particular
> network in which they are installed."
>
> NIT: Why suddenly lose the "node" abstraction and start talking about devices
> here? I think it continues to work well to say "node" (e.g., "node
> properties", "specific node" and "A node and its ASAs...").
>
> - TEXT: 6.2 "install ASA: copy the ASA code onto the host and start it,"
> NIT: "s/host/node/"
>
>
>