Re: [Anima] [art] Artart last call review of draft-ietf-anima-asa-guidelines-04

Francesca Palombini Wed, 19 Jan 2022 15:16:01 -0800

Martin: many thanks for the thoughtful review, and thanks to the authors for 
responding to Martin’s comments. I have balloted No Objections on this document.


Francesca

From: art <[email protected]> on behalf of Martin J. Dürst 
<[email protected]>
Date: Tuesday, 14 December 2021 at 08:00
To: Brian E Carpenter <[email protected]>, [email protected] 
<[email protected]>
Cc: [email protected] <[email protected]>, 
[email protected] 
<[email protected]>, [email protected] <[email protected]>
Subject: Re: [art] Artart last call review of draft-ietf-anima-asa-guidelines-04
Hello Brian,

On 2021-12-14 06:17, Brian E Carpenter wrote:
> Hi Martin,
>
> Thanks for the careful review. I've inserted a few comments in line
> below, but we will take care of all your points in the next version.

Glad to be of help. Sorry most of it sounded pretty unpersonal.

A few comments inline.

> Regards
>     Brian
>
> On 13-Dec-21 22:36, Martin Dürst via Datatracker wrote:
>> Reviewer: Martin Dürst
>> Review result: Ready with Issues
>>
>> I'm not an expert in autonomous systems, and don't have experience
>> reviewing
>> guideline/operational drafts (as opposed to protocol/format drafts).
>>
>> Overall, I'm not completely sure this document is needed. Some of it
>> sounds
>> like very obvious advice (paraphrased: install stuff, then configure
>> it, only
>> then let it run). Other stuff may be covered in much more depth in a
>> tutorial
>> or book about fault-tolerant and real-time systems. A pointer to such
>> literature might help.
>
>
> Our impression is that typical network ops people do need such advice.
> You're correct that a lot of it is familiar to experts in control systems
> with fault tolerance and real time requirements. (Personally I worked on
> real time control systems before I got into networking.)

I see, makes sense.

>> The rest of this review is based on the assumption that the document
>> serves an
>> advisory purpose, and there are enough people supporting its publication.
>>
>> This document is close to ready, but a few places would benefit from
>> updated/clarified wording or fixes.
>>
>> Introduction:
>> "could be upgraded as an ASA" -> "could be upgraded to an ASA"
>>
>> Section 3.3:
>> "This API is intended": Say which API. The last mention ("The GRASP
>> API") is
>> quit a bit away.
>>
>> The following two sentences overlap very much in content; maybe one of
>> them is
>> unnecessary:
>>     The format and size
>>     of the value is not restricted by the protocol, except that it must
>>     be possible to serialise it for transmission in CBOR [RFC8949],
>>     subject only to GRASP's maximum message size as discussed in
>>     Section 5.
>>
>>     The actual value field of an objective is limited by the GRASP
>>     protocol definition to any data structure that can be expressed in
>>     Concise Binary Object Representation (CBOR) [RFC8949].
>>
>> Figure 1 (Section 6): The diagram starts and ends at "Undeployed". I
>> wonder
>> whether "Uninstalled" may be more appropriate.
>>
>> 6.2:
>> "on the same autonomic node and resources it is controlling" -> "on the
> same
>> autonomic node as the resources it is controlling"
>>
>> 6.2.2:
>> "like a domain a technology segment or" -> "like a domain*,* a technology
>> segment or"
>>
>> 6.2.3:
>> "ASA Instance Mandate" seems to be an established term, but I don't
>> find a
>> definition or pointer to one.
>>
>> 6.3:
>> The note, if really needed, should come towards the end, or be removed.
> It
>> reads like an admission that the draft isn't ready for publication.
>>
>> There are two lists, but they have neither bullets nor numbers, which
>> looks
>> very strange.
>>
>> The first bulleted list has some wording problems: "meaning enabling
>> those":
>> what does "those" refer to? "meaning setting them": what does "them"
>> refer to?
>> "by updating": This should not come after the colon.
>>
>> Section 6, last paragraph: "The ASA life-cycle is discussed in more
>> detail in
>> "A Day in the Life of an Autonomic Function"
>> [I-D.peloso-anima-autonomic-function].": This comes way too late. And
>> it's way
>> unclear why there is a need for two documents when the longest chapter
>> of this
>> document already discusses the life cycle. (And the authors should
>> agree on
>> whether they want to spell life cycle with or without a hyphen.)
>
>
> We should probably rephrase to indicate that the draft was a source of
> ideas
> rather than a current reference, unless Pierre plans to restart work on it.
>
>> Sections 7, 8, and 9 seem way too short for sections. At least sections
> 7 and 8
>> could be easily combined, because they are both about coordination.
>> Section 9
>> should be absorbed in an appropriate location, it's even shorter than
>> sections
>> 7 and 8.
>>
>> Section 7: "a method is needed of identifying" -> "a method is needed to
>> identify"
>>
>> Section 9: "quite likely to be expressed" -> "quite likely expressed"
>>
>> Section 10, sentence before numbered list should end with a period or
>> colon.
>>
>> Section 10, list item number 8: It should be defined clearer whether
>> GRASP
>> objectives work according to a "must ignore" model for extensibility or
> not.
>> This is not the right location to do this, but if a base spec does it,
>> this
>> text should clearly reflect that, and if there is nothing in the base
>> spec,
>> that's a serious problem there.
>
>
> Each objective has its own base spec, so that's where extensibility should
> be covered. There's a general mechanism (the GRASP M_INVALID message)
> whereby
> a recipient can signal "I didn't understand". Along with the O_ACCEPT and
> O_DECLINE options in a negotiation, that allows for all sorts of
> extensibility
> policies.

In that case, write something like: "For each GRASP objective, make sure
you follow the relevant extensibility requirements."

The way it's currently worded, an implementer may make things extensible
where the GRASP objective spec prohibits that.


>> Section 10, last paragraph: "state [1]", "state[2]", "state[3]":
>> Preferably use
>> other labels to distinguish states; these here look too much like
>> roferences.
>>
>> Security considerations: "Such an ASA must also be designed to avoid
>> loopholes": Some concrete examples would help here.
>>
>> Reference [Huebscher08]: "computing--degrees": Either use a wider
>> hyphen or
>> just "computing-degrees".
>>
>> Reference [RFC8368]: In my printout, the label and the contents of this
>> reference ended up on different pages. This is probably a general
>> CSS/browser
>> printer problem, but should be checked.
>>
>> Appendix B: ASA: "An agent implemented on an autonomic node that
>> implements":
>> "implemented ... that implements" is confusing. "either in part (in the
> case of
>> a distributed function) or whole": "either in part (in the case of a
>> distributed function) or as a whole"
>>
>> Appendix C, Example Logic Flows: "This appendix describes generic logic
> flows":
>> It should be made clear early on that these are not alternatives (as
>> the plural
>> may suggest), but are a set of flows that work together.
>>
>> Appendix C, second paragraph: "would" is used twice, but can easily be
>> replaced
>> by "will".
>>
>> Appendix C, "When running as an origin": Shouldn't this also say that the
>> origin gives out resources to the delegators?
>>
>> Appendix C, "When running as a delegator": Where do the resources come
>> from?
>> "quantities of the resource" vs. "it hands out resource": is "resource"
>> countable or not? It should be the same throughout the draft.
>>
>> Appendix C, "in stable storage": What is "stable storage"? SSDs and HDs
> can
>> easily be damaged, too. If resources can really get lost, there seems
>> to be a
>> need to re-establish a resource distribution by some kind of protocol
>> that
>> checks who had or claims to have had what, and find the gaps. Also, this
>> ignores the whole problem of atomic reads/writes. What happens if a
>> delegator
>> goes down before it has had time to write the resources into stable
>> storage?
>> Another problem: It's not the resources that are written to storage,
>> it's just
>> information about them.
>
>
> Of course these are real problems that an implementation must solve.
> It wasn't our intention to cover everything here, just the usage of
> the ANIMA infrastructure. We should make that clearer in the text.
>
>>
>> Appendix C: "a suitable timeout expires": This is very close to "a
>> suitable
>> timeout times out", which sounds strange. What about "a suitable
>> timeout is
>> reached" or "a suitable timeout is triggered"?
>>
>> Appendix C: "consumers routers" -> "consumers' routers" or what?
>>
>> <CODE BEGINS>: Is this suitable for pseudocode that can't be executed?
>
>
> I'm not sure. The advantage of <CODE BEGINS> is that it changes the
> derivative works status, according to the IETF Trust legal provisions.
>
>>
>> In the "do forever" loop in the MAIN PROGRAM, the assignment "good_peer
> = peer"
>> does not seem to make sense, because good_peer is set to none at the
>> start of
>> the forever loop. Does the initialization "good_peer = none" need to
>> go before
>> the start of the loop?
>
> I'm going to go back to my Python implementation of RFC8992 to check
> that...
>
> Brilliant catch! That's a transcription error when I transcribed Python
> into pseudocode. The Python says:
>
> good_peer = None # where we remember a helpful peer ASA
> next_peer = 0    # for cycling through peers
> while True:
>      ...
>
>> In MAIN_NEGOTIATOR thread, the line
>> "i.e., wait for M_REQ_NEG"
>> has two problems: 1) it should be clearer that this is a comment, and
>> 2) it's
>> unclear what M_REQ_NEG refers to.
>
> That assumes detailed knowledge of the GRASP protocol. Probably simpler
> if we delete both mentions of M_REQ_NEG.
>
>> In DELEGATOR thread:
>> The two lines in the "if OK" case have to be executed as an atomic unit
> if one
>> wants to avoid resources getting lost or being doubly used.
>
> True, but there is a single instance of the thread. So it's atomic
> unless the thread crashes right then, in which case the resource is
> gone from the pool and lost.

There's probably a single instance of this thread, but other threads may
be running, too. And you don't know whether some implementation might
not choose to use more than one thread.

> I'm not sure computer science has a
> solution to that problem.

I agree. But there are things to alleviate this, e.g. making sure that
both operations are carried out together without other, unrelated
threads or processes interfering, or even just making sure the details
are done so that any superfluous stuff gets done before or after, but
not in between. Also, it's probably a good think to say that there's no
complete solution, because this can affect how operators handle things
when there have been problems.

Regards,   Martin.

>>
>> In FLOODER thread:
>> How long is the sleep() or periodic timer supposed to be?
>
> That's really app-dependent. We'll see if we can express that better.
>>
>>

_______________________________________________
art mailing list
[email protected]
https://protect2.fireeye.com/v1/url?k=31323334-501d5122-fe22d327-454445555731-aa04b3d63d75790e&q=1&e=d8df126a-1da9-41db-a3d2-6644c10b52a2&u=https%3A%2F%2Fwww.ietf.org%2Fmailman%2Flistinfo%2Fart

_______________________________________________
Anima mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/anima

Re: [Anima] [art] Artart last call review of draft-ietf-anima-asa-guidelines-04

Reply via email to