[Anima] Eric/ANIMA: Re: AD review of draft-ietf-anima-autonomic-control-plane-21

Toerless Eckert Mon, 09 Mar 2020 02:33:14 -0700

Eric, WG: Summary:

This email serves to explain the fixes done with rev-23/-24 just posted
since February's -22 revision, primarily in response to Eric Vyncke's review.
Eric is now the responsible AD for this document. Also the discuss we had
with security folks.

Functional changes are quite limited:
- Refined IPsec requirements.
- hex lower (instead of upper) case in ACP domain info
- added explanation of Clock requirements (6.1.3.1.)
- Explicit SHOULD for TLS 1.3, "desirable" for DTLS 1.3 (as long as no RFC).
- Filtering requirement for (currently unused) RPL headers.

All the other changes just result in better text.

If Eric agrees his points are solved to move forward, we have one
discuss from Ben Kaduk re. the encoding of ACP domain info (see reply to him),
but he said he wouldn't want to hold off the doc for it (maybe check with him),
and some small fixes in IPsec section from discuss we had in prior weeks
(maybe just running out of time for 2 week shutoff before IETF, so first
committing without those).

Details:

Thanks a lot Eric for your great review

Sorry for the long time to get back to your reply, but i had no time
before the end of january, and since then i have been working hard
on the non-trivial 50% of your 72 points and also trying to address
the remaining points from Ben's review and other IPsec discuss.

- rev 22 from early feb attempts to address Bens remaining point(s),
but we did not finalize that discuss with the SEC experts yet, i will
start a separate email thread for that.

- Revision -23 addresses 71 of your 72 points.
Answer to your point 69 is swap of two big sections and therefore
committed to -24 to have a useful -22 to -23 rfcdiff.

- I need to still commit (-25) a few paragraph changes for IPsec from
the discuss on the IPsec mailing list. I may run out of time
before the 2 week downtime for datatracker. Will let you know when
i commit.

Aka: -25 should be my final offer before i get more feedback.

I have appended your original review points with the same point
numbers as in the response below. Not sure if you want/need to
submit your points to datatracker. Just in case.

Your review points with my replies are below, prefixed by this:
{<number>:<status>}, where status can be:
a(nswered) - (30 of 72) no textual change, but textual answer intended on my
side to close the point.
f(fixed) - (41 of 72) textual fixes, in my opinion closing the point.
d(elayed) - your point 1, see below

You may not agree with my solutions of course, please check my answers.

Summary of important text diffs in changelog section of document as always.

RFCdiff: (you reviewed -21, but -22 introduced only more feedback vs. Bens
review, which didn't overlap with your questions. In addition, it removed
all changelog and summarized it, so easier for you to review against -22):

http://tools.ietf.org//rfcdiff?url1=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-22.txt&url2=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-23.txt

And this diff is just the reordering from your point 69.

http://tools.ietf.org//rfcdiff?url1=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-23.txt&url2=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-24.txt

Cheers
Toerless

On Fri, Jan 03, 2020 at 03:45:36PM +0000, Eric Vyncke (evyncke) wrote:

{1:d} [ Wrt to non-technical textual review, shortening sentences etc:]

Agreed. I did this once after WG last call, and i will do this
again before it goes to RFC-editor, but for the time being i would like
to keep diffs focussed on the reply to technical issues to make
reading of diffs easier for reviewers. I am probably also the worst
of all authors to shorten sentences.

{2:f} > Please also check the long output of
https://tools.ietf.org/idnits?url=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-21.txt

Yikes.
https://trac.tools.ietf.org/tools/ietfdb/ticket/2880

I was so happy that all my prior versions passed the idnits check; you were
the first reviewer noting that they really don't AND provided me
with the URL to prove it. Because to me it always looked fine the way
i did it.

Tried to fix what was really an issue IMHO. Lots of false positives
i think (e.g.: section number references like 6.7.1.2 seem to be recognized as
non-compliant IPv4 addresses ;-).
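
As a minimal sketch of why this kind of false positive is plausible: a
four-part section number is syntactically a valid dotted quad. The pattern
below is a hypothetical naive check written for illustration; it is NOT taken
from idnits, whose actual matching logic i have not looked at.

```python
import ipaddress
import re

# Hypothetical naive pattern for "something that looks like an IPv4 address".
# Assumption: this only illustrates the ambiguity, it is not the idnits code.
DOTTED_QUAD = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def looks_like_ipv4(text):
    """Return every dotted quad in text that also parses as an IPv4 address."""
    hits = []
    for candidate in DOTTED_QUAD.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        hits.append(candidate)
    return hits


# The four-part section number is reported, the three-part one is not.
print(looks_like_ipv4("see section 6.7.1.2 and section 6.10.2"))
```

A checker working on plain text has no context to tell the two apart, so only
three-part (or five-part) section numbers escape such a pattern.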

Some complaints about using obsoleted RFC references, but those
are intentional: Left some older protocols where idnits points to newer
versions, e.g.: i am pointing to (insecure) BSD syslog in list of widely used
protocols
to be protected by ACP. There is a newer RFC that would allow to combine
with TLS without specifying exactly how, and i don't think available
on any router today. So kept the widely deployed / insecure protocol
references for that part of the text.

> No source of time
Asked and answered below
>
> Generalized TTL Security Mechanism (GTSM)
Asked and answered below
>
> IANA consideration for 'type' field of the address
....
>
>
{3:a}> - *** why not using Generalized TTL Security Mechanism (GTSM) in
addition to the use of LLA ?

Interesting question. I thought we discussed this but i can not find evidence on
the mailing list.

RFC5082 does not discuss how relevant GTSM still is in the presence of
link-local addresses. I for one have not heard of, nor could think of, how to
create the attack vector that GTSM intends to protect against when link-local
destination addresses are used. Are you aware of any specs arguing that GTSM
on top of link-local addresses improves something ? I was quickly browsing
through the RFCs and could not find anything.

Also: the use of TTL=1 with link-local addresses for DULL GRASP is specified in
GRASP (section 2.5.2 of the GRASP draft), not ACP. ACP just uses it. If you
wanted to change it, that would be an update to GRASP spec.

Also, DULL GRASP in ACP results in secure channels, yet another line of defense.

You can see from the text that i do in general like to address repeatedly asked
questions with explanatory text (e.g.: appendix discuss re. CDP/LLDP, repeatedly
asked), but if i remember correctly, you're the first one who asked this, hence
no text added (yet ;-) for this.

{4:f} > - *** there is very little text about time synchronization except a few
words in section 10

See new subsection "Realtime clock and Time Validation" in the ACP domain
membership section. Effectively we need a followup document describing a
standard on how we would want to distribute current time information across
the ACP if we want to well support ACP nodes without a realtime clock (or
realtime clocks with dead batteries).

{5:a} > - *** I am not too familiar with RPL but AFAIU there is a RPL root, how
this RPL root is selected in ACP ?

6.11.1.12, Automatically selected by signaled DODAGPreference preference of ACP
node.

{6:f} > - while I like the fact that section are indicated as
"(informative)" and as I have never seen this used before this I-D, I
wonder whether some explanations of this tagging would be welcome

now in acronym/terminology section:

<t>This document serves both as a normative specification for how ACP
nodes have to behave as well as describing requirements, benefits,
architecture and operational aspects to explain the context.
Normative sections are labelled "(Normative)" and use
<xref target="RFC2119"/>/<xref target="RFC8174"/> keywords.
Other sections are labelled "(Informative)" and do not use those
normative
keywords.</t>

Btw: reason is that WG was not allowed to write a separate architecture,
requirements document in charter 1.
Not sure in hindsight if juggling 3 or more interdependent documents would
have been easier than the one giant document approach we ended up with.

{7:f} > - section 1, "Management and Control" meaning no monitoring ? even if
explained later

I have replaced the "Management" in "Management and Control" with "OAM",
because we really mean OAM when we said "Management". And OAM of course is
inclusive of monitoring.

Given how we were only pointed to prefer the term OAM over "management" late
in IESG review, i can not replace "management" in all places with OAM, so i
added definitions for Management and OAM into the terminology section,
indicating Management means OAM, and OAM also includes monitoring.

{8:f} > - section 1, ACP/OAM are defined twice but ANI is not

fixed.

{9:f} > - section 1, " An operator can use it to log" should be clarified: SSH
? SNMP? NETCONF ? (even if explained later)

Fixed to: An operator can use it to access remote devices using protocols such
as Secure SHell (SSH) or Network Configuration Protocol (NETCONF) running
across the ACP

{10:f} > - section 1.1, DTLS 1.2 while everyone moves to TLS 1.3 ? I am unsure
whether there is a DTLS 1.3 in the cooking but be ready to have comments from
the transport area AD

Note that Eric Rescorla and Ben Kaduk did not issue concerns about this.
Let me answer this point in the answer to later sections points about DTLS.

{11:a} > - section 2, in " ACP secure channel" is integrity also important? It
is usually a side effect of encryption though but worth mentioning ?

Fixed.

Btw: integrity-protection is not a side-effect of encryption but of
authentication.
Eg: ESP Null encryption or AH would also give integrity-protection.
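
To illustrate that distinction, here is a minimal sketch using an HMAC as a
stand-in for the authentication part of a secure channel: the payload stays in
cleartext (comparable in spirit to null encryption), yet any modification is
detected. The key, the payload and the function names are hypothetical; this
is not how ESP or AH frame their data.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA-256 tag in bytes


def protect(key, payload):
    """Append an HMAC-SHA-256 tag; the payload itself stays in cleartext."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key, message):
    """Recompute the tag over the payload and compare in constant time."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


key = b"illustrative shared key"  # hypothetical key, demonstration only
message = protect(key, b"example payload")
tampered = bytes([message[0] ^ 0x01]) + message[1:]  # flip a single bit

print(verify(key, message))   # untouched message passes
print(verify(key, tampered))  # modified message is rejected
```

No encryption is involved anywhere above, which is the point: the integrity
check comes entirely from the keyed authentication tag.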

Actually, followup question to the IPv6 expert:

Integrity protection not only helps against attackers (which is primarily why
we do it), but possibly also against bit/frame errors in non-integrity
protected L2 underlay. Given how IPv6 removed the header checksum, i wonder if
that was ever seen as a downside vs. IPv4 on those type of L2. I for one
wouldn't know/remember such an L2 (e.g.: ethernet has checksum), so i am not
sure it exists, but if it does, then we could mention it as a case where ACP
secure channels are protected.

{12:f} > - section 2, in " BRSKI and GRASP are products of the IETF ANIMA
working group" replace "products" by "specifications" ?

fixed.

{13:f} > - section 2, in " node: A system, e.g.," please remove "e.g."

fixed.

{14:a} > - *** section 2, remove " It is the approximate IPv6 counterpart of
the IPv4 private address"

This sentence was refined i think twice or more through reviews because:

I had great success in using this comparison with customers when explaining
ULAs. And in any discussion i had where i was trying to explain ULA without
comparing to IPv4 i always got the question "so this is the IPv6 version of
private addresses in IPv4 ?".

If you want to propose a better sentence that explains ULA by comparing it to
IPv4 private addresses, i am happy to take new/better text, but i think
without such a comparison we would be doing a disservice to readers unless
they already are IPv6 geeks.

{15:f} > - section 6.1.1 it is unclear whether it is "beneficial to copy the
device identifying fields of the node's IDevID into the ACP domain
certificate," as the same paragraph also says it is a bad idea...

Ok, hope this is now easier to read, it also adds a hopefully good recipe.

<t>For diagnostic and other operational purposes, it is beneficial to copy the
device identifying fields of the node's IDevID into the ACP domain certificate,
such as the "serialNumber" (see <xref
target="I-D.ietf-anima-bootstrapping-keyinfra"/> section 2.3.1). This can be
done for example if it would be acceptable for the devices "serialNumber" to be
signalled via the Link Layer Discovery Protocol (LLDP, <xref target="LLDP"/>)
because like LLDP signalled information, the ACP certificate information can be
retrieved by neighboring nodes without further authentication and be used
either for beneficial diagnostics or for malicious attacks. Retrieval of the
ACP certificate is possible via a (failing) attempt to set up an ACP secure
channel, and the "serialNumber" contains usually device type information that
may help to faster determine working exploits/attacks against the device.</t>

{16:f} > - section 6.1.2 remove leading 0 in the IPv6 address of the example

Hope you meant:

given an ACP address of fd89:b714:f3db:0:200:0:6400:0000

now:
given an ACP address of fd89:b714:f3db:0:200:0:6400:0
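
The change above can be sketched as follows: leading zeros are dropped in each
16-bit group, without introducing "::" compression. The helper name is
hypothetical and the function assumes an address written with all eight
groups (no "::" in the input).

```python
import ipaddress


def strip_leading_zeros(address):
    """Drop leading zeros in each 16-bit group of a fully written address.

    Assumption: the input has all eight groups and contains no '::'.
    """
    return ":".join(format(int(group, 16), "x") for group in address.split(":"))


before = "fd89:b714:f3db:0:200:0:6400:0000"
after = strip_leading_zeros(before)
print(after)

# Both spellings denote the same 128-bit address.
print(ipaddress.IPv6Address(before) == ipaddress.IPv6Address(after))
```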

{17:a} > - section 6.1.2 "32HEXDIGIT" or "32 HEXDIGIT" ?

"32HEXDIGIT" is correct, see rfc5234, 3.7

{18:a} > - section 6.1.2 it is unclear whether acp-address is a valid ULA
address as the text mentions later "hash to generate ULA"... Also, is there any
specification on how to generate this acp-address? Same /48 prefix for example ?

This text is deliberately as it is. Better explanations would have to
point to potential future work, and i had some important
IESG reviewers (including Alica) say that text about "future" makes
document look incomplete and rather not have it. I do not agree with
this, but i follow this IESG advice.

The mandatory hash to create the ACP ULA on a registrar from the hash
is defined in 6.10.2, base ACP address scheme.
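
For readers who want a feel for what such a hash-derived ULA prefix looks
like, here is a minimal sketch. It ASSUMES the construction is "fd" followed
by the first 40 bits of a SHA-256 hash over a domain name string; that detail
is not stated in this mail, so check 6.10.2 of the draft before relying on it.
The input name is hypothetical as well.

```python
import hashlib
import ipaddress


def ula_prefix_from_name(name):
    """Build a /48 ULA prefix from a name (assumed construction, see lead-in).

    Assumption: prefix = 0xfd + first 40 bits of SHA-256(name).
    """
    digest = hashlib.sha256(name.encode("ascii")).digest()
    global_id = digest[:5]                           # first 40 bits of the hash
    prefix_bytes = b"\xfd" + global_id + bytes(10)   # pad to 128 bits
    return ipaddress.IPv6Network((prefix_bytes, 48))


prefix = ula_prefix_from_name("acp.example.com")  # hypothetical domain name
print(prefix)
print(prefix.subnet_of(ipaddress.IPv6Network("fc00::/7")))  # inside ULA space
```

Whatever the exact hash input is, the result always falls into fd00::/8, which
is why an address built this way is a ULA while 6.1.2 itself does not demand
one.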

An ACP domain information field where the IPv6 address is NOT a ULA is
perfectly compliant with 6.1.2, but the use of a non ULA addresses is
outside the scope of what is standardized in this spec because of 6.10{.2}.

Aka: future variations of ACP could perfectly use a different hash
or even non-ULA and should still be compatible with 6.1.2, and
would only need to update/ignore/change 6.10.2.

If i would put stronger text in to emphasize this distinction between
6.1.2 and 6.10.2, such text might help one type of future extensions,
but might hurt other types of future extensions.

Aka: the text says everything we know to be correct and mandatory
and necessary for this specs target, and tries NOT to say things
that would potentially make extensions more difficult.

{19:f} > - *** section 6.1.3, suggest to use "MUST" when closing the secure
channel upon discovering via CRL/OCSP that the cert was invalid.

fixed.

{20:f} > I would even suggest to use "SHOULD close all ACP peers connection"
to block the wrong path for the benefits of 'downstream ACP nodes'

I added "This applies of course to all ACP secure channels to this peer if
there are multiple."
but i find it somewhat redundant. Is that what you meant ?

{21:f} > - section 6.1.3 about the same numbered points: please move the
"Note:" below point 4) as done for point 6)

fixed.

{22:f} > - *** section 6.1.3 (and others) MUST use normative language, i.e.,
"MUST" "SHOULD"... in uppercase

Ok, i did a complete normative text scan (sigh ;-).

I changed some must to MUST where i felt it added relevant interop
requirements, otherwise
i changed text to not use must/should but other words.

Example: "In DULL this field is irrelevant but must still be set according to
the GRASP specification."
Rather feel like it doesn't make sense to repeat normative requirements from
other specs, so changed this
to "but is still set...".

Example: "the ACP connect interface and NOC systems connected to it 'needs to
be' physically controlled/secured"

This was 'must', but i don't want to be challenged on how to implement a MUST
or give an
RFC reference for it.

I could not find missing SHOULD. There is a lot of explanations (not the actual
normative requirements) where should is used, and i didn't feel it would help
the text to change the language to avoid the word should.

{23:a} > - section 6.1.4 I am pretty sure that the mechanism of cert chains &
trust anchors are well defined in the literature, perhaps easier to refer to
rather than describing the mechanism

Well defined, probably, but:
It's actually not easier to refer to because a lot of it is in non-IETF docs
like X.5xx ITU-T docs and the PKIX architecture of the IETF is also quite
"scattered" across RFC. And actually there is little operational documentation,
but mostly protocol specs only hard to understand for non-sec experts.

I think it is a big benefit to the target audience of the document to
have this summary. It hopefully makes adopting ACP by developers/operators
as "standalone" as possible. Aka: if we want to proliferate
security architectures, we need more documents like this that explain
enough for how it can be applied but also include all the key aspects
that you need to understand/use.

Ultimately, this section also summarizes the security understanding of the
authors,
and by having it written down and gone through SEC AD review there is
a better degree of confidence that it is sound/correct.

{24:f} > - section 6.1.5 " remember the EST server" is unclear... is it the
FQDN or IP address or xyz to be remembered ?

fixed to:
ACP nodes SHOULD be able to remember the IPv6 locator (parameters of the
O_IPv6_LOCATOR in GRASP) of the EST server...
....

{25:a} > - section 6.1.5.2 should some randomness be added for the time when
cert has to be renewed? I fear flash crowd effect

This text doesn't preclude implementations to do this, but lets not
over-engineer the normative part with not really too clear options. I have not
seen such randomness in real life deployments of e.g.: VPN solutions,
especially not from running an IPsec VPN in production (which i did for a
while). It would have made life in operations more difficult too, because it
makes it more difficult to recognize precisely when a node that should renew is
not doing so. With explicit known times you can calculate this from certs on CA
and lifetimes. And automate/track whether nodes do renew accordingly.

Also note that Certs lifetime typically start when they are physically
deployed, which at least today co-incides with physical deployment. So no risk
of flash crowds. Even if you would in future more automated deployment batch
initial BRSKI rollout , e.g.: bringing all new pledges online once a day or so,
any possible performance issue would first be seen in BRSKI and could be fixed
also easily by randomly varying the lifetime in certs. Instead of putting more
nerd-knobs into on-node code.
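
The operational argument above (deterministic renewal times can be calculated
and tracked) can be sketched like this. The renewal point at 50% of the
lifetime, the grace period and the node names are all assumptions made for
the example, not values from the ACP text.

```python
from datetime import datetime, timedelta, timezone

RENEW_FRACTION = 0.5  # assumption: renewal expected at half the cert lifetime


def renewal_due(not_before, not_after):
    """Deterministic point in time at which a node is expected to renew."""
    return not_before + (not_after - not_before) * RENEW_FRACTION


def overdue_nodes(certs, now, grace):
    """Names of nodes whose current cert is past renewal time plus grace."""
    return sorted(
        name
        for name, (not_before, not_after) in certs.items()
        if now > renewal_due(not_before, not_after) + grace
    )


start = datetime(2020, 3, 1, tzinfo=timezone.utc)
certs = {  # hypothetical data as it could be read from the CA database
    "node-a": (start, start + timedelta(days=10)),
    "node-b": (start + timedelta(days=6), start + timedelta(days=16)),
}
now = start + timedelta(days=8)
print(overdue_nodes(certs, now, grace=timedelta(days=1)))
```

With a randomized renewal time on the node, the operator side could no longer
compute `renewal_due` exactly and would have to widen the grace period instead.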

{26:f} > - section 6.1.5.3 " SHOULD support Certificate Revocation Lists (CRL)"
should specify 'processing' or 'retrieval' or ...

fixed to
SHOULD support revocation through Certificate Revocation Lists (CRL)

{27:a} > - section 6.1.5.4 while I am a big fan of very short cert lifetime (to
avoid CRL), I am less sure for the ANIMA use case... what if the ACP node is
disconnected for 1 day? No way to restart the whole process :-( with going
through 6.1.5.5 ?

Re-enrollment via BRSKI can be fully automatic and without having to go back
to the MASA when the BRSKI registrar ignores the clients certificate expiry
time. This
option is i think now documented in ACP and in BRSKI too.

To keep a particular network region alive under loss of external connectivity
longer than cert-lifetime, you can use a region-local registrar with built-in
Sub-CA functionality,
also runs fully automatic. Could be in every branch of an enterprise (WAN edge
router).
Lot more complex functionality you nowadays have on those boxes than
subCA/registrar.
Also a reason why i added all this subCA text to the doc.

This may not result overall in a more stringent security model than
CRL/OCSP under failures and attack, but it's well in line with today's
directions of survivability design in typical ACP targets: enterprise, SP,
manufacturing,...

CRL also have a lot of silent failures where you just don't get new updates for
them as they're yet another rarely used separate signaling channel.

{28:a} > - *** section 6.3 unsure to understand why you need to use SLAAC for a
link-local IPv6 address.

You need a link-local IPv6 address for DULL GRASP messages and for
the ACP secure channel that would use that link-local address.
Auto-assigning link-local IPv6 address requires DAD. DAD is part of SLAAC.

Please suggest better text if this is not it.

{29:a} > - *** section 6.3 most of the implementations that I know do not use
MLD for link-local multicast, they simply flood. Especially on a p-2-p link.

sure, but that is not the point. See more comprehensive answer to the following
point of yours:

{30:a} > Please reconsider rewriting the section on MLD snooping requiring MLD
by some more explanations. Also, the use of a IANA ll mcast should probably
render MLD snooping useless (i.e. I am pretty sure that router / nodes do not
use MLD for ff02::1 or ff02::2)

No change because:

RFC2710 (MLDv1):

> MLD messages ARE sent for multicast addresses whose scope is 2
> (link-local), including Solicited-Node multicast addresses [ADDR-
> ARCH], except for the link-scope, all-nodes address (FF02::1).

I repeatedly asked about this point in the last few years in PIM-WG,
and got (as far as i remember) reconfirmation that this is indeed
what we want. I remember that we did this explicitly when we did
MLDv1 because of all the problems we had with link-local
multicast in IPv4 (and snooping switches not capable to deal with this
because IGMP never demanded this).
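
The RFC2710 rule quoted above can be sketched as a small predicate. The
scope 2 / FF02::1 behaviour comes from the quoted sentence; the handling of
scopes 0 and 1 is from my recollection of the same RFC and should be treated
as an assumption. The function name is hypothetical.

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")


def mldv1_sends_reports(group):
    """Return True if MLDv1 messages are sent for this multicast group."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    scope = addr.packed[1] & 0x0F  # low nibble of the second byte is the scope
    if scope in (0, 1):
        return False  # assumption: reserved / node-local scopes never reported
    return addr != ALL_NODES       # link-local and wider, except all-nodes


for group in ("ff02::1", "ff02::2", "ff02::1:ff00:1"):
    print(group, mldv1_sends_reports(group))
```

So under MLDv1 even the link-local all-routers group and Solicited-Node groups
are reported, and a snooping switch can learn about them.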

Alas, i did not have the time to fully review MLDv2 when it came
out, and it is indeed missing that sentence.

The simple explanation for this is that the authors of MLDv2
(not involved in before) did not inherit any MLDv1 text but
translated IGMPv3 from IPv4 to IPv6, and by that time there
was i think not too much review if everything we had improved
in IPv6 with MLDv1 was completely put into the completely separately
written MLDv2 RFC.

More (not so) funnily, i think i could not find any statement in any
documents that you MUST use IGMP/MLD as a listener - except the
one above from MLDv1. So for all intents and purposes it's up
to the application to decide if it wants to use MLD/IGMP.

Aka: With current MLDv2 text, it is the prerogative of the app (GRASP or ACP)
to mandate applications receiving IPv6 multicast packets to use
MLDv2 whatever the scope of the address is. Aka: That is the ultimate
explanation why this ACP text can mandate this now (without waiting for any
MLDv2 text changes).

Better yet: The spec(s) themselves (MLDv2, maybe also IGMPv2) need to be fixed.

I opened an errata against RFC3810 and will discuss:
https://www.rfc-editor.org/errata/eid5977
Also will work solving the missing text in PIM-WG.
Given some of the RFC8200/SRv6 discuss i see (no re-interpretation of
old group intent) maybe we need a one-page update to MLDv2 instead of
an errata, but i think it wouldn't be contentious in PIM-WG.

I don't really think i want to explain any of this mess in ACP document,
hence no change. Please suggest text if you think there is one
that doesn't look too much like "dirt under MLD carpet" or becomes too long.

{31:f} > - section 6.3 (and possibly others) please use only lowercase in IPv6
address (e.g. fe80...FEED... looks weird)

Ack. Also changed the rfc822 encoding of the address back to lowercase.

{32:a} > - section 6.3 s/ttl/TTL/

Not fixed. Blame Brian (Carpenter, GRASP author).

GRASP defined 'ttl' as a time to live in units of msec, so not only is the name fixed
(ttl), but i definitely also do not want any confusion with (IP-) TTL, which
is actually used in that meaning in the RPL section.

{33:f} > - section 6.3 IKEv2 was already expanded before. The very same issue
(repeating expansion) occurs quite often in the document... Hence, the doc has
an 'amateur' look deserving it (because it is real smart work)

Ack. wrote a script to find those cases, and fixed them. Hope script found all.

{34:f} > - section 6.5 please explain notation like " [4:C1]"

Fixed. (C1 is the connection identifier).

{35:f} > - section 6.5 please expand 'MTI' and why not using IETF "MUST" ?

fixed and fixed.

{36:f} > - *** section 6.7 about PFS, did you check that DTLS 1.2 support PFS ?

Yes.
ACP spec says MUST support RFC7525, which says:
This document therefore advocates strict use of forward-secrecy-only ciphers.
Ben asked me to change PFS to "forward secrecy". I also changed it to "MUST use
forward secrecy".

[ Alas, i find rfc7525 somewhat lacking as it is not explicit in the list of
crypto options that actually do provide PFS, but thats the IETF BCP for the
subject matter, and if that BCP finds it adequate to let the reader figure out
by herself which of the hundred crypto algorithms in TLS/DTLS do that, then i
want to be the last one who gives more explicit guidance in an already way too
long ACP spec (rant off). ]

{37:a} > - *** section 6.7 can ACP really rely on any L2 security mechanism? Or
isn't it a catch 22 game ?

Reread several times, i think paragraphs are sound.

The paragraph reflects what we brainstormed outside IETF for MacSec, but its
really a generic template. Think for example IPsec where instead of ESP you use
MacSec. Of course, you need to NOT encrypt IKEv2 packets via MacSec like you
also do not encrypt them through ESP thats avoiding your catch 22.

{38:f} > - section 6.7.1.1 I do not mind too much, but, I wonder why you put
some IANA non-consideration in the text. Suggest to remove

Leftovers from early days trying reconfirm for ourselves what we needed to ask
IANA. fixed.

{39:f} > - section 6.7.2 the text about DTLS 1.3 is unclear.

Fixed. See next point.

{40:f} > I have really mixed feeling about using DTLS 1.2 as it is soon to be
deprecated and ANIMA should use the latest and brightest (OTOH one approved
your document can sit in the RFC editor queue for months/years if waiting for
DTLS 1.3 to be published)

As part of the parallel discussion with security folks, there is a bit more
improvements in the security text than just your asks. Primarily pulling
out what i think are good common requirements from IPsec/DTLS and put it
into common paragraph on top of the section.

If i do recall all my discusses: SEC AD had no concerns with mandating only
DTLS v1.2,
not even to only require TLS 1.2 (even though TLS 1.3 is out).

For DTLS i did rewrite it to address your points:

DTLS1.2 is indeed MTI, but better explanations why in text (e.g.: desire to
adopt ACP to lower-end devices with often a lot slower evolution of
firmware, stricter common ACP secure channel security requirements - aka:
going maybe 50% where DTLS 1.3 is).

"DTLS" in GRASP really means DTLS 1.2 or anything newer/better that can
negotiate
down to DTLS 1.2. Aka: DTLS 1.2 + DTLS 1.3 implementation is fine.

Non-normative text to suggest also to support DTLS 1.3, and RFC-editor note
that that text will change to SHOULD support DTLS 1.3 IF we have an DTLS 1.3
RFC by AUTH48. Hence avoiding waiting for DTLS 1.3, because the explanations
above should well enough explain why there is not enough additional value
for this use-case in DTLS 1.3 now to make it MUST (or to forego the MUST
for DTLS 1.2).

In general, i do not agree with your statement "hot off the press is always
best",
especially i don't think there is one-size-fits-all, and the TLS recommendations
are very centric to "web-software" with better consistent upgrade cycles
than we have in other parts of the industry.

Check 6.7.2, and let me know if there is still anything you would like to see
improve.

{41:f} > - *** section 6.8.2 please use the RFC for TLS 1.3 as it now exists

Changed to:
TLS version 1.2 (<xref target="RFC5246"/>) is REQUIRED and TLS 1.3 (<xref
target="RFC8446"/>) is RECOMMENDED.

Discussed also with Ben, there is no mandate to (only) use TLS 1.3 in solutions
like ACP.

Otherwise similar argument than DTLS except that it's also used end-to-end so
the lowest-common denominator problem is stronger (aka: must have working MTI
across ALL nodes, whereas DTLS would only be some "low-end" nodes).
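
As a minimal sketch of what "TLS 1.2 REQUIRED, TLS 1.3 RECOMMENDED" could look
like in an implementation: set TLS 1.2 as the floor and let the stack
negotiate TLS 1.3 where both ends have it. This uses the Python standard
library ssl module purely as an illustration; it says nothing about ACP
certificate checks, and the function name is hypothetical.

```python
import ssl


def make_client_context():
    """TLS client context: 1.2 is the minimum, newer versions are negotiated."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # the mandatory baseline
    # No maximum is pinned, so TLS 1.3 is used whenever both peers support it.
    return ctx


ctx = make_client_context()
print(ctx.minimum_version.name)
print("TLS 1.3 available in this build:", ssl.HAS_TLSv1_3)
```

A node whose TLS library only implements 1.2 still interoperates with every
peer configured this way, which is the lowest-common-denominator property
described above.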

{42:f} > - section 6.8.2 to be honest, the text is easier to read than the
picture, so, suggest to move the picture after the text

fixed.

{43:a} > - section 6.8.2 I did not re-read the ANIMA architecture document but
I would assume that this 6.8.2 section is a part of the architecture document.
Are they in sync?

Its a reference model, not an architecture, but yes. It does not say which of
the
specs had to include which text. 6.8.2 effectively resulted when GRASP was in
IESG security review and we concluded that GRASP itself wouldn't have to specify
its transport and security layer, but the solution adopting GRASP
would have to do that. Makes GRASP easier to adopt to different solutions (like
ACP).

{44:a} > - section 6.8.2.1 looks more like a security consideration section to
me... move it there?

I'd rather not:

Eric/Ben did not raise a concern about this. A lot of the doc is about security
and explanations thereof, security section would be very long if we moved all
explanations there and IMHO wouldn't help readability. And this section is
quite long. I tried to have security section be high level analysis plus
strange/tangential stuff, but otherwise keep security explanations local to
where they are needed.

{45:a} > Also, it is hop-by-hop TLS? Then a hostile ACP node can do a MITM
attack

Hop-by-hop is TCP because its 1:1 on top of the hop-by-hop ACP secure channel,
no added value of TLS. Its only used for flooding service ("objective")
discovery
messages. End-to-end/peer2peer GRASP in ACP uses TLS.

You can not make service discovery more secure at this level, because the
worry is not MITM, the problem is that every ACP node is equally trusted to
announce a service, and you have no prior knowledge that one node is providing
a great instance of the service and the other one may look like it's doing the
same, but is e.g.: re-selling your data. At the ACP/ANI level we can just kick
a bad acting node out by certificate revocation or expiry (short-lived certs),
pretty much like any other current "secure solution".

{46:f} > - *** section 6.10.3 use EITHER 00b or 0x00 but not both for the same
field

fixed.
Eliminated all b)inary values from addressing section.
Kept them in RPL section, where Pascal wrote them that way. guess they may be
commonly used here.

{47:a} > - *** section 6.10.2 and 6.10.3 write about scheme and subscheme but
they are not defined (or if they were, then it was pages ago)

I do not understand the concern:

6.10.2 defines the overall (base) scheme which includes sub-schemes,
explanations, table.
Then 6.10.3. ... 6.10.5 define the sub-schemes.

Maybe rephrase or propose a specific change ?

{48:f} > - section 6.10.3 only 15 bits for addressing ACP nodes ? It is only
32.000... not too many for IoT

A registrar can use multiple Registrar-IDs. Networks with distributed
Registrars will
typically have less than 15k nodes per registrar.
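
The arithmetic behind this, as a small sketch: 15 bits give 32768 values per
Registrar-ID, and a registrar using several Registrar-IDs multiplies that.
The function name and the count of four Registrar-IDs are just examples.

```python
NODE_NUMBER_BITS = 15  # width quoted in the review point


def capacity(registrar_ids, bits=NODE_NUMBER_BITS):
    """Addressable nodes when a registrar uses several Registrar-IDs."""
    return registrar_ids * 2 ** bits


print(capacity(1))  # 32768
print(capacity(4))  # 131072
```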

Fixed text here and in the V-Long addressing scheme to:

Registrar-ID (48-bit): A number unique inside the domain that identifies the
ACP registrar which assigned the Node-ID to the node. One or more domain-wide
unique identifiers of the ACP registrar can be used for this purpose. See <xref
target="registrars-unique"/>

{49:a} > - section 6.10.3.1 route aggregation is always a plus but does RPL
support route summarization ?

I am not aware, but i don't think so. This spec does
not define all the routing mechanism needed/desirable
for zone based route aggregation, it just carves out
the space. Like we have done in other parts of IPv6
address architecture.

What was tested with pre-standard implementation lab
testing was transitional for example. ACP RPL metro regions
(each with a zone) interconnected via a traditional MPLS/VPN
core where a configured VRF interconnects those metro regions
via the zone-prefix routes and ACP-connect.

Idea would be to document such a model if/when ACP networks
start getting this problem and then think of ways
to make those setups more autonomic.

{50:a} > - section 6.10.4 again the explanations on how IID is generated is
postponed to a later section, this is frustrating ;-)

Other way around could easily be more frustrating.

The registrar section 6.10.7 where this is explained
was written very late in the process so it could only be appended
at the end, but even if it was written earlier, i think
the registrar description would be a lot harder to write
without first having defined the addressing:

Think about a non-ACP network like an enterprise network.
You would first explain the addressing plan of e.g.
an enterprise network and later on describe how you
could build a system that generates the addressing.

{51:a} > - section 6.10 and addressing in general, I wonder whether such a
complexity is required to be specified in the normative section... Why not
state ULA and that's it ?

Think of the IPv6 address architecture, it also has a standardized
addressing plan (what's unicast, scopes, multicast, ULA, etc. pp.)
to avoid having to do consistent per-hop configuration of address ranges
for different functions. If we wouldn't standardize, we would
need to provision a lot more addressing config parameters to
each node (all those implied by the addresses now, prefix-length,
zones, which registrar can assign which suffixes, which addresses
are inside ACP, which one is on ACP-connect interfaces).

Potentially this would be so much that we would raise too many
eyeballs with security folks trying to put that all into the
certificates. And if it doesn't fit into certificates, we would
need another protocol beside BRSKI or non-crypto stuff in
BRSKI. And configure all distributed registrars. And even signal
addressing stuff via more routing protocol to check consistency,
set up edge-filtering for internal addresses, etc. pp.

Aka: huge simplification.

We think we get all the use cases we understand done by
ACP instances picking addresses from these options. But if
we ever figured out something can not be done, we're not stuck
as IPv6 overall is (AFAIK, most address space designed),
but we could simply flip a bit in the certificate ACP
domain information field in an incompatible way (e.g.: different
RFC prefix) and come up with a new addressing plan (or
different approach).

{52:f} > - *** section 6.10.7.2 please do not use MAC address as a source of 46
unique bits for the registrar... Virtual nodes do not always have unique MAC
addresses

I don't think it's a good design rule to not exploit a feature that is
very beneficial in one domain just because it's not applicable to another
domain. Especially when the first domain (physical) is really today the primary
domain against which the solution is designed, and the applicability
to the second domain (virtual) is today mostly theoretical and
has not been verified too much.

How about demanding physical routers must not have a power source
of their own (cable, batteries) because virtual routers do not
need them ? ;-))

Kidding aside. I have thought about how to improve the text for
technical clarity and generalization of the concept, here is what is
in the new revision:

<t>To support such unique address allocation, an ACP registrar MUST have
one or more 46-bit identifiers unique across the ACP domain which is called the
Registrar-ID. Allocation of Registrar-ID(s) to an ACP registrar can happen
through OAM mechanisms in conjunction with some database / allocation
orchestration.</t>

<t>ACP registrars running on physical devices with known globally unique
EUI-48 MAC address(es) can use the lower 46 bits of those address(es)
as unique Registrar-IDs without requiring any external signaling/configuration.
This approach is attractive for distributed, non-centrally administered,
lightweight ACP registrar implementations. There is no mechanism to deduce
from a MAC address itself whether it is actually uniquely assigned.
Implementations need to consult additional offline information before making
this assumption. For example by knowing that a particular physical
product/NIC-chip is guaranteed to use globally unique assigned EUI-48 MAC
address(es).</t>
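
As an illustration of "use the lower 46 bits", here is a small sketch. It assumes (my reading, not something the text above spells out) that the EUI-48 is taken as a 48-bit big-endian integer in its usual written octet order, so the two bits dropped are the two most significant bits of the first octet. The function name is hypothetical.

```python
# Sketch only: derives a 46-bit Registrar-ID from an EUI-48 MAC address.
# Assumption: the MAC is read as a 48-bit big-endian integer in written
# octet order and the two most significant bits are dropped.
REGISTRAR_ID_BITS = 46
REGISTRAR_ID_MASK = (1 << REGISTRAR_ID_BITS) - 1


def registrar_id_from_mac(mac: str) -> int:
    """Return the lower 46 bits of an EUI-48 given as aa:bb:cc:dd:ee:ff."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError("expected an EUI-48 with six octets: %r" % mac)
    value = int.from_bytes(bytes(int(o, 16) for o in octets), "big")
    return value & REGISTRAR_ID_MASK


if __name__ == "__main__":
    # 00:00:5e:00:53:01 is only an example value, not a real registrar.
    print(hex(registrar_id_from_mac("00:00:5e:00:53:01")))
```

The code cannot tell whether the MAC address is really globally unique; as the text says, that knowledge has to come from offline information about the product.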

Hope this solves the discuss.

{53:f} > - section 6.10.7.2 IdevID is used while in the terminology section it
was stated that they are not. Or did I read this wrongly?

changed in terminology:
"IDevID cannot be used for the ACP"
"IDevID cannot be used as a node identifier in the ACP"

changed in 6.10.7.3
"ACP registrars that can use the the IDevID of a candidate ACP device"
"ACP registrars that are aware of the IDevID of a candidate ACP device"

Aka: The ACP domain certificate is locally provided by the domain's registrars,
like a membership card for some club/company/whatever, whereas the IDevID
could be seen as a primary node identifier like a passport that could be used
as one of the authenticators when applying for the club/... membership.

{54:f}> - section 6.11.1.1 unsure whether using IPv6 HbH would be an issue as
ACP won't probably be HW accelerated, but, I do not mind to err on the safe side

Most of the original text of the RPL section was from Pascal, and was very
tersely written, IMHO for RPL experts. I already tried to expand a lot of the
introduction for easier digestion by non-RPL experts. I have added/modified
the following text to make the point you are referring to clearer:

6.11.1.1
This RPL profile avoids the use of Data-Plane artefacts
(RPL data packet headers, see <xref target="rpl-Data-Plane"/>), because
hardware accelerated forwarding planes most likely can not support them today.

6.11.1.13.
<section anchor="rpl-Data-Plane" title="RPL Data-Plane artifacts">
<t>RPL Packet Information (RPI) defined in <xref target="RFC6550"/>,
section 11.2 defines the data packet artefacts required or beneficial in
forwarding of those data packets when their routing information is derived
from RPL. This profile does not use RPI for better compatibility with
accelerated hardware forwarding planes and achieves this for the following
reasons.</t>
<t>One RPI option is the RPL Source Routing Header (SRH) <xref
target="RFC6554"/> which is not necessary in this profile because it uses
storing mode where each hop has the necessary next-hop forwarding
information.</t>
<t>The simpler RPL Option header <xref target="RFC6553"/> is also
not necessary in this profile, because it uses a single RPL instance
and data path validation is also not used.</t>
</section>

That text became so long because i felt without these explanations it's
difficult to get on top of it as a non-RPL expert: RFC6550 that defines RPI
does not even define that abbreviation, it's only used from RFC8138 on, and
the fact that RFC6553 and RFC6554 are two different options for RPI is also
something you will only figure out after having read a lot more of those RFCs.

{55:a} > - section 6.12.5 I agree that the term 'loopback interface' is
becoming really old-fashioned. Time to use another term in this document?

No change:

I don't think a 150 page document of a specific solution is a good place to
define new terminology to be reused generically. If we can scope a small
document to introduce such a better terminology, i am all for it.

[long opinions]
Another term could be a lot of political infights. It might be worth to
have that fight, but not in this doc: Logically i think its
what a Node SID is, except that the definition in RFC8402 section 3.2 is
fairly weak. And if you wanted to avoid dragging SR into this discussion,
(but instead OSI), it would be a node instead of a subnet address, but
then you're still not sure that the pragmatic folks working in OPS
have any interest in investing a lot in new terminology, given how
they have used loopback interface addresses forever as node identifiers
in IGPs and BGP, and AFAIK, nobody has bothered to bring up the
terminology discussion.

At best you avoided to explain how you do actually achieve a node address
in an existing IP stack (by using a loopback interface), so the word
"loopback address" didn't show up in the according RFCs. Then again,
that only works in the context of solutions where everybody already
understands how to implement. I would not want to expect that for the ACP.
[end]

(And remember, ANIMA is an OPS group, not an RTG group, so being more
practical should earn brownie points ;-)

{56:f} > - section 6.12.5, perhaps it is only me, but, I had two burning
question marks in my head while reading this I-D: what about NBMA and what
about DAD... Answered now... The multi-access could be mentioned earlier though
as most of the text has an implicit P2P use case

Fixed. The main section "6.7. Security Association (Secure Channel) protocols "
didn't have any text (just subsections for IPsec etc), so i added the following
text into it:

<t>This section describes how ACP nodes establish secured data connections to
automatically discovered or configured peers in the ACP. <xref
target="discovery-grasp"/> above described how IPv6 subnet adjacent peers are
discovered automatically. <xref target="remote-neighbors"/> describes how non
IPv6 subnet adjacent peers can be configured.</t>

<t><xref target="ACP-virtual-interfaces"/> describes how secure channels are
mapped to virtual IPv6 subnet interfaces in the ACP. The simple case is to map
every ACP secure channel into a separate ACP point-to-point virtual interface
<xref target="ACP-p2p-virtual-interfaces"/>. When a single subnet has multiple
ACP peers this results in multiple ACP point-to-point virtual interfaces across
that underlying multi-party IPv6 subnet. This can be optimized with ACP
multi-access virtual interfaces <xref target="iACP-ma-virtual-interfaces"/> but
the benefits of that optimization may not justify the complexity of that
option.</t>

And 6.12.5 now has been structured into subsections to enable the new xref's.
Also added a note about non-considerations of multi-party secure associations
to 6.12.5.1 (GDOI).

{57:f} > - *** section 7.2 is it really " MLD snooping must be changed to never
forward packets" ? Suggest to use "ACP-aware L2 switch MUST never forward
packets for ALL_GRASP_NEIGHBORS"

Yes.

Unfortunately, i think you may have been the only reviewer who commented on this
L2 section, so when i tried to make it more precise, i also stumbled across
a bit of other non-ideal text, especially wrt. VLANs (non-explicit
mentioning of running GRASP only on untagged ports of VLANs), and in not
being clear how the described design is meant to enable ACP on L2/L3 switches
without actually changing any of the L2 forwarding plane except for the GRASP
message filtering. And of course i revisited this text because it's
the reason for the security concerns of your next point.

The text changes in this section are a bit longer, please read them in rfcdiff.

{58:f} > - section 7.2 should the discussion about address stealing rather in
the security section ?

Yes. Small enough to go there. fixed (moved verbatim, no changes).

> - section 7.2 suggest the use of normative "SHOULD" and rewriting the
> sentence "Ideally, ACP peering should be built also across ports that are
> blocked in STP"

Fixed... Hope this does not frustrate HW where it's difficult to implement.

{59:a} > - section 8.1.1 should the ACP edge also perform some duplicate
address detection ? E.g., if the NMS acp-address is already advertised in the
RPL ?

RPL actually does NOT help because it's optimized to automatically reduce the
routing table. So unlike OSPF or ISIS, if you are not the root of the RPL tree
you would not see the prefixes routed towards the root, but effectively just
the equivalent of a "default" route to it.

With our new charter and documents like ACP out of the way, a
topology service ASA would be a good thing to define to help here,
e.g.: auto-negotiate manual addressing scheme prefixes between
all the ACP-connect edge routers. Sheng's auto-addressing draft
hanging in the IETF editor queue could be a basis for that.

{60:f} > - *** section 8.1.1 should the ACP edge also block all packets with
HbH or routing header?

<t>ACP Edge nodes SHOULD have a configurable option to filter packets with RPI
headers (see <xref target="rpl-Data-Plane"/>) across an ACP connect interface.
These headers are outside the scope of the RPL profile in this specification
but may be used in future extensions of this specification.</t>
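
To show what such a filter would have to look at, here is a rough sketch. It is not from the draft and not a forwarding-plane implementation. It assumes RPI shows up either as the RPL Option (type 0x63 per RFC6553) in a Hop-by-Hop header or as a Routing header with routing type 3 (RPL SRH per RFC6554). The function and constant names are made up, and the boolean switch only models the "configurable" part.

```python
# Sketch only: walks the IPv6 extension header chain of a raw packet and
# reports whether it carries RPL data-plane artifacts.
HOP_BY_HOP = 0
ROUTING = 43
DEST_OPTS = 60
RPL_OPTION_TYPE = 0x63      # RPL Option as assigned in RFC6553
RPL_SRH_ROUTING_TYPE = 3    # RPL Source Routing Header, RFC6554
IPV6_HEADER_LEN = 40


def _has_rpl_option(options: bytes) -> bool:
    """Scan the TLV-encoded options of a Hop-by-Hop header."""
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:          # Pad1 is a single octet without length
            i += 1
            continue
        if opt_type == RPL_OPTION_TYPE:
            return True
        if i + 1 >= len(options):  # truncated option, stop scanning
            break
        i += 2 + options[i + 1]
    return False


def has_rpl_data_plane_header(packet: bytes) -> bool:
    if len(packet) < IPV6_HEADER_LEN:
        return False
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    while next_header in (HOP_BY_HOP, ROUTING, DEST_OPTS):
        if offset + 8 > len(packet):
            return False
        ext_len = (packet[offset + 1] + 1) * 8
        if offset + ext_len > len(packet):
            return False
        if next_header == ROUTING and packet[offset + 2] == RPL_SRH_ROUTING_TYPE:
            return True
        if next_header == HOP_BY_HOP and _has_rpl_option(
            packet[offset + 2:offset + ext_len]
        ):
            return True
        next_header = packet[offset]
        offset += ext_len
    return False


def should_drop(packet: bytes, filter_rpi: bool = True) -> bool:
    """Drop decision on an ACP connect interface; `filter_rpi` stands in
    for the configurable option."""
    return filter_rpi and has_rpl_data_plane_header(packet)
```

With filter_rpi set to False the packet passes, which is the forward compatibility case that the word "future" in the paragraph is there for.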

Note (again) that the document was beaten up in prior IESG review for its
references to "future", and reviewers claimed it was incomplete because of such
paragraphs. As a result i removed all paragraphs with "future" in them to pass
IESG reviews. So, i would expect you to defend this new instance of "future" ;-)

(aka: if we didn't explain the future forward compatibility, we wouldn't have
an explanation why the filter would have to be configurable).

{61:f} > - section 8.1.2 sorry but cannot parse " The ACP connect mechanism be
only be used to connect physically"

Fixed:
The ACP connect mechanism can not only be used to connect physically external
systems

{62:f} > - section 8.1.2 possibly because of the above issue, but, I fail to
see what this section is all about? The section title is quite vague...

Added new first paragraph to 8.1.2:

<t>The previous section assumed that ACP Edge node and NOC devices are separate
physical devices and the ACP connect interface is a physical network
connection. This section discusses the implication when these components are
instead software components running on a single physical device.</t>

Hope this suffices.

Btw: The goal of this section is primarily to make the argument that
ACP-connect is a workaround when we talk about physical devices (too difficult
to build secure channels into NOC devices), but when everything is co-located as
software components on a single physical device and we trust the software
orchestration on the physical device, then ACP becomes the gold standard, and
cryptographically secure channels between those software components would just
be useless waste.

{63:a} > - section 8.1.3 is it worth to define a 3rd RPL profile for ACP edge
nodes?

IMHO: No

We already have 5 options within the RPL profile as specified in 6.11.1.12/14.

ACP-connect already implies the third highest priority to become RPL root, so
it already is distinguished.

If this is not satisfactory, pls. propose text or explain what functionality
you think is missing.

{64:a} > - *** section 8.1.5 is the ACP edge node really sending RA with PIO
for the ACP ULA prefix? Then NMS host will do plain SLAAC and can select an
address already in use (DAD will not work across a routed network). Please
state that the A bit for the prefix is not set in order to disable SLAAC. There
are also route options for RA that could be used.

I think you are talking about 8.1.3

8.1.3 says ACP edge routers use RFC4191 (RIO, not PIO, hence no bother about
the A bit) to announce a 'poison' default route of lifetime 0, because ACP
edge routers as defined here do not want to see non-ACP traffic from the
ACP-connect interface, plus actual (non-poisoned) RIO routes for the actual
ACP prefixes (ACP ULA prefixes).

RIOs AFAIK have no impact on SLAAC/DAD and are meant to indicate prefixes routed
across the announcing router and work for multi-homed hosts (NMS nodes would
need to be multi-homed with one ACP connect interface and a separate data-plane
interface), except for the "merged" case described in the ACP connect section.
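
For readers who do not have RFC4191 in their head, here is a minimal sketch of how such Route Information Options could be encoded. It assumes the poison default is expressed as a ::/0 RIO with lifetime 0 (whether an implementation does that or uses the RA router lifetime is not settled by this email). The ULA prefix and the lifetime of 1800 seconds are made-up example values.

```python
# Sketch only: encodes RFC4191 Route Information Options (ND option 24).
import ipaddress
import struct

RIO_TYPE = 24


def build_rio(prefix: str, lifetime: int, preference: int = 0) -> bytes:
    """Encode one RIO. `preference` is the 2-bit Prf value
    (0b00 medium, 0b01 high, 0b11 low)."""
    net = ipaddress.IPv6Network(prefix)
    plen = net.prefixlen
    if plen == 0:
        length, prefix_bytes = 1, b""
    elif plen <= 64:
        length, prefix_bytes = 2, net.network_address.packed[:8]
    else:
        length, prefix_bytes = 3, net.network_address.packed
    flags = (preference & 0x3) << 3   # Resvd(3) | Prf(2) | Resvd(3)
    header = struct.pack("!BBBBI", RIO_TYPE, length, plen, flags, lifetime)
    return header + prefix_bytes


if __name__ == "__main__":
    # 'Poison' default route: lifetime 0 tells hosts not to use this router
    # as a default.
    poison_default = build_rio("::/0", lifetime=0)
    # Hypothetical ACP ULA prefix, announced with a non-zero lifetime.
    acp_route = build_rio("fd89:b714:f3db::/48", lifetime=1800)
    print(poison_default.hex(), acp_route.hex())
```

Neither option carries an A bit, which is the point made above: RIOs only install routes on the host and do not trigger SLAAC.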

{65:f} > - section 8.2 current title " ACP through Non-ACP L3 Clouds" is
confusing about the overloaded 'cloud'. What about "Connecting ACP islands over
NON-ACP L2 networks" ?

Fixed to:
Connecting ACP islands over Non-ACP L3 networks (Remote ACP neighbors)

(when we started, the other cloud was called "Data-Centers" ;-)

{66:f} > - *** section 8.2.1 please replace "DTLS" by "DTLSv12" in the
configuration to allow for DTLS 1.3

See the discussion above for the DTLS logic enhancements.

The keywords in 8.2.1 are simply the same as in GRASP. "DTLS"
simply means "DTLS support down to version 2" (aka: DTLS v3 welcome,
but not mandatory unless we have an RFC for it).

Btw: If we would later see the need to fully retire DTLS v2, we could
do a new profile "DTLS support down to version 3", and we would call
that "DTLSv3". Not saying DTLSv2 now, but only DTLS is the benefit
of this being the first profile.

{67:a} > - section 8.2.2 did you investigate whether Routing Header (a la MIPv6
or SRv6) could be used as well? Avoiding the double encapsulation

No change: I did investigate when you asked, here is my reply:

8.2.2 is a workaround for platforms that can not support 8.2.1 which is
IMHO always the most header efficient option.

I am not aware of any platforms that implement MIPv6 or SRv6 "VTI" (virtual
tunnel interfaces). If there were such implementations, then the notion of the
first paragraph of this section "or other form of pre-existing tunnel" would
apply (aka: such tunnels would be fine).

But: I don't think this would be any more header efficient than
the other explicitly mentioned encaps because AFAIK RFC8200 does not allow to
add these headers without full IPv6 header encap.

{68:a} > - section 9.2.1 may be add NETCONF, RESTCONF ?

No need for the list to be exhaustive. Just long enough to justify the use-case.

The list only includes protocols where i am quite confident that i could
win the argument that the deployment reality (for lots of reasons) is
unencrypted, even if newer versions specify secure transport.

AFAIK, secure transports such as TLS or SSL are fairly widely used
for NETCONF, RESTCONF. Hence their explicit non-inclusion.

{69:f} > - should section 9.3 be normative ? Like being unable to disable ACP ?

Section 9 is really a summary ("Benefits") that was originally
at the end of the document. It does not and is not meant to introduce
any new requirements (normative or not) that are not specified
in the normative part.

At some stage of review i started to move all text that could not
be normative beyond all existing big text, which became section 10.
This was done to minimize unnecessary rfcdiff delta (avoid renumbering
of existing text). So it looks right now as if section 9 "introduces"
something new, in reality it's a leftover bug of prior reordering
(section 10 introduces it).

I will now check in the first version without change to section 9, which is
all technical changes for easier rfcdiff.

I will then check in a second revision that swaps section 10 and 9
making the "Benefits" Section at the end of the doc, also renaming
to "Summary: Benefits".

{70:a} > - section 10.1 also applies to many protocols and their
troubleshooting workflow... Unsure whether it belongs in this document ??? or
to another one to be created...

I am going to use my joker card for this point: ANIMA is OPS,
and therefore operational considerations should not be seen as
second class citizens like they often are in RTG/INT.

Of all the operational section 10 points, this is actually the first one because
i think it is the most important one. If we would have had more
of these diagnostics in pre-standard implementations, we could have
saved weeks, if not months of troubleshooting with customers.

This was also well received by other reviewers, and i have little
hope that i could write more about this in another document
better: Other sections such as 10.3 can easily translate into
later YANG documents. This section really requires more
experimentation by implementations first before we could dare
to convert it into YANG. So implementations really should start
with that experimentation in the first implementation. Or else
they'll run into a lot of deployment/interop issues.

{71:a} > - section 10.2.3 should the security discussion be moved to the
security considerations section ?

Given my above opinion about what size of security considerations are better
localized in their appropriate context or into the general security
considerations, i would prefer to keep 10.2.3 inside of 10.2.
>
{72:a} > - section A.3.1, I do not really buy your discussion about LLDP.
Section 7.2 could leverage or co-exist with LLDP

7.2 is about building ACP-enabled L2 switches.
This section A.3.1 is about co-existence with non-ACP-enabled L2 switches that
support CDP or LLDP. Aka: exactly not the switches considered in 7.2.

If the info we now have in GRASP was put as extensions into CDP or LLDP,
then non-ACP-enabled L2 switches that do support non-extended CDP/LLDP messages
would just ignore those ACP extensions. And it would be a lot of trouble to
persuade IEEE to add LLDP options that are propagated across L2 switches.
I think they are doing some of this now, but not generic, but for specific
use-cases that IEEE is interested in.

_______________________________________________
Anima mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/anima
