This is a review of: draft-ietf-anima-autonomic-control-plane-04. I will attempt to reply to parts of my review under more pertinent subject lines, because some substantive comments/discussion are embedded; or feel free to do that yourself.
Sorry that it's 400 lines long; if you'd like, I can find an XML file and
submit patches.
section 1:
s/access devices through console ports/
/access devices through console ports (craft ports)/
{
cf: http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/intro-why.html
There are many pages asking why the telcos call the console port the
"Craft" port, and it seems to have something to do with providing
access to the system to the "craftsmen personnel" to verify whether the
system they just installed was operational.
}
change: For example, GRASP
[I-D.ietf-anima-grasp] can run inside the ACP.
to: For example, GRASP
[I-D.ietf-anima-grasp] can run securely inside the ACP.
section 3: can we number the requirements as in GRASP-08, e.g.:
ACP1, ACP2, etc.
I think that this is confusing:
It may be necessary to have end-to-end connectivity in some cases,
for example to provide an end-to-end security association for some
protocols. This is possible, but then has a dependency on routable
address space.
I think that you mean to say that the ACP could run *OVER* some kind of
global end-to-end connectivity, but that then it depends upon routable
address space.
But, as I read it, it suggests that some protocols *inside* the ACP
might need end-to-end connectivity, and this would depend upon routable
address space. (My take is that the purpose of the ACP is to provide
end-to-end connectivity for protocols that run inside the ACP, and I think
we all agree about that)
section 4:
"Intent can override this default policy."
Instead of getting into what an Intent is, and confusing the security
reviewers, since we don't define it, can we just instead say:
"Unless overridden by some other policy, the default policy is: establish
the ACP to all adjacent nodes in the same domain."
Can we number the steps in this section?
Please turn the following three points into numbered paragraphs or points
separate from the previous "steps", since they are really notes:
o Non-autonomic NMS systems or controllers have to be manually
connected into the ACP.
o Connecting over non-autonomic Layer-3 clouds initially requires a
tunnel between autonomic nodes.
o None of the above operations (except manual ones) is reflected in
the configuration of the device.
Your diagram is great.
I have heard some say that they would want to enable the ACP on interfaces
which were marked Admin Down, with maybe even some kind of auto-negotiate
(or auto-guess based upon energy detection) of lambdas. Is it worth saying
something in section 4 about this?
5.1:
specific Unique Device Identifier (UDI) or IDevID certificate.
(Note: the UDI used in this document is NOT the UUID specified in
[RFC4122].)
how about telling us what the UDI is, rather than what it isn't?
Isn't this a Cisco internal term?
Is this here to steer your colleagues correctly? (I don't object to it being
there, I just want to make sure that it doesn't confuse others)
====
The domain certificate (LDevID) of an autonomic node MUST contain
ANIMA specific information, specifically the domain name, and its ACP
address with the zone-ID set to zero. This information MUST be
encoded in the LDevID in the subjectAltName / rfc822Name field in the
following way:
anima.acp+<ACP address>@<domain>
An example:
anima.acp+FD99:B02D:8EC3:0:200:0:6400:[email protected]
This puts some pretty clear and pretty strong requirements onto the
Registrar, which I think belongs in the bootstrap document. We don't really
have a place for this. I will start a new thread about this part.
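For concreteness, building and parsing that rfc822Name form is trivial; a sketch (function names are mine, not from the draft):

```python
# Sketch of the section 5.1 rfc822Name encoding:
#   anima.acp+<ACP address>@<domain>
# Illustrative only; the draft does not define these helpers.

def build_acp_rfc822name(acp_address: str, domain: str) -> str:
    return "anima.acp+{}@{}".format(acp_address, domain)

def parse_acp_rfc822name(name: str):
    # Split on the last "@" (the ACP address itself contains none).
    local, sep, domain = name.rpartition("@")
    if not sep:
        raise ValueError("no domain part")
    prefix, sep, acp_address = local.partition("+")
    if prefix != "anima.acp" or not sep:
        raise ValueError("not an anima.acp rfc822Name")
    return acp_address, domain
```

The point being that the Registrar, not the node, has to get this string right at enrollment time.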
5.1.2:
please move this elsewhere, as the table has not yet been defined, and you
are already making exceptions to it:
Where the next autonomic device is not directly adjacent, the
information in the adjacency table can be supplemented by
configuration. For example, the node-ID and IP address could be
configured.
This also seems like a premature optimization:
The adjacency table MAY contain information about the validity and
trust of the adjacent autonomic node's certificate. However,
subsequent steps MUST always start with authenticating the peer.
In diagram Figure 2, please change "ANrtrI" to another letter, because
"I" and "1" are hard to distinguish.
It's true that a full mesh of ACP channels will be built: we ideally need to
create some metrics for RPL to use to pick parents. It would be desirable
to be aware of the L2 fabric.
You are, I think suggesting having the ANswitchX block forwarding of the
ALL_GRASP_NEIGHBOR mcast group. I'm not sure I like this solution.
One possible solution to the large number of channels is not to create the
IPsec CHILD SA unless needed. This would be possible if the IKEv2 daemon
was also the RPL routing daemon, and we did RPL over the IKEv2 layer.
Also really sick layer violations :-)
5.2.2:
Unfortunately, they [CDP? mDNS?] will also
terminate their messages if they do not support the ACP and would
then inhibit ACP neighbor discovery
Can you explain this? I don't understand what you are saying from the text.
Are you saying that an L2 switch that spoke LLDP, but didn't speak ACP,
would eat the LLDP rather than forward it, and that in this case, we want to
forward it? (A switch which didn't speak LLDP would just forward it)
It's good to point out that LLDP is not forwarded, and that L2 switches
already do this kind of thing when considering how to limit the ACP discovery.
It seems like the point of 5.2.2 is to discuss why we can't use mDNS.
Maybe we could split the CDP/LLDP section from the mDNS section. (Looks like
just a section header would do that)
5.2.3: we need to define it more clearly in this document, and if we want
to point elsewhere, we need to point to the new objectives document.
5.2.4: I will write some text in coordination with the next update to BRSKI
to point at M_FLOOD.
XXX I feel it is important to combine the ACP discovery with the proxy
discovery.
Thanks for noticing richardson-anima-6join-discovery and pointing to
it. I think that there will be some changes to this too.
{so many documents, so little time}
5.3:
This is interesting: you are suggesting that while many nodes may
be part of the "example.com" domain, the ACP would only be
established among some subset of them.
I can see how it might be important to connect CPE devices
(with "*.access.example.com" certificates) to
a different ACP than the core routers (with "*.core.example.com")
One way is to run two instances of GRASP, and enroll each instance
separately with different certificates. Another way might be to give
the access concentrators certificates with multiple CNs.
A third way might be to create some kind of ACP proxy/tunnel
mechanism that permitted the CPE devices to build ACP tunnels
*through* the access concentrators, via the "core" ACP, to the
access network infrastructure.
I have another use for such a thing, which is providing ACP backhauls
in multi-tenant data centers. I will start an entirely new thread
on this.
I suggest the third paragraph, "Intent can change..", be written:
This ACP document puts a requirement that Intents be able to
change this default behaviour. The precise way in which this
should be expressed needs to be defined outside this document.
Example Intent policies which need to be supported include:
5.4:
From the use-cases it is clear that not all type of autonomic devices
can or need to connect directly to each other or are able to support
or prefer all possible mechanisms. For example, code space limited
IoT devices may only support dTLS (because that code exists already
I claim that any "IoT" device that is "big enough" to participate meaningfully
in the ACP is also big enough to support the common protocols other than
DTLS. The ACP should connect lighting controllers, not light bulbs.
As for MACsec vs IPsec, it is my understanding that MACsec does have a key
management protocol defined for it by the IEEE, so really the common
situation is that one supports IKEv2 to negotiate whether one uses IPsec
or MACsec.
As for the two-stage process, I don't want to do this. I want to just
use IKEv2, and I claim that there will be no people who will say, "I cannot
live with this". (Of course, some may have other preferences, but preferences
do not equal rough consensus)
...Alice must be able
to simultaneously act as a responder in parallel for all of them - so
that she can respond to any order in which Bob wants to prefer...
it's this part that I think is too complex and error-prone to code.
5.5.1:
encryption. Further parameter options can be negotiated via IKEv2 or
via GRASP/TLS.
I think that the last sentence should be struck; I think it is
meaningless. IKEv2 negotiates everything, and there is no GRASP/TLS.
5.5.2: ACP via GRE/IPsec
Given that you add GRE here, I don't understand 5.5.1.
Do you mean to write that 5.5.1 is really IPsec(transport-mode) IPIP(94)?
And 5.5.2 is really IPsec(transport-mode) GRE(47)?
Note that without explicit negotiation (eg: via GRASP/TLS), this
method is incompatible to direct ACP via IPsec, so it must only be
used as an option during GRASP/TLS negotiation.
That's not true. IKEv2 can negotiate this quite well. We may want to
define some Notify messages to make it abundantly clear that this is
an ACP negotiation going on, but that's easy.
5.5.3. ACP via dTLS
So, it's UDP and then... ? GRE inside UDP? (there is a draft
tsvwg-gre-in-udp-encap-19)
When Alice and Bob successfully establish the GRASP/TLS session, they
will initially negotiate the channel mechanism to use.
Yeah, no. Tons of code with no benefit.
Who is actually asking for these options?
5.5.5. ACP Security Profiles
A baseline autonomic device MUST support IPsec and SHOULD support
GRASP/TLS and dTLS. A constrained autonomic device MUST support
dTLS.
if we want to do something for constrained devices, then we should say that
they always initiate, that they should join as RPL leaves (so no forwarding of
packets), and that the LWIG version of IKEv2 should be supported,
and maybe the diet-ESP mechanisms. We should also be clear if we are trying
to support constrained devices, constrained networks, or challenged networks.
5.7:
to:
If possible by the platform SW architecture,
separation options that minimize shared components are preferred.
add:
..such as a logical container (reference to Linux container), or
virtual machine instance (reference to KVM and also to the Cisco
router VM platform)
o Usage: Autonomic addresses are exclusively used for self-
management functions inside a trusted domain. They are not used
for user traffic. Communications with entities outside the
s/user/customer/
- whichever term we use, we may want to put this into the terminology
s/consensus was to use standard ULA, because it was deemed to be/
/consensus was to use ULA-random [RFC4193 with L=1], because it was deemed
to be/
as the first 40 bits of the MD5 hash of the domain name, in the
example "example.com".
we will get beat up for using MD5 by someone who uses grep, even as a PRF
here. Might as well just say SHA256, as it costs nothing here.
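To show how cheap the swap is, a sketch of the prefix derivation with SHA-256 substituted (this is my suggestion, not the draft's algorithm, and note RFC 4193 itself specifies a different Global ID construction):

```python
import hashlib

def acp_ula_prefix(domain: str) -> str:
    """Sketch: derive an fd00::/8 ULA /48 prefix for the ACP,
    using the first 40 bits of SHA-256(domain) as the Global ID
    (L=1, so the first byte is 0xFD)."""
    digest = hashlib.sha256(domain.encode("ascii")).digest()
    prefix = bytes([0xFD]) + digest[:5]          # 8 + 40 = 48 bits
    groups = ["%02x%02x" % (prefix[i], prefix[i + 1])
              for i in range(0, 6, 2)]
    return ":".join(groups) + "::/48"
```

Any collision-resistant hash works here; nothing depends on MD5.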
o Type: This field allows different address sub-schemes in the
In IANA Considerations, I suggest Standards Action, with 111 reserved for
private use.
I would like V to be at least three bits, maybe 8 bits.
In the bootstrap proxy IPIP mechanism, we need to allocate an ACP address
for each insecure L2-domain ("port") so that traffic from the Registrar
(which inside the IPIP header is v6LL) can get back to the correct link-layer.
I have mixed feelings about the 48-bit Registrar ID.
I know why you did it, and why you'd want to use 48-bits.
(It took two reads to realize it was the Registrar's MAC, not the enrolled
node's MAC address).
So the diagram is really:
48 3 13 48 15 1
+-------------+-+--------+-------------+----------+---+
| hash(domain)|T| ZoneID | Registrar ID|Device Num| V |
+-------------+-+--------+-------------+----------+---+
Since we never care about the /64 boundary in RPL, since we pass around
/128 routes in the ACP, do we care if we've placed the Registar ID
here? Clearly it is nice because we have ZoneID as a /64.
I'm thinking that I would like to instead do something like:
48 2 46 32 16
+-------------+-+------------+----------+-------------+
| hash(domain)|T|Registrar ID|Device Num| V |
+-------------+-+------------+----------+-------------+
Where Registrar ID is still MAC address derived, with the G and U
bits removed, and we now have 2^32 space for devices (I think 2^16
might be too small if one includes CPE devices in the ACP, and
one has some churn over a decade+ of CPE devices).
There are now 16 bits available to do things, and we can pass /114
routes around in RPL, btw. I'm open as to whether V remains
as a specified bit, or if "physical" machine is just V=0x0000.
If we need ZoneID, then I suggest that we can easily get it by
having different Registrar IDs for each zone. If you want them in different
/64s then just construct the Registrar ID to be unique in the upper 14 bits.
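Either layout is just fixed-width bit packing; as a sketch using the -04 field widths as I read them (48/3/13/48/15/1, illustrative only):

```python
import ipaddress

# Sketch: pack the -04 ACP address fields
# (hash 48 | Type 3 | Zone-ID 13 | Registrar-ID 48 | Device-Num 15 | V 1)
# into one 128-bit IPv6 address.  Field names and widths as I read the
# draft; treat as illustrative, not normative.

FIELDS = [("domain_hash", 48), ("typ", 3), ("zone_id", 13),
          ("registrar_id", 48), ("device_num", 15), ("v", 1)]

def pack_acp_address(**values) -> ipaddress.IPv6Address:
    addr = 0
    for name, width in FIELDS:
        value = values[name]
        assert 0 <= value < (1 << width), "%s out of range" % name
        addr = (addr << width) | value
    return ipaddress.IPv6Address(addr)
```

My alternative layout would only change the FIELDS table, which is rather the point: the cost of moving field boundaries is near zero now, and very high after deployment.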
This brings up an important aspect, which I know we have discussed before,
which is what does the certificate say, and how does it relate to IPsec
SA permissions, and therefore to the ability of GRASP to trust things.
XXX I need to write something here to make it clearer that the ACP
isn't so squishy in the middle...
5.8.4:
If a device learns through an autonomic method or through
configuration that it is part of a zone, it MUST also respond to its
ACP address with that zone number. In this case the ACP loopback is
I don't like this, because it seriously breaks up the aggregation that
might otherwise be possible. I don't want explicit ZoneID, I'd rather
go with the Note: in 5.8.4, or use the RegistrarID.
5.9:
Needs to say more clearly that we are using 6550 RPL,
and we need to decide if we are using storing or non-storing mode.
I strongly suggest that we want storing mode.
We will have to define a bunch of other RPL parameters.
We also need to be clear that RPL is occurring *within* the ACP
channels.
(Alternatively, we could run RPL outside the ACP channels, using RPL
layer-3 security, and then setup ACP channels when we pick a parent.
There are definite advantages to this, and also many downsides.
I don't suggest this, but, it might be worth saying why)
5.10:
When we establish multiple ACP channels RPL (or any other routing
protocol!) will need to have some metrics to pick among them.
I'm not sure what we can provide here; at the least, we should
prefer shorter paths to longer ones.
If an autonomic node decides to have a limit on how many channels
it sets up, or how many it will setup with a particular peer,
it SHOULD indicate a clear "thanks, I'm full" message in the ACP
channel negotiation protocol (i.e. an IKEv2 Notification).
6.1:
Is there a distinction between marking a port on a switch as
"ACP access" (no ACP channel) to connect to the NMS, from
a case where the switch is told to negotiate an ACP channel
with the NMS machines (extending the ACP via explicit configuration,
rather than via discovery)?
I think so; does 6.1 cover only the "ACP access" case then?
Can we give it a clear name?
I'd like to add a 6.3:
ACP through third-party L3 Clouds
I'm thinking that a cooperating L3 device could M_FLOOD ACP, and then,
when the IKEv2 negotiation comes in, could have been configured to forward
the traffic back to a designated ACP node at the edge of the NMS.
(maybe over IPv4 including NATs).
The resulting tunnel would be an ESP-over-UDP tunnel.
A multi-tenant datacenter might provide this as a service to its tenants.
(where would the bandwidth come from? The datacenter would probably
buy that from its transit tenants and be multihomed)
7.
o If an existing device gets revoked, it will automatically be
denied access to the ACP as its domain certificate will be
validated against a Certificate Revocation List during
authentication. Since the revocation check is only done at the
First mention of CRLs, btw! This is one of the details that belong
in section 5.5.1/5.5.2.
automatically torn down. If an immediate disconnect is required,
existing sessions to a freshly revoked device can be re-set.
the problem is that the knowledge of when to re-set is not distributed,
unless we do it via GRASP. The detail missing is that we should be
restarting the IKEv2 Parent SAs periodically (we can do this without killing
the Child SAs), and doing the OCSP checks there.
I suggest that OCSP is probably the better solution rather than CRLs here as
we have an ACP over which to do it. Max may have an opinion here, and maybe
we should do CRLs instead for reasons of network partition.
There are few central dependencies: A certificate revocation list
(CRL) may not be available during a network partition; a suitable
policy to not immediately disconnect neighbors when no CRL is
available can address this issue.
Assuming that "immediately" means that we eventually disconnect neighbours
when no CRL is available, isn't that the same as just making the CRL recheck
time longer?
i.e. CRL check time of X + grace period Y
same as: CRL check time = X+Y
rekey time = X
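i.e., the equivalence I have in mind, as a trivial sketch (variable names mine):

```python
def disconnect_deadline(last_good_crl, check_interval, grace_period):
    # Sketch: a neighbor is dropped only once a fresh CRL has been
    # unavailable for check_interval + grace_period seconds after the
    # last successful check -- which is indistinguishable from simply
    # using a longer check interval with no grace period.
    return last_good_crl + check_interval + grace_period
```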
section 10 needs a discussion about source address spoofing within the ACP.
Appendix A:
of the network, the less state needs to be maintained. This
adapts nicely to the typical network design. Also, all changes
below a common parent node are kept below that parent node.
this implies that we are using storing mode. It's not true for non-storing
mode.
--
Michael Richardson <[email protected]>, Sandelman Software Works
-= IPv6 IoT consulting =-
