Interdiff (including some formatting since doclint was complaining):

diff --git a/doc/design-candidates.rst b/doc/design-candidates.rst
index 1da5b2e..b856f4b 100644
--- a/doc/design-candidates.rst
+++ b/doc/design-candidates.rst
@@ -1,6 +1,6 @@
-================================================
-Improvements regarding Master Candidate Security
-================================================
+=============================
+Improvements of Node Security
+=============================

 This document describes an enhancement of Ganeti's security by restricting
 the distribution of security-sensitive data to the master and master
@@ -14,10 +14,10 @@ is neither master nor master-candidate.
 Objective
 =========

-Up till 2.10, Ganeti distributed security-relevant keys to all nodes,
+Up till 2.10, Ganeti distributes security-relevant keys to all nodes,
 including nodes that are neither master nor master-candidates. Those
 keys are the private and public SSH keys for node communication and the
-SSL certficiate and private key for RPC communication. Objective of this
+SSL certificate and private key for RPC communication. The objective of this
 design is to limit the set of nodes that can establish ssh and RPC
 connections to the master and master candidates.

@@ -66,6 +66,12 @@ in the virtualization stacks to gain access to the host machines as well.
 Proposal concerning SSH key distribution
 ----------------------------------------

+We propose two improvements regarding the ssh keys:
+
+#. Limit the distribution of the private ssh key to the master candidates.
+
+#. Use different ssh key pairs for each master candidate.
+
 We propose to limit the set of nodes holding the private root user SSH key
 to the master and the master candidates. This way, the security risk would
 be limited to a rather small set of nodes even though the cluster could
@@ -75,9 +81,25 @@ security even more if the administrator wishes so. The following
 sections describe in detail which Ganeti commands are affected by this
 change and in what way.

+Security will be increased even further if each master candidate gets
+its own ssh private/public key pair. This way, one can remove a
+compromised master candidate from a cluster (including removing its
+public key from all nodes' ``authorized_keys`` file) without having to
+regenerate and distribute new ssh keys for all master candidates. (Even
+though it is good practice to do that anyway, since the compromising
+of the other master candidates might have taken place already.) However,
+this improvement was not part of the original feature request and
+increases the complexity of node management even more. We therefore
+consider it a second step in this design and will address
+this after the other parts of this design are implemented.
+
+The following sections describe in detail which Ganeti commands are affected
+by the first part of ssh-related improvements, limiting the key
+distribution to master candidates only.
+

 (Re-)Adding nodes to a cluster
-~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 According to ``design-node-add.rst``, Ganeti transfers the ssh keys to every
 node that gets added to the cluster.
@@ -139,6 +161,10 @@ This affects the command:

 For offlining, the removal of the keys is particularly important, as the
 detection of a compromised node might be the very reason for the offlining.
+Of course we cannot guarantee that removal of the key is always successful,
+because the node might not be reachable anymore. Even though it is a
+best-effort operation, it is still an improvement over the status quo,
+because currently Ganeti does not even try to remove any keys.

 The same behavior should be ensured for the corresponding rapi command.

@@ -181,12 +207,16 @@ in the design.
 This design has the following advantages:

 - A compromised normal node cannot issue RPC calls, because it will
-  not be in the candidate map.
+  not be in the candidate map. (See the ``Drawbacks`` section regarding
+  an indirect way of achieving this though.)
 - A compromised master candidate would be able to issue RPC requests,
   but on detection of its compromised state, it can be removed from the
   cluster (and thus from the candidate map) without the need for
   redistribution of any certificates, because the other master candidates
-  can continue using their own certificates.
+  can continue using their own certificates. However, it is best
+  practice to issue a complete key renewal even in this case, unless one
+  can ensure that no actions compromising other nodes have already been
+  carried out.
 - A compromised node would not be able to use the other (possibly master
   candidate) nodes' information from the candidate map to issue RPCs,
   because the config just stores the digests and not the certificate
@@ -207,6 +237,13 @@ Drawbacks of this design:
   from the Ganeti cluster already. However, this is still a better
   situation than before and an inherent problem when one wants to
   distinguish between master candidates and normal nodes.
+- A compromised master candidate would still be able to issue RPC calls,
+  if it uses ssh to retrieve another master candidate's client
+  certificate and the corresponding private SSL key. This is an issue
+  even with the first part of the improved handling of ssh keys in this
+  design (limiting ssh keys to master candidates), but it will be
+  eliminated with the second part of the design (separate ssh keys for
+  each master candidate).

 Alternative proposals:

@@ -244,7 +281,7 @@ written to ssconf.


 (Re-)Adding nodes
-~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~

 When a node is added, the server certificate is copied to the node (as
 before). Additionally, a new client certificate (and the corresponding
@@ -335,12 +372,13 @@ Besides that, the watcher does not make any ssh connections, and thus is
 not affected by the changes in ssh key handling either.


-Other Keys
-~~~~~~~~~~
+Other Keys and Daemons
+~~~~~~~~~~~~~~~~~~~~~~

 Ganeti handles a couple of other keys/certificates that have not been mentioned
-in this design so far. They will not be affected by this design for several
-reasons:
+in this design so far. Also, daemons other than the ones mentioned so far
+perform intra-cluster communication. Neither the keys nor the daemons will
+be affected by this design for several reasons:

 - The hmac key used by ConfD (see ``design-2.1.rst``): the hmac key is still
   distributed to all nodes, because it was designed to be used for
@@ -350,7 +388,9 @@ reasons:
   is read-only, a compromised node holding the hmac key does not enable an
   attacker to change the cluster's state.

-  (TODO: what about WConfD?)
+- The WConfD daemon writes the configuration to all master candidates
+  via RPC. Since it only runs on the master node, its ability to run
+  RPC requests is maintained with this design.

 - The rapi SSL key certificate and rapi user/password file 'rapi_users' is
   already only copied to the master candidates (see ``design-2.1.rst``,


On Wed, Dec 4, 2013 at 5:11 PM, Helga Velroyen <[email protected]> wrote:

> Hi!
>
> thanks for your comments.
>
>
> On Wed, Dec 4, 2013 at 4:51 PM, Klaus Aehlig <[email protected]> wrote:
>
>>
>>
>> I have a few general comments about this design. It spends quite
>> some effort to restrict the impact of a compromised master candidate
>> via RPC, but leaves it with the private root key. Wouldn't a similar
>> design (each node has its private key, and there is a list of
>> authorized keys) help here as well? This would also have the advantage
>> of never having to rely on being able to delete a key from a node,
>> in particular if it is offlined to be sent to repair.
>>
>
> I agree with your point that a compromised master candidate could ssh into
> another master candidate to fetch that node's SSL certificate and private
> key and then issue RPC commands.
>
> A bit more background here:
> - To increase RPC security, it was necessary anyway to change the procedure
> by generating different client and server certificates, and different
> client certificates for master candidates and normal nodes. Generating a
> separate client certificate for each node did not add much to the
> complexity and effort of this design. This way, the increased security
> regarding compromised master candidates was just a (cheap and obviously far
> from complete) add-on and not even in the scope of the original proposal
> (which talked mostly about compromised normal nodes).
> - Reducing the ssh keys to master candidates is the first (and cheapest)
> solution that meets the demands of issue 377.
>
> I acknowledge that if security against compromised master candidate nodes
> becomes a higher priority as well (thus going beyond issue 377), it would
> be desirable to have different ssh key pairs for each node as well. I suggest
> dividing the ssh part of this design into two parts, the first being the one
> that is already included and an additional part that proposes one ssh key
> pair per node. I'll send an interdiff soon.
>
> Also, I think I should rename the design doc title to mention 'node
> security' instead of 'master candidate security' since its primary goal was
> actually more the security of all nodes and not only master candidates.
>
>
>>
>> Thanks,
>> Klaus
>>
>> > +Objective
>> > +=========
>> > +
>> > +Up till 2.10, Ganeti distributed security-relevant keys to all nodes,
>>
>> s/distributed/distributes/
>>
>> You're proposing a new change; it's not implemented yet.
>>
>
> ACK
>
>
>>
>> > +For offlining, the removal of the keys is particularly important, as the
>> > +detection of a compromised node might be the very reason for the offlining.
>>
>> Can we rely on the removal working here? What about offlining nodes because
>> we cannot reach them (network, power supply, etc)?
>>
>
> No, we cannot rely on removal here, but as with many things in Ganeti we
> try our best as long as it is possible to reach the node. I will, however,
> add this as a remark to the design doc, as we cannot promise this to succeed.
>
>
>>
>> > +- A compromised master candidate would be able to issue RPC requests,
>> > +  but on detection of its compromised state, it can be removed from the
>> > +  cluster (and thus from the candidate map) without the need for
>> > +  redistribution of any certificates, because the other master candidates
>> > +  can continue using their own certificates.
>>
>> I'm not too convinced about the discussion of a compromised master candidate,
>> as it still owns the private root key, and hence can steal the client
>> certificates from the other master candidates.
>
>
> See my comments above. I'll send an interdiff soon.
>
>
>> Moreover, upon detection of the compromise we
>> have to assume that it has already done so, so a full renewal of all
>> credentials is necessary anyway.
>>
>
> I agree that any sane administrator would issue a full renewal of all
> credentials anyway, although I was pointing out the increased likelihood of
> it being necessary (in contrast to the status quo). I will also add a
> remark here to the design doc to make that point clear.
>
>
>>
>> > +  (TODO: what about WConfD?)
>>
>> WConfD will accept requests only via unix domain sockets, so no external
>> exposure. Distribution of the changed configuration is done via RPC, so the
>> above RPC discussion applies.
>>
>
> Okay, I'll add a paragraph here to make it explicit that the WConfD design
> does not conflict with this design.
>
> Cheers,
> Helga
>



-- 
Helga Velroyen | Software Engineer | [email protected] |

Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
