On Wed, 04 Sep 2013, Dmitri Pal wrote:
On 09/04/2013 09:08 AM, Dmitri Pal wrote:
On 09/03/2013 04:01 PM, Simo Sorce wrote:
On Tue, 2013-09-03 at 12:36 -0400, Dmitri Pal wrote:
On 09/02/2013 09:42 AM, Petr Spacek wrote:
On 27.8.2013 23:08, Dmitri Pal wrote:
On 08/27/2013 03:05 PM, Rob Crittenden wrote:
Dmitri Pal wrote:
On 08/09/2013 08:30 AM, Petr Spacek wrote:
Hello,

I would like to get opinions about key maintenance for DNSSEC.

Problem summary:
- FreeIPA will support DNSSEC
- DNSSEC deployment requires <2,n> cryptographic keys for each DNS
zone (i.e. objects in LDAP)
- The same keys are shared by all FreeIPA servers
- Keys have a limited lifetime and have to be re-generated on a monthly
basis (as a very first approximation; it will be configurable and the
interval will differ for different key types)
- The plan is to store keys in LDAP and let 'something' (e.g.
certmonger or oddjob?) generate the new keys and store them back into
LDAP
- There are command line tools for key-generation (dnssec-keygen from
the package bind-utils)
- We plan to select one super-master which will handle regular
key-regeneration (i.e. do the same as we do for special CA
certificates)
- Keys stored in LDAP will be encrypted somehow, most probably by some
symmetric key shared among all IPA DNS servers

Could certmonger or oddjob do key maintenance for us? I can imagine
something like this:
- watch some attributes in LDAP and wait until some key expires
- run dnssec-keygen utility
- read resulting keys and encrypt them with given 'master key'
- store resulting blobs in LDAP
- wait until another key reaches expiration timestamp
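
A minimal sketch of the "wait until some key expires" step, assuming the
expiration timestamps have already been read from LDAP. The names ZoneKey
and keys_due_for_rotation, the KSK/ZSK labels and the 30-day lead time are
hypothetical and only illustrate the idea; nothing here is an existing
certmonger or oddjob interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ZoneKey:
    zone: str          # DNS zone the key belongs to
    key_type: str      # illustrative label, e.g. "KSK" or "ZSK"
    expires: datetime  # expiration timestamp (assumed to come from LDAP)


def keys_due_for_rotation(keys, now, lead_time):
    """Return the keys that expire within lead_time of now, soonest first."""
    due = [k for k in keys if k.expires - now <= lead_time]
    return sorted(due, key=lambda k: k.expires)


now = datetime(2013, 9, 1)
keys = [
    ZoneKey("example.test", "ZSK", datetime(2013, 9, 10)),
    ZoneKey("example.test", "KSK", datetime(2014, 6, 1)),
]
due = keys_due_for_rotation(keys, now, lead_time=timedelta(days=30))
print([(k.zone, k.key_type) for k in due])
```

Each key returned would then go through the remaining steps (generate,
encrypt with the master key, store in LDAP).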

It is simplified, because there will be multiple keys with different
lifetimes, but the idea is the same. All the gory details are in the
thread '[Freeipa-devel] DNSSEC support design considerations: key
material handling':
https://www.redhat.com/archives/freeipa-devel/2013-July/msg00129.html
https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html


Nalin and others, what do you think? Is certmonger or oddjob the right
place to do something like this?

Thank you for your time!

Was there any discussion of this mail?

I think at least some of this was covered in another thread, "DNSSEC
support design considerations: key material handling" at
https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html

rob


Yes, I have found that thread though I have not found it to come to some
conclusion and a firm plan.
I will leave it to Petr to summarize outstanding issues and repost them.
All questions stated in the first e-mail in this thread are still open:
https://www.redhat.com/archives/freeipa-devel/2013-August/msg00089.html

There was no reply to these questions during my vacation, so I don't
have much to add at the moment.

Nalin, please, could you provide your opinion?
How modular/extensible is certmonger?
Does it make sense to add DNSSEC key-management to certmonger?
What about CA rotation problem? Can we share some algorithms (e.g. for
super-master election) between CA rotation and DNSSEC key rotation
mechanisms?

BTW I like the idea of masters being responsible for generating a subset
of the keys as Loris suggested.
E-mail from Loris in archives:
https://www.redhat.com/archives/freeipa-devel/2013-August/msg00100.html

The idea seems really nice and simple, but I'm afraid that there could
be some serious race conditions.

- How will it work when topology changes?
- What if number of masters is > number of days in month? (=>
Auto-tune interval from month to smaller time period => Again, what
should we do after a topology change?)
- What should we do if the topology changed while a master was
disconnected from the rest of the network? (I.e. the link over WAN was
down at the moment of the change.) What will happen after re-connection
to the topology?

Example:
Time 0: Masters A, B; topology:  A---B
Time 1: Master A has lost connection to master B
Time 2: Master C was added; topology:  A § B---C
Time 3 (Day 3): A + C did rotation at the same time
Time 4: Connection was restored;  topology: A---B---C

Now what?


I have a feeling that we need something like quorum protocol for
writes (only for sensitive operations like CA cert and DNSSEC key
rotations).

http://en.wikipedia.org/wiki/Quorum_(distributed_computing)
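
As a rough illustration of the quorum idea applied to the example above:
a master would perform a sensitive write only if it can reach a strict
majority of the masters it knows about. The function name has_quorum and
the way reachability is determined are assumptions, not an existing
FreeIPA mechanism.

```python
def has_quorum(reachable_masters, known_masters):
    """True if the reachable masters (this one included) form a strict
    majority of all masters known to this server."""
    known = set(known_masters)
    reachable = set(reachable_masters) & known
    return len(reachable) * 2 > len(known)


# Time 3 in the example: A is cut off and does not know about C yet.
print(has_quorum({"A"}, {"A", "B"}))            # view from master A
# C can reach B and knows about all three masters.
print(has_quorum({"B", "C"}, {"A", "B", "C"}))  # view from master C
```

With this rule A would skip the rotation at Time 3, while C would be
allowed to proceed.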


The other question is how should we handle catastrophic situations
where more than half of masters were lost? (Two of three data centres
were blown by a tornado etc.)

It becomes more and more obvious that there is no simple solution that
we can use out of the box.
Let us start with a single nominated server. If the server is lost the
key rotation responsibility can be moved to some other server manually.
Not optimal but at least the first step.

The next step would be to be able to define alternative (failover)
servers. Here is an example.
Let us say we have masters A, B, C in the topology A - B - C.
Master A is responsible for the key rotation; B is the fail-over.
The key rotation time would be in some way recorded in the replication
agreement(s) between A & B.
If at the moment of the scheduled rotation A <-> B connection is not
present A would skip rotation and B would start rotation. If A comes
back and connects to B (or connection is just restored) the replication
will update the keys on A. If A is lost the keys are taken care of by B
for itself and C.
There will be a short race window, but IMO it can be mitigated. If A's
clock is behind B's and A manages to connect to B, it will notice that B
has already started the rotation. If B's clock is behind and A connects
to B before B has started the rotation, A still has to perform the
rotation (a sort of "just made it" case).
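
The primary/fail-over rule described above can be written down as a small
decision function. This is only a sketch: the role names and the two
boolean inputs are hypothetical, and how a server learns whether its peer
is reachable or has already started the rotation is left open.

```python
def should_rotate(role, peer_reachable, peer_already_rotated):
    """Decide whether this server rotates keys at the scheduled time.

    role is "primary" (master A in the example) or "failover" (master B).
    """
    if peer_already_rotated:
        # The other side got there first; replication will deliver its keys.
        return False
    if role == "primary":
        # A rotates only while it can see B; otherwise it skips the rotation.
        return peer_reachable
    if role == "failover":
        # B takes over only when the connection to A is not present.
        return not peer_reachable
    raise ValueError(f"unknown role: {role!r}")


print(should_rotate("primary", peer_reachable=False, peer_already_rotated=False))
print(should_rotate("failover", peer_reachable=False, peer_already_rotated=False))
```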

Later if we want more complexity we can define subsets of the keys to
renew and assign them to different replicas and then define failover
servers per set.
But this is all complexity we can add later when we see the real
problems with the single server approach.
Actually I thought about this for a while, and I think I have an idea
about how to handle this for DNSSEC, (may not apply to other cases like
CA).

IIRC keys are generated well in advance of the time they are used, and
old keys and new keys are used side by side for a while, until old keys
are finally expired and only new keys are around.

This is regulated by a series of date attributes that determine when
keys are in use, when they expire and so on.

Now the idea I have is to add yet another step.

Assume we have key "generation 1" (G1) in use and we approach the time
generation 1 will expire and generation 2 (G2) is needed, and G2 is
created X months in advance and all stuff is signed with both G1 and G2
for a period.

Now if we have a pre-G2 period we can have a period of time when we can
let multiple servers try to generate the G2 series, say 1 month in
advance of the time they would normally be used to start signing
anything. Then only after that 1 month they are actually put into
service.

How does this help? Well, it helps in that even if multiple servers
generate keys and we have duplicates, they have all the time to see that
there are duplicates (because 2 servers raced).
Now if we can keep a subsecond 'creation' timestamp for the new keys,
when replication goes around all servers can check and use only the set
of keys that was created first, and the servers that created the set
of keys that lost the race will just remove the duplicates.
Given we have 1 month of time between the creation and the actual time
keys will be used, we have all the time to let servers sort out whether
there are keys available or not and prune out duplicates.

A diagram in case I have not been clear enough


Assume servers A, B, C they all randomize (within a week) the time at
which they will attempt to create new keys if it is time to and none are
available already.

Say the time comes to create G2: A, B, C each roll a die and it turns
out A will do it in 35000 seconds, B will do it in 40000 seconds, and C
in 32000 seconds, so C should do it first and there should be enough
time for the others to see that new keys popped up and just discard
their attempts.

However if A or C are temporarily disconnected they may still end up
generating new keys, so we have G2-A and G2-C. Once they get reconnected
and replication flows again, all servers see that instead of a single G2
set we have 2 G2 sets available:
G2-A created at timestamp X+35000 and G2-C created at timestamp X+32000,
so all servers know they should ignore G2-A, and they all ignore it.
When A comes around to realize this itself it will just go and delete
the G2-A set. Only the G2-C set is left and that is what will be the
final official G2.
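
A small sketch of the two pieces of this scheme: picking a random delay
within the one-week window, and resolving duplicates by keeping the set
with the earliest creation timestamp. The function names, the dictionary
layout and the tie-break on server name are assumptions added for
illustration.

```python
import random

WEEK_SECONDS = 7 * 24 * 3600


def pick_delay(rng, window_seconds=WEEK_SECONDS):
    """Random number of seconds to wait before attempting key generation."""
    return rng.randrange(window_seconds)


def resolve_duplicates(created_at):
    """created_at maps the creating server to its set's creation timestamp.

    Returns (winner, losers). The earliest timestamp wins; equal timestamps
    are broken by server name (an assumption, not part of the proposal).
    """
    winner = min(created_at, key=lambda server: (created_at[server], server))
    losers = sorted(server for server in created_at if server != winner)
    return winner, losers


# A and C both generated a G2 set while disconnected from each other.
winner, losers = resolve_duplicates({"A": 35000.0, "C": 32000.0})
print(winner, losers)
```

Every server can evaluate this independently once replication has caught
up, and the losing server deletes its own set.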

If we give a week of time for this operation to go on I think it will be
easy to resolve any race or temporary disconnection that may happen.
Also because all servers can attempt (within that week) to create keys
there is no real single point of failure.

HTH,
please poke holes in my reasoning :)

Simo.

Reasonable, I just have a couple of comments.
If there are many keys and many replicas the chance would be that there
will be a lot of load. Generating keys is costly computation wise.
Replication is costly too.
Also you assume that topology works fine. I am mostly concerned about
the case when some replication is not working and data from one part of
the topology is not replicated to another. The concern is that people
would not notice that things are not replicating. So if there is a
problem and we let all these keys be generated all over the place it
would be pretty hard to untie this knot later.

I would actually suggest that if a replica X needs the keys a month
after moment A, the keys have not arrived within the first 3 days after
moment A, and this replica is not entitled to generate keys, it should
start sending messages to the admin. That way there will be enough time
for the admin to sort out what is wrong and nominate another replica to
generate the keys if needed. There should be a command as simple as:

ipa dnssec-keymanager-set <replica>

that would make the mentioned replica the key generator.
There can be other commands like

ipa dnssec-keymanager-info

Appointed server: <server>
Keys store: <path>
Last time keys generated: <some time>
Next time keys need to be generated: <...>
...




IMO in this case we need to help admin to see that there is a problem
and provide tools to easily mitigate it rather than try to solve it
ourselves and build a complex algorithm.
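
The warning rule suggested above (keys needed a month after moment A, not
arrived within the first 3 days, replica not entitled to generate them)
could be sketched like this. The function name and parameters are
hypothetical; how the message reaches the admin is not covered.

```python
from datetime import datetime, timedelta


def should_alert_admin(needed_at, now, keys_present, is_key_generator,
                       lead=timedelta(days=30), grace=timedelta(days=3)):
    """True if this replica should start sending messages to the admin."""
    if keys_present or is_key_generator:
        return False
    moment_a = needed_at - lead  # start of the window in which keys should arrive
    return now >= moment_a + grace


needed_at = datetime(2013, 10, 31)  # moment A is then 2013-10-01
print(should_alert_admin(needed_at, datetime(2013, 10, 2), False, False))
print(should_alert_admin(needed_at, datetime(2013, 10, 5), False, False))
```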

Thinking even more about this.
Maybe we should start with a command that would be something like:

ipa health

This command would detect the topology, try to connect to all replicas
check that they are all up and running, replicating, nothing is stuck
and report any issues.
The output of the command can be sent somewhere or as a mail to admin.

Then it can be run periodically from cron on a couple of servers and
if there is any problem the admin would know quite soon.
Then admin would know things like:
1) The CRL generating server is down/unreachable
2) The DNSSEC key generating server is down/unreachable
3) Some CAs are unreachable
4) The server that rotates certificates is down/unreachable
5) The server that does AD sync is down/unreachable

There might be other things.
IMO we have enough single-point-of-failure services already. Adding
DNSSEC key generation to that set is not a big deal, but a utility like
this would really go a long way towards making IPA more usable,
manageable and useful.
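
To make the shape of such a report concrete, here is a minimal sketch of
the reporting half only. It assumes the individual checks have already
been run and produced a reachable/unreachable result per service; the
function name and the report format are invented for illustration and are
not an existing ipa command.

```python
def summarize_health(checks):
    """checks maps a service description to True (reachable) or False."""
    problems = sorted(name for name, ok in checks.items() if not ok)
    if not problems:
        return "OK: all checked services are reachable"
    lines = [f"- {name} is down/unreachable" for name in problems]
    return "PROBLEMS:\n" + "\n".join(lines)


report = summarize_health({
    "CRL generating server": True,
    "DNSSEC key generating server": False,
    "certificate rotation server": True,
})
print(report)
```

The resulting text is what would be mailed to the admin by the periodic
cron run.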

Should I file an RFE?
The tool you describe above should be able to perform operations on the
master. It is in general better not to put master-specific operations
into a
client tool that could be run from an arbitrary host where ipa admin
tools are installed.

What about plugging the functionality into ipa-advise?

  ipa-advise health-check-{cert|replication|dnssec|...}


--
/ Alexander Bokovoy
