On Wed, 2013-09-04 at 09:08 -0400, Dmitri Pal wrote:
> On 09/03/2013 04:01 PM, Simo Sorce wrote:
> > On Tue, 2013-09-03 at 12:36 -0400, Dmitri Pal wrote:
> >> On 09/02/2013 09:42 AM, Petr Spacek wrote:
> >>> On 27.8.2013 23:08, Dmitri Pal wrote:
> >>>> On 08/27/2013 03:05 PM, Rob Crittenden wrote:
> >>>>> Dmitri Pal wrote:
> >>>>>> On 08/09/2013 08:30 AM, Petr Spacek wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I would like to get opinions about key maintenance for DNSSEC.
> >>>>>>>
> >>>>>>> Problem summary:
> >>>>>>> - FreeIPA will support DNSSEC
> >>>>>>> - DNSSEC deployment requires <2,n> cryptographic keys for each DNS
> >>>>>>> zone (i.e. objects in LDAP)
> >>>>>>> - The same keys are shared by all FreeIPA servers
> >>>>>>> - Keys have limited lifetime and have to be re-generated on a monthly
> >>>>>>> basis (in very first approximation, it will be configurable and the
> >>>>>>> interval will differ for different key types)
> >>>>>>> - The plan is to store keys in LDAP and let 'something' (i.e.
> >>>>>>> certmonger or oddjob?) generate and store the new keys back into
> >>>>>>> LDAP
> >>>>>>> - There are command line tools for key-generation (dnssec-keygen from
> >>>>>>> the package bind-utils)
> >>>>>>> - We plan to select one super-master which will handle regular
> >>>>>>> key-regeneration (i.e. do the same as we do for special CA
> >>>>>>> certificates)
> >>>>>>> - Keys stored in LDAP will be encrypted somehow, most probably by
> >>>>>>> some symmetric key shared among all IPA DNS servers
> >>>>>>>
> >>>>>>> Could certmonger or oddjob do key maintenance for us?
> >>>>>>> I can imagine something like this:
> >>>>>>> - watch some attributes in LDAP and wait until some key expires
> >>>>>>> - run dnssec-keygen utility
> >>>>>>> - read resulting keys and encrypt them with given 'master key'
> >>>>>>> - store resulting blobs in LDAP
> >>>>>>> - wait until another key reaches expiration timestamp
> >>>>>>>
> >>>>>>> It is simplified, because there will be multiple keys with different
> >>>>>>> lifetimes, but the idea is the same. All the gory details are in the
> >>>>>>> thread '[Freeipa-devel] DNSSEC support design considerations: key
> >>>>>>> material handling':
> >>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-July/msg00129.html
> >>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html
> >>>>>>>
> >>>>>>>
> >>>>>>> Nalin and others, what do you think? Is certmonger or oddjob the
> >>>>>>> right place to do something like this?
> >>>>>>>
> >>>>>>> Thank you for your time!
> >>>>>>>
> >>>>>> Was there any discussion of this mail?
> >>>>>>
> >>>>> I think at least some of this was covered in another thread, "DNSSEC
> >>>>> support design considerations: key material handling" at
> >>>>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html
> >>>>>
> >>>>> rob
> >>>>>
> >>>>>
> >>>> Yes, I have found that thread though I have not found it to come to some
> >>>> conclusion and a firm plan.
> >>>> I will leave it to Petr to summarize outstanding issues and repost them.
> >>> All questions stated in the first e-mail in this thread are still open:
> >>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00089.html
> >>>
> >>> There was no reply to these questions during my vacation, so I don't
> >>> have much to add at the moment.
> >>>
> >>> Nalin, please, could you provide your opinion?
> >>> How modular/extensible is certmonger?
> >>> Does it make sense to add DNSSEC key-management to certmonger?
> >>> What about the CA rotation problem?
> >>> Can we share some algorithms (e.g. for
> >>> super-master election) between CA rotation and DNSSEC key rotation
> >>> mechanisms?
> >>>
> >>>> BTW I like the idea of masters being responsible for generating a subset
> >>>> of the keys as Loris suggested.
> >>> E-mail from Loris in archives:
> >>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00100.html
> >>>
> >>> The idea seems really nice and simple, but I'm afraid that there could
> >>> be some serious race conditions.
> >>>
> >>> - How will it work when topology changes?
> >>> - What if number of masters is > number of days in month? (=>
> >>> Auto-tune interval from month to smaller time period => Again, what
> >>> should we do after a topology change?)
> >>> - What should we do if topology was changed when a master was
> >>> disconnected from the rest of the network? (I.e. Link over WAN was
> >>> down at the moment of change.) What will happen after re-connection to
> >>> the topology?
> >>>
> >>> Example:
> >>> Time 0: Masters A, B; topology: A---B
> >>> Time 1: Master A has lost connection to master B
> >>> Time 2: Master C was added; topology: A § B---C
> >>> Time 3 (Day 3): A + C did rotation at the same time
> >>> Time 4: Connection was restored; topology: A---B---C
> >>>
> >>> Now what?
> >>>
> >>>
> >>> I have a feeling that we need something like quorum protocol for
> >>> writes (only for sensitive operations like CA cert and DNSSEC key
> >>> rotations).
> >>>
> >>> http://en.wikipedia.org/wiki/Quorum_(distributed_computing)
> >>>
> >>>
> >>> The other question is how should we handle catastrophic situations
> >>> where more than half of masters were lost? (Two of three data centres
> >>> were blown by a tornado etc.)
> >>>
> >> It becomes more and more obvious that there is no simple solution that
> >> we can use out of the box.
> >> Let us start with a single nominated server. If the server is lost the
> >> key rotation responsibility can be moved to some other server manually.
> >> Not optimal but at least the first step.
> >>
> >> The next step would be to be able to define alternative (failover)
> >> servers. Here is an example.
> >> Let us say we have masters A, B, C in topology A - B - C.
> >> Master A is responsible for the key rotation, B is the failover.
> >> The key rotation time would be in some way recorded in the replication
> >> agreement(s) between A & B.
> >> If at the moment of the scheduled rotation A <-> B connection is not
> >> present A would skip rotation and B would start rotation. If A comes
> >> back and connects to B (or connection is just restored) the replication
> >> will update the keys on A. If A is lost the keys are taken care of by B
> >> for itself and C.
> >> There will be a short window of race condition but IMO it can be
> >> mitigated. If A clock is behind B then if A managed to connect to B it
> >> would notice that B already started rotation. If B clock is behind and A
> >> connects to B before B started rotation A has to perform rotation still
> >> (sort of just made it case).
> >>
> >> Later if we want more complexity we can define subsets of the keys to
> >> renew and assign them to different replicas and then define failover
> >> servers per set.
> >> But this is all complexity we can add later when we see the real
> >> problems with the single server approach.
> > Actually I thought about this for a while, and I think I have an idea
> > about how to handle this for DNSSEC, (may not apply to other cases like
> > CA).
> >
> > IIRC keys are generated well in advance of the time they are used and
> > old keys and new keys are used side by side for a while, until old keys
> > are finally expired and only new keys are around.
> >
> > This is regulated by a series of date attributes that determine when
> > keys are in use, when they expire and so on.
> >
> > Now the idea I have is to add yet another step.
> >
> > Assume we have key "generation 1" (G1) in use and we approach the time
> > generation 1 will expire and generation 2 (G2) is needed, and G2 is
> > created X months in advance and all stuff is signed with both G1 and G2
> > for a period.
> >
> > Now if we have a pre-G2 period we can have a period of time when we can
> > let multiple servers try to generate the G2 series, say 1 month in
> > advance of the time they would normally be used to start signing
> > anything. Then only after that 1 month they are actually put into
> > service.
> >
> > How does this help? Well it helps in that even if multiple servers
> > generate keys and we have duplicates they have all the time to see that
> > there are duplicates (because 2 servers raced).
> > Now if we can keep a subsecond 'creation' timestamp for the new keys, when
> > replication goes around all servers can check and use only the set of
> > keys that has been created first, and the servers that created the sets
> > of keys that lose the race will just remove the duplicates.
> > Given we have 1 month of time between the creation and the actual time
> > keys will be used we have all the time to let servers sort out whether
> > there are keys available or not and prune out duplicates.
> >
> > A diagram in case I have not been clear enough
> >
> >
> > Assume servers A, B, C; they all randomize (within a week) the time at
> > which they will attempt to create new keys if it is time to and none are
> > available already.
> >
> > Say the time comes to create G2: A, B, C each throw a die and it turns
> > out A will do it in 35000 seconds, B will do it in 40000 seconds, and C
> > in 32000 seconds, so C should do it first and there should be enough
> > time for the others to see that new keys popped up and just discard
> > their attempts.
> >
> > However, if A or C is temporarily disconnected they may still end up
> > generating new keys, so we have G2-A and G2-C; once they get reconnected
> > and replication flows again all servers see that instead of a single G2
> > set we have 2 G2 sets available:
> > G2-A created at timestamp X+35000 and G2-C created at timestamp X+32000,
> > so all servers know they should ignore G2-A, and they all ignore it.
> > When A comes around to realize this itself it will just go and delete
> > the G2-A set. Only the G2-C set is left and that is what will be the
> > final official G2.
> >
> > If we give a week of time for this operation to go on I think it will be
> > easy to resolve any race or temporary disconnection that may happen.
> > Also because all servers can attempt (within that week) to create keys
> > there is no real single point of failure.
> >
> > HTH,
> > please poke holes in my reasoning :)
> >
> > Simo.
> >
>
> Reasonable, I just have a couple of comments.
> If there are many keys and many replicas the chance would be that there
> will be a lot of load. Generating keys is costly computation wise.
> Replication is costly too.
> Also you assume that topology works fine. I am mostly concerned about
> the case when some replication is not working and data from one part of
> the topology is not replicated to another. The concern is that people
> would not notice that things are not replicating. So if there is a
> problem and we let all these keys be generated all over the place it
> would be pretty hard to untie this knot later.

If replication is broken for so long you have *much* more serious
issues.

> I would actually suggest that if a replica X needs the keys in a month
> from moment A and the keys have not arrived in the first 3 days after
> moment A and this replica is not entitled to generate keys it should
> start sending messages to admin. That way there will be enough time for
> admin to sort out what is wrong and nominate another replica to generate
> the keys if needed. There should be a command as simple as:

If we are going to spam admins I think we should do that once a replica
sees that replication is broken, and not just wait until some key
material is late to arrive.

I guess we need to start thinking about allowing admins to configure an
MTA address in one LDAP attribute, plus an administrative email address,
and then send messages when something critical happens. I think admins
in general will be happy unless we overload them with garbage or
repeated messages.

> ipa dnssec-keymanager-set <replica>
>
> that would make the mentioned replica the key generator.

I do not think any special replica should be designated; my scheme makes
every replica participate in key creation (at least every replica that
has a DNS server on it).

> There can be other commands like
>
> ipa dnssec-keymanager-info
>
> Appointed server: <server>

There is no appointed server; it would be a SPOF, which is what I want
to avoid with my scheme.

> Keys store: <path>
> Last time keys generated: <some time>
> Next time keys need to be generated: <...>
> ...
>
>
> IMO in this case we need to help admin to see that there is a problem
> and provide tools to easily mitigate it rather than try to solve it
> ourselves and build a complex algorithm.

Yes, but this is a general problem. I think it is time to build some
small part of A where at least we handle self-health checks and then
spam admins if something really bad is happening.

Simo.

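For reference, the duplicate-resolution rule from the quoted scheme
(every server picks a random delay within a week, and the key set with
the earliest creation timestamp wins) can be sketched as below. This is
only an illustration under stated assumptions, not FreeIPA code: the
KeySet record, the function names, and the tie-break on server name are
all hypothetical, and reading or writing the key sets in LDAP is left
out entirely.

```python
import random
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600  # randomization window from the quoted scheme


@dataclass(frozen=True)
class KeySet:
    """Hypothetical record for one replicated set of keys."""
    generation: int     # e.g. 2 for "G2"
    created_by: str     # name of the server that generated the set
    created_at: float   # sub-second creation timestamp, in seconds


def pick_delay(rng: random.Random, window: float = WEEK_SECONDS) -> float:
    """Random delay (seconds) before this server tries to generate keys."""
    return rng.uniform(0, window)


def winning_set(candidates: list[KeySet]) -> KeySet:
    """The set created first wins.

    Assumption: ties on the timestamp are broken by server name so that
    every replica reaches the same answer.
    """
    return min(candidates, key=lambda k: (k.created_at, k.created_by))


def sets_to_delete(candidates: list[KeySet], local_server: str) -> list[KeySet]:
    """Losing sets that the local server created and should remove itself."""
    winner = winning_set(candidates)
    return [k for k in candidates
            if k.created_by == local_server and k != winner]


# The example from the thread: A was disconnected and generated G2 at
# X+35000 although C had already generated it at X+32000.
g2_a = KeySet(generation=2, created_by="A", created_at=35000.0)
g2_c = KeySet(generation=2, created_by="C", created_at=32000.0)
```

With these two candidates every server selects g2_c, and only A has
anything to delete (its own g2_a), which matches the outcome described
in the quoted text.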
-- 
Simo Sorce * Red Hat, Inc * New York