On 09/04/2013 10:17 AM, Petr Spacek wrote:
> On 4.9.2013 15:50, Alexander Bokovoy wrote:
>> On Wed, 04 Sep 2013, Dmitri Pal wrote:
>>> On 09/04/2013 09:08 AM, Dmitri Pal wrote:
>>>> On 09/03/2013 04:01 PM, Simo Sorce wrote:
>>>>> On Tue, 2013-09-03 at 12:36 -0400, Dmitri Pal wrote:
>>>>>> On 09/02/2013 09:42 AM, Petr Spacek wrote:
>>>>>>> On 27.8.2013 23:08, Dmitri Pal wrote:
>>>>>>>> On 08/27/2013 03:05 PM, Rob Crittenden wrote:
>>>>>>>>> Dmitri Pal wrote:
>>>>>>>>>> On 08/09/2013 08:30 AM, Petr Spacek wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I would like to get opinions about key maintenance for DNSSEC.
>>>>>>>>>>>
>>>>>>>>>>> Problem summary:
>>>>>>>>>>> - FreeIPA will support DNSSEC
>>>>>>>>>>> - DNSSEC deployment requires <2,n> cryptographic keys for each DNS zone (i.e. objects in LDAP)
>>>>>>>>>>> - The same keys are shared by all FreeIPA servers
>>>>>>>>>>> - Keys have a limited lifetime and have to be re-generated on a monthly basis (as a very first approximation; it will be configurable and the interval will differ for different key types)
>>>>>>>>>>> - The plan is to store keys in LDAP and let 'something' (i.e. certmonger or oddjob?) generate the new keys and store them back into LDAP
>>>>>>>>>>> - There are command line tools for key generation (dnssec-keygen from the package bind-utils)
>>>>>>>>>>> - We plan to select one super-master which will handle regular key re-generation (i.e. do the same as we do for special CA certificates)
>>>>>>>>>>> - Keys stored in LDAP will be encrypted somehow, most probably with some symmetric key shared among all IPA DNS servers
>>>>>>>>>>>
>>>>>>>>>>> Could certmonger or oddjob do key maintenance for us? I can imagine something like this:
>>>>>>>>>>> - watch some attributes in LDAP and wait until some key expires
>>>>>>>>>>> - run the dnssec-keygen utility
>>>>>>>>>>> - read the resulting keys and encrypt them with a given 'master key'
>>>>>>>>>>> - store the resulting blobs in LDAP
>>>>>>>>>>> - wait until another key reaches its expiration timestamp
>>>>>>>>>>>
>>>>>>>>>>> This is simplified, because there will be multiple keys with different lifetimes, but the idea is the same. All the gory details are in the thread '[Freeipa-devel] DNSSEC support design considerations: key material handling':
>>>>>>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-July/msg00129.html
>>>>>>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html
>>>>>>>>>>>
>>>>>>>>>>> Nalin and others, what do you think? Is certmonger or oddjob the right place to do something like this?
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your time!
>>>>>>>>>>>
>>>>>>>>>> Was there any discussion of this mail?
>>>>>>>>>>
>>>>>>>>> I think at least some of this was covered in another thread, "DNSSEC support design considerations: key material handling" at https://www.redhat.com/archives/freeipa-devel/2013-August/msg00086.html
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>>
>>>>>>>> Yes, I have found that thread, though I have not found that it came to any conclusion or a firm plan. I will leave it to Petr to summarize the outstanding issues and repost them.
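As a concrete strawman for the watch/generate/encrypt/store loop sketched at the top of the thread -- this is only an illustrative sketch, not a proposal for the real schema. The LDAP attribute names (idnssecKeyExpiry, idnssecKeyBlob) and the zone DN are invented; dnssec-keygen and its options are the real bind-utils tool, and Fernet stands in for whatever shared symmetric 'master key' scheme we end up with:

    import subprocess
    import time

    import ldap  # python-ldap
    from cryptography.fernet import Fernet  # stand-in for the shared master key

    ZONE_DN = "idnsname=example.com,cn=dns,dc=example,dc=com"  # hypothetical

    def rotate_zone_key(conn, master_key, zone):
        # Generate a new key pair; dnssec-keygen prints the key name and
        # writes <name>.key / <name>.private files into the CWD.
        name = subprocess.check_output(
            ["dnssec-keygen", "-a", "RSASHA256", "-b", "2048", zone]
        ).decode().strip()
        with open(name + ".private", "rb") as f:
            private = f.read()
        # Encrypt the private key with the shared symmetric master key and
        # store the resulting blob back into LDAP.
        blob = Fernet(master_key).encrypt(private)
        conn.modify_s(ZONE_DN, [(ldap.MOD_REPLACE, "idnssecKeyBlob", [blob])])

    def maintenance_loop(conn, master_key):
        while True:
            entry = conn.search_s(ZONE_DN, ldap.SCOPE_BASE)[0][1]
            expiry = int(entry["idnssecKeyExpiry"][0])
            if expiry - time.time() < 30 * 86400:  # under a month of life left
                rotate_zone_key(conn, master_key, "example.com")
            time.sleep(3600)  # poll hourly; a persistent search would be nicer

Whether this loop lives in certmonger, oddjob, or a dedicated daemon is exactly the open question.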
>>>>>>> All questions stated in the first e-mail in this thread are still open:
>>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00089.html
>>>>>>>
>>>>>>> There was no reply to these questions during my vacation, so I don't have much to add at the moment.
>>>>>>>
>>>>>>> Nalin, please, could you provide your opinion? How modular/extensible is certmonger? Does it make sense to add DNSSEC key management to certmonger? What about the CA rotation problem? Can we share some algorithms (e.g. for super-master election) between the CA rotation and DNSSEC key rotation mechanisms?
>>>>>>>
>>>>>>>> BTW I like the idea of masters being responsible for generating a subset of the keys, as Loris suggested.
>>>>>>> E-mail from Loris in the archives:
>>>>>>> https://www.redhat.com/archives/freeipa-devel/2013-August/msg00100.html
>>>>>>>
>>>>>>> The idea seems really nice and simple, but I'm afraid there could be some serious race conditions.
>>>>>>>
>>>>>>> - How will it work when the topology changes?
>>>>>>> - What if the number of masters is greater than the number of days in a month? (=> Auto-tune the interval from a month down to a smaller time period => Again, what should we do after a topology change?)
>>>>>>> - What should we do if the topology was changed while a master was disconnected from the rest of the network? (I.e. a link over the WAN was down at the moment of the change.) What will happen after it re-connects to the topology?
>>>>>>>
>>>>>>> Example:
>>>>>>> Time 0: Masters A, B; topology: A---B
>>>>>>> Time 1: Master A has lost connection to master B
>>>>>>> Time 2: Master C was added; topology: A § B---C
>>>>>>> Time 3 (Day 3): A and C did the rotation at the same time
>>>>>>> Time 4: The connection was restored; topology: A---B---C
>>>>>>>
>>>>>>> Now what?
>>>>>>>
>>>>>>> I have a feeling that we need something like a quorum protocol for writes (only for sensitive operations like CA cert and DNSSEC key rotations).
>>>>>>> http://en.wikipedia.org/wiki/Quorum_(distributed_computing)
>>>>>>>
>>>>>>> The other question is how we should handle catastrophic situations where more than half of the masters were lost. (Two of three data centres were blown away by a tornado, etc.)
>>>>>>>
>>>>>> It becomes more and more obvious that there is no simple solution we can use out of the box.
>>>>>> Let us start with a single nominated server. If the server is lost, the key rotation responsibility can be moved to some other server manually. Not optimal, but at least it is a first step.
>>>>>>
>>>>>> The next step would be to be able to define alternative (failover) servers. Here is an example. Let us say we have masters A, B, C in topology A - B - C. Master A is responsible for the key rotation; B is the failover. The key rotation time would be in some way recorded in the replication agreement(s) between A & B. If at the moment of the scheduled rotation the A <-> B connection is not present, A would skip the rotation and B would start it. If A comes back and connects to B (or the connection is simply restored), replication will update the keys on A. If A is lost, the keys are taken care of by B, for itself and for C.
>>>>>> There will be a short window for a race condition, but IMO it can be mitigated.
>>>>>> If A's clock is behind B's, then if A manages to connect to B it will notice that B has already started the rotation. If B's clock is behind and A connects to B before B starts the rotation, A still has to perform the rotation (a sort of "just made it" case).
>>>>>>
>>>>>> Later, if we want more complexity, we can define subsets of the keys to renew, assign them to different replicas, and then define failover servers per set. But this is all complexity we can add later, when we see the real problems with the single-server approach.
>>>>> Actually I thought about this for a while, and I think I have an idea about how to handle this for DNSSEC (it may not apply to other cases like the CA).
>>>>>
>>>>> IIRC keys are generated well in advance of the time they are used, and old keys and new keys are used side by side for a while, until the old keys finally expire and only the new keys are around.
>>>>>
>>>>> This is regulated by a series of date attributes that determine when keys come into use, when they expire, and so on.
>>>>>
>>>>> Now the idea I have is to add yet another step.
>>>>>
>>>>> Assume we have key "generation 1" (G1) in use and we approach the time generation 1 will expire and generation 2 (G2) is needed; G2 is created X months in advance and everything is signed with both G1 and G2 for a period.
>>>>>
>>>>> Now, given this pre-G2 period, we can have a window in which multiple servers are allowed to try to generate the G2 series, say 1 month before the time the keys would normally be used to start signing anything. Only after that 1 month are they actually put into service.
>>>>>
>>>>> How does this help? Well, it helps in that even if multiple servers generate keys and we have duplicates, they have all the time they need to notice the duplicates (because 2 servers raced). Now, if we keep a sub-second 'creation' timestamp on the new keys, then once replication goes around, all servers can check and use only the set of keys that was created first, and the servers whose sets lost the race will simply remove the duplicates. Given that we have 1 month between creation and the time the keys will actually be used, there is plenty of time for the servers to sort out which keys are available and to prune the duplicates.
>>>>>
>>>>> A diagram in case I have not been clear enough:
>>>>>
>>>>> Assume servers A, B, C all randomize (within a week) the time at which they will attempt to create new keys, if it is time to do so and none are available already.
>>>>>
>>>>> Say the time comes to create G2. A, B, C each throw a die, and it turns out A will do it in 35000 seconds, B in 40000 seconds, and C in 32000 seconds; so C should do it first, and there should be enough time for the others to see that new keys popped up and simply discard their own attempts.
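The happy path of that randomized back-off, as a sketch -- generation_exists() and generate_keys() are hypothetical helpers around the LDAP key store:

    import random
    import time

    WEEK = 7 * 86400

    def maybe_generate(generation_exists, generate_keys, generation="G2"):
        # e.g. A draws 35000s, B draws 40000s, C draws 32000s -- C acts first.
        delay = random.uniform(0, WEEK)
        time.sleep(delay)
        if generation_exists(generation):
            return  # someone else already published a set; discard our attempt
        generate_keys(generation)  # publish our candidate set, with a
                                   # sub-second creation timestamp attached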
>>>>> However, if A or C is temporarily disconnected, both may still end up generating new keys, so we have G2-A and G2-C. Once they get reconnected and replication flows again, all servers see that instead of a single G2 set we have 2 G2 sets available: G2-A created at timestamp X+35000 and G2-C created at timestamp X+32000. So all servers know they should ignore G2-A, and they all ignore it. When A comes around to realizing this itself, it will just go and delete the G2-A set. Only the G2-C set is left, and that is what will become the final official G2.
>>>>>
>>>>> If we give a week for this operation to play out, I think it will be easy to resolve any race or temporary disconnection that may happen. Also, because all servers can attempt (within that week) to create keys, there is no real single point of failure.
>>>>>
>>>>> HTH, please poke holes in my reasoning :)
>>>>>
>>>>> Simo.
>>>>>
>>>> Reasonable; I just have a couple of comments.
>>>> If there are many keys and many replicas, chances are there will be a lot of load: generating keys is computationally expensive, and replication is costly too.
>>>> Also, you assume that the topology works fine. I am mostly concerned about the case where some replication is not working and data from one part of the topology is not replicated to another. The concern is that people would not notice that things are not replicating. So if there is a problem and we let all these keys be generated all over the place, it would be pretty hard to untie this knot later.
>>>>
>>>> I would actually suggest that if replica X needs the keys a month from moment A, the keys have not arrived within the first 3 days after moment A, and the replica is not entitled to generate keys itself, it should start sending messages to the admin. That way there will be enough time for the admin to sort out what is wrong and nominate another replica to generate the keys if needed. There should be a command as simple as:
>>>>
>>>> ipa dnssec-keymanager-set <replica>
>>>>
>>>> that would make the mentioned replica the key generator. There can be other commands, like:
>>>>
>>>> ipa dnssec-keymanager-info
>>>>
>>>> Appointed server: <server>
>>>> Keys store: <path>
>>>> Last time keys generated: <some time>
>>>> Next time keys need to be generated: <...>
>>>> ...
>>>>
>>>> IMO in this case we need to help the admin see that there is a problem and provide tools to easily mitigate it, rather than try to solve it ourselves and build a complex algorithm.
>>>>
>>> Thinking even more about this: maybe we should start with a command that would be something like:
>>>
>>> ipa health
>>>
>>> This command would detect the topology, try to connect to all replicas, check that they are all up and running and replicating, that nothing is stuck, and report any issues. The output of the command could be sent somewhere, or mailed to the admin.
>>>
>>> Then it can be run periodically as part of cron on a couple of servers, and if there is any problem the admin would know quite soon. The admin would then know things like:
>>> 1) The CRL generating server is down/unreachable
>>> 2) The DNSSEC key generating server is down/unreachable
>>> 3) Some CAs are unreachable
>>> 4) The server that rotates certificates is down/unreachable
>>> 5) The server that does AD sync is down/unreachable
>>>
>>> There might be other things.
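For the record, the timestamp-based pruning Simo describes boils down to something this small -- every server deterministically keeps the candidate set with the earliest sub-second creation timestamp (creator name as a tie-breaker), and the losing creator deletes its own duplicate; KeySet and delete_set() are hypothetical:

    import collections

    KeySet = collections.namedtuple("KeySet", ["creator", "created"])

    def resolve_duplicates(candidates, my_name, delete_set):
        # All servers pick the same winner: earliest creation timestamp,
        # with the creator name as a deterministic tie-breaker.
        winner = min(candidates, key=lambda k: (k.created, k.creator))
        for k in candidates:
            if k is not winner and k.creator == my_name:
                delete_set(k)  # we lost the race; remove our duplicate set
        return winner

    # Thread example: G2-A (X+35000) loses to G2-C (X+32000) on every server.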
>>> IMO we have enough single-point-of-failure services already. Adding DNSSEC key generation to that set is not a big deal, but a utility like this would really go a long way toward making IPA more usable, manageable and useful.
>>>
>>> Should I file an RFE?
>> The tool you describe above should be able to perform operations on the master. It is in general better not to put master-specific operations into a client tool that could be run from an arbitrary host where the ipa admin tools are installed.
>>
>> What about plugging the functionality into ipa-advise?
>>
>> ipa-advise health-check-{cert|replication|dnssec|...}
>
> I agree with the health check idea and also with the modular approach proposed by Alexander.
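A strawman for the check itself could be as simple as probing every known master with an anonymous LDAP bind and collecting a report; the master list here is hypothetical, and a real ipa-advise plugin would also inspect replication agreements, the CRL/CA roles, etc.:

    import ldap  # python-ldap

    MASTERS = ["ipa1.example.com", "ipa2.example.com"]  # hypothetical list

    def health_check():
        report = {}
        for host in MASTERS:
            try:
                conn = ldap.initialize("ldap://%s" % host)
                conn.set_option(ldap.OPT_NETWORK_TIMEOUT, 5)
                conn.simple_bind_s()  # anonymous bind as a liveness probe
                report[host] = "up"
            except ldap.LDAPError as e:
                report[host] = "DOWN: %s" % e
        return report  # a plain dict, trivially serializable for monitoring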
I assume you mean "a" master rather than "the" master :-) Running this command on any master would be fine. If it makes sense as an ipa-advise option, I am fine with that too. If others agree, then let us open a ticket to add this functionality to ipa-advise.
>
> Side note: I think that the tool should have an option to enable machine-parseable output, because it would allow third parties to connect it to monitoring systems like Zabbix etc.

Yes, agreed.
>
> Today I spent some time analyzing Simo's proposal and so far I haven't been able to find a hole in it. It seems like a good idea, and the added code complexity should be relatively small. For that reason I vote for implementing it before we declare DNSSEC 'stable'.

Should we treat this functionality as independent of the tool? I am concerned about the volume of load and replication. I think it should be an option: either a single master generates the keys, or you can enable others to generate them too, and the ones so enabled would follow the algorithm proposed by Simo.
>
> Don't forget that the whole infrastructure will break if the DNSSEC keys are not updated in time, and that rotation happens several times each month.
>
True, but it is better if it is clear why it breaks and the fix is easy, not requiring a complex procedure to get back online.

-- 
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.