Re: [Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Rob Crittenden Tue, 29 Jan 2013 09:55:20 -0800

Petr Viktorin wrote:

On 01/15/2013 05:15 PM, Rob Crittenden wrote:

Petr Viktorin wrote:

On 01/15/2013 03:41 PM, Petr Viktorin wrote:

On 01/14/2013 10:56 PM, Rob Crittenden wrote:

Petr Viktorin wrote:

On 01/12/2013 12:49 AM, Rob Crittenden wrote:

Rob Crittenden wrote:

Petr Viktorin wrote:

On 01/07/2013 05:42 PM, Rob Crittenden wrote:

Petr Viktorin wrote:

On 01/07/2013 03:09 PM, Rob Crittenden wrote:

Petr Viktorin wrote:

[...]


Works for me, but I have some questions (this is an area I know little
about).

Can we be 100% sure these certs are always renewed together? Is
certmonger the only possible mechanism to update them?


You raise a good point. If through some mechanism someone replaces one
of these certs, it will cause the script to fail. Some notification of
this failure will be logged, though, and of course the certs won't be
renewed.

One could conceivably manually renew one of these certificates. It is
probably a very remote possibility but it is non-zero.

Can we be sure certmonger always does the updates in parallel? If it
managed to update the audit cert before starting on the others, we'd get
no CA restart for the others.


These all get issued at the same time so should expire at the same time
as well (see problem above). The script will hang around for 10 minutes
waiting for the renewal to complete, then give up.


The certs might take different amounts of time to update, right?
Eventually, the expirations could go out of sync enough for it to
matter.
AFAICS, without proper locking we still get a race condition when the
other certs start being renewed some time (much less than 10 min) after
the audit one:

(time axis goes down)

         audit cert                  other cert
         ----------                  ----------
     certmonger does renew                .
   post-renew script starts               .
  check state of other certs: OK          .
             .                   certmonger starts renew
  certutil modifies NSS DB  +  certmonger modifies NSS DB  ==  boom!


This can't happen because we count the # of expected certs and wait
until all are in MONITORING before continuing.
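
The wait described here (count the expected certs, poll until all are in
MONITORING, give up after 10 minutes) could be sketched roughly as
follows. This is only an illustration, not the code from the patch:
get_states is a hypothetical callable returning one state string per
tracked cert, and the timeout and interval defaults are assumptions.

```python
import time


def wait_for_monitoring(get_states, expected, timeout=600, interval=5,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until `expected` certs all report MONITORING, or time out.

    get_states is a hypothetical callable (not a certmonger API) that
    returns a list with the current state string of each tracked cert.
    Returns True if every cert reached MONITORING, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        states = get_states()
        if len(states) == expected and all(s == "MONITORING" for s in states):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A True result only says that every cert was in MONITORING at the moment
of the check. It does not by itself prevent a renewal from starting
right afterwards.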


The problem is that they're also in MONITORING before the whole renewal
starts. If the script happens to check just before the state changes
from MONITORING to GENERATING_CSR or whatever, we can get corruption.

The worst that would happen is the trust wouldn't be set on the audit
cert and dogtag wouldn't be restarted.

The state the system would be in is this:

- audit cert trust not updated, so next restart of CA will fail
- CA is not restarted so will not use updated certificates

And anyway, why does certmonger do renewals in parallel? It seems that
if it did one at a time, always waiting until the post-renew script is
done, this patch wouldn't be necessary.


From what Nalin told me certmonger has some coarse locking such that
renewals in the same NSS database are serialized. As you point out, it
would be nice to extend this locking to the post renewal scripts. We can
ask Nalin about it. That would fix the potential corruption issue. It is
still much nicer to not have to restart dogtag 4 times.


Well, three extra restarts every few years seems like a small price to
pay for robustness.


It is a bit of a problem though because the certs all renew within
seconds so end up fighting over who is restarting dogtag. This can cause
some renewals to go into a failure state to be retried later. This is
fine functionally but makes QE a bit of a pain. You then have to make
sure that renewal is basically done, then restart certmonger and check
everything again, over and over until all the certs are renewed. This is
difficult to automate.


So we need to extend the certmonger lock, and wait until Dogtag is back
up before exiting the script. That way it'd still take longer than 1
restart, but all the renews should succeed.


Right, but older dogtag versions don't have the handy servlet to tell
that the service is actually up and responding. So it is difficult to
tell from tomcat alone whether the CA is actually up and handling
requests.


Revised patch that takes advantage of new version of certmonger.
certmonger-0.65 adds locking from the time renewal begins to the end of
the post_save_command. This lets us be sure that no other certmonger
renewals will have the NSS database open in read-write mode.

We need to be sure that tomcat is shut down before we let certmonger
save the certificate to the NSS database because dogtag opens its
database read/write and two writers can cause corruption.
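
The ordering constraint above (CA stopped, then the cert is written,
then the CA comes back) can be illustrated with a small sketch. It is
not the patch itself: stop and start are hypothetical callables standing
in for whatever stops and starts the CA, and the real work is driven by
certmonger rather than by a single Python process.

```python
from contextlib import contextmanager


@contextmanager
def ca_stopped(stop, start):
    """Run the body with the CA stopped; start it again even on failure.

    stop and start are hypothetical callables supplied by the caller.
    """
    stop()
    try:
        yield
    finally:
        start()


calls = []
with ca_stopped(lambda: calls.append("stop"), lambda: calls.append("start")):
    # Only one writer touches the NSS database while the CA is down.
    calls.append("save cert to NSS DB")
print(calls)  # ['stop', 'save cert to NSS DB', 'start']
```

The try/finally means the CA is started again even if saving the
certificate raises.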

rob


stop_pkicad and start_pkicad need the Dogtag version check to select
pki_cad/pki_tomcatd.


Fixed.


A more serious issue is that stop_pkicad needs to be installed on
upgrades. Currently the whole enable_certificate_renewal step in
ipa-upgradeconfig is skipped if it was done before.


I added a separate upgrade test for this. It currently won't work in
SELinux enforcing mode because certmonger isn't allowed to talk to dbus
in an rpm post script. It's being looked at.

In stop_pkicad can you change the first log message to "certmonger
stopping %sd"? It's before the action so we don't want past tense.
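
The version check and the present-tense log message requested above
might look something like this sketch. The instance names and the
version threshold of 10 are assumptions made for illustration, and
ca_instance_name and stop_message are hypothetical helpers, not
functions from the patch.

```python
def ca_instance_name(dogtag_version):
    """Pick the CA instance name for a Dogtag major version.

    Assumption: Dogtag 10 and later run as pki_tomcatd, earlier
    versions as pki_cad. The returned name omits the trailing "d".
    """
    return "pki_tomcat" if dogtag_version >= 10 else "pki_ca"


def stop_message(dogtag_version):
    # Logged before the action, so present tense ("stopping").
    return "certmonger stopping %sd" % ca_instance_name(dogtag_version)


print(stop_message(9))   # certmonger stopping pki_cad
print(stop_message(10))  # certmonger stopping pki_tomcatd
```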


Fixed.

rob


I get a bunch of errors when installing the RPM:

[...]


This is the SELinux issue you were talking about. Sorry for not catching
that.

With enforcing off, the patch looks & works well for me. I'm just
concerned about this change in ipa-upgradeconfig:

@@ -707,7 +754,7 @@ def main():
          # configuration has changed, restart the name server
          root_logger.info('Changes to named.conf have been made, restart named')
          bindinstance.BindInstance(fstore).restart()
-    ca_restart = ca_restart or enable_certificate_renewal(ca) or upgrade_ipa_profile(ca, api.env.domain, fqdn)
+    ca_restart = ca_restart or enable_certificate_renewal(ca) or upgrade_ipa_profile(ca, api.env.domain, fqdn) or certificate_renewal_stop_ca(ca)

If the enable_certificate_renewal step was done already, but
upgrade_ipa_profile requests a CA restart, then the short-circuiting
`or` will be satisfied and certificate_renewal_stop_ca won't be run.

Since each upgrade step has its own checking, I think it would be safer
to use something like:
     ca_restart = certificate_renewal_stop_ca(ca) or ca_restart

or even:
ca_restart = any([
     ca_restart,
     enable_certificate_renewal(ca),
     upgrade_ipa_profile(ca, api.env.domain, fqdn),
     certificate_renewal_stop_ca(ca),
])
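
The difference between the two forms can be seen in a small
self-contained sketch. The step helper and the step names are
hypothetical stand-ins for the real upgrade functions; each call records
that it ran and returns whether it wants a CA restart.

```python
def step(name, wants_restart, ran):
    """Hypothetical upgrade step: record that it ran, return its result."""
    ran.append(name)
    return wants_restart


# Chained `or`: evaluation stops at the first true value.
ran_or = []
restart_or = (False
              or step("enable_renewal", False, ran_or)
              or step("upgrade_profile", True, ran_or)
              or step("stop_ca", False, ran_or))
print(ran_or)   # ['enable_renewal', 'upgrade_profile']

# any([...]): the list is built first, so every step runs.
ran_any = []
restart_any = any([
    False,
    step("enable_renewal", False, ran_any),
    step("upgrade_profile", True, ran_any),
    step("stop_ca", False, ran_any),
])
print(ran_any)  # ['enable_renewal', 'upgrade_profile', 'stop_ca']
```

Note that this relies on passing a list. any() over a generator
expression would stop at the first true value, just like the `or` chain.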


I like this suggestion very much. Updated patch attached.

rob


ACK, just remove the trailing space in the `]) ` line.


We'll need to make sure the SELinux issue isn't forgotten.

I was waiting for a fixed selinux-policy package to get pushed and it
finally has.


Pushed patch to master, ipa-3-1 and ipa-3-0

rob

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
