Rob Foehl wrote:
> On Mon, 19 Jun 2017, Rob Crittenden wrote:
>
>> Rob Foehl wrote:
>>> On Thu, 15 Jun 2017, Rob Crittenden wrote:
>>>
>>>> Rob Foehl wrote:
>>>>> Can I at least get a yes or no on whether external CA certificate
>>>>> renewal has ever been tested when that certificate is nearing
>>>>> expiration?
>>>>
>>>> Yes. I tested this with IPA v3.0. Did it break in between? Possible.
>>>>
>>>> As I pointed out certmonger is unaware of the certificate chain and
>>>> focuses only on the cert not-after date and resubmits the CSR to the CA
>>>> that issued the certificate originally.
>>>
>>> Thanks for the reply.
>>>
>>> certmonger not knowing about the chain is understandable, as is the
>>> resubmission of each tracked cert to the existing CA. Doing this
>>> results in a pile of certs that expire relatively quickly, being tied to
>>> the old CA, but that's also not surprising -- the surprise is that it
>>> only did that once, and has since appeared to ignore them all, even
>>> after the CA was renewed manually and the newly-issued-but-short-lived
>>> certs tied to the old CA expired.
>>
>> Ok, I'll need to try to reproduce it. It may take me a while to get
>> around to this so feel free to nag me.
>
> Consider this that, maybe... I just got around to beating my head
> against this some more myself. I'm still trying to convince myself that
> use of an external CA is viable, so I'd resurrected the test VM from
> May/June and this time actually managed to sort it out. More detail below.
>
>>>>> I just duplicated last week's result using an earlier snapshot of the
>>>>> same VM and a renewed CA cert with a 3-day validity. certmonger ignored
>>>>> every other cert that it already renewed once with the original CA;
>>>>> whole system is hosed after the original cert expires. It's probably
>>>>> possible to recover by manually replacing every certificate, but I
>>>>> haven't had time to try that.
>>>>
>>>> certmonger checks at days 28, 7, 3, 2 and 1 before expiration by default
>>>> for certificate expiration so it should have looked at the certs at
>>>> least two times, three depending on timing (and really, it's seconds
>>>> before expiration). Did you let the system sit for 3 days before things
>>>> died? Was anything logged to syslog? Moving time forward a day at a time
>>>> is insufficient to test this without restarting certmonger.
>>>
>>> I let the original VM snapshot run for a month straight, renewing the
>>> IPA CA by hand after the first round of certmonger-initiated renewals
>>> with 14 days til expiration and on the second attempt after expiration.
>>> The first attempt used another 30-day cert, the second used a 3-day and
>>> was allowed to run straight through. No time jumps while the VM is
>>> running, and all snapshots with the VM powered off, so it always booted
>>> with an accurate clock.
>>>
>>> certmonger never logged anything after the first renewal cycle on either
>>> attempt. A 'getcert list' on the long-running VM shows all of the
>>> tracked certificates with an expiration date of 2017-06-24, which
>>> matches the lifetime of the renewed CA cert, but none of the services
>>> attempting to load or use them are happy.
>>
>> It depends on why they aren't happy. Are they not happy due to expired
>> certs or something else?
>
> They weren't happy due to the expired CA certs, and in some cases the
> leaf certificates hadn't been updated in place due to SELinux denials.
We recently started seeing this as well,
https://bugzilla.redhat.com/show_bug.cgi?id=1481388

This is a frustrating issue: the certificate gets issued, but the places
that need to be updated to reflect it aren't, which causes a cascade of
failures.

>
> I'm still not sure why certmonger thought it'd replaced certificates
> when it hadn't, and I don't remember which of the last ~30 snapshots
> left them in this state, or I'd dig deeper :)

Because certmonger doesn't track the pre or post scripts. From
certmonger's perspective (at least in the BZ above) the certificate was
successfully renewed, but because of SELinux issues parts of the
post-command script blew up, which leaves things in an unhappy state in
general.

>
>>> But httpd still refuses to start with that NSSDB, and this appears to be
>>> why:
>>>
>>> # certutil -L -n Signing-Cert -d /etc/httpd/alias
>>> Certificate:
>>>     Data:
>>>         Version: 3 (0x2)
>>>         Serial Number: 9 (0x9)
>>>         Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
>>>         Issuer: "CN=Certificate Authority,O=EXAMPLE.COM"
>>>         Validity:
>>>             Not Before: Mon May 08 06:33:16 2017
>>>             Not After : Wed Jun 07 06:25:53 2017
>>>         Subject: "CN=Object Signing Cert,O=EXAMPLE.COM"
>>
>> mod_nss shouldn't be considering the signing cert so I doubt this is
>> related.
>
> The startup failures may have been related to the WSGI modules trying to
> connect to other services with expired certs. Either way, that expired
> CA cert is what gets presented to HTTPS clients, so it's still a problem.
>
>>> Does certmonger know how to replace the entire certificate chain in the
>>> respective store(s)?
>>>
>>> (The third certificate in there, ipaCert / CN=IPA RA, has the same dates
>>> as the Server-Cert above.)
>>
>> So it was renewed as well.
>>
>> certmonger doesn't push out new chains so if that changed in between
>> that would do it.
>> This is another way to test cert validation from the
>> command-line:
>>
>> # certutil -V -u V -d /etc/httpd/alias -n Server-Cert
>>
>> If you want to see if updating the CA cert(s) makes any difference.
>>
>>>
>>>
>>>> Even in a worst-case scenario, where all the certs expire, it is a
>>>> fairly straightforward process to get the services back up by going back
>>>> in time, renewing the IPA CA then restarting certmonger to renew the
>>>> service certificates.
>>>>
>>>> Is it perfect? No. A search of the users forum should make that
>>>> apparent. It has been difficult to reproduce the failures because it's
>>>> difficult to simulate by moving time around. Several years ago I left
>>>> VMs running for months to try to simulate failures and it always worked
>>>> for me.
>>>
>>> I haven't tried kicking the clock around yet... The second attempt
>>> booted from a month-old snapshot and immediately blew itself up;
>>> renewing the CA cert and restarting certmonger (really, the whole VM)
>>> didn't change anything.
>>
>> If the chain changes then yeah, that'd cause problems.
>
> I think I've stumbled onto what happened here, but I don't know how to
> reliably reproduce it. See below.
>
>>>> Note too that there is a difference between certmonger and the renewals.
>>>> certmonger renews certs but there are helpers that need to fire off to
>>>> update information within IPA as well and to distribute updated
>>>> certificates to replicas. These scripts were updated significantly since
>>>> I wrote them to be much more robust in terms of reliability and
>>>> logging.
>>>
>>> Consider uses of "certmonger" above to include these... Another
>>> wrinkle, discovered early on, was broken SELinux policy that prevented
>>> certmonger from running any of them. That was (apparently) fixed by a
>>> later selinux-policy-targeted package release, but I haven't tried the
>>> whole process from a bare install since.
>>> The second test with the 3-day
>>> lifetime on the IPA CA renewal should've been okay here. I can try
>>> again with a fresh install and relatively short IPA CA cert lifetimes,
>>> say 4 days per renewal if that'll be sufficient to provoke this a bit
>>> faster.
>>>
>>> I'm still worried about the missing "phase 2" when it comes to
>>> distributing a new external CA certificate -- the CA I have expires in 3
>>> years, and it'd be nice to know whether I'm shooting myself in the foot
>>> if I try signing the for-real IPA CA with it now.
>>
>> The really tricky bit is distributing the updated CA chain around. I've
>> been away from IPA for a while but I can give you some bread crumbs. I
>> believe that ipa-cacert-manage can be used to update the stored CA chain
>> in LDAP and then running ipa-certupdate will pull the chain down, it
>> just needs to be run on every master and client.
>
> Bingo. The necessity of running ipa-certupdate in this case isn't
> really covered anywhere in the documentation, with the best description
> I could find in https://www.freeipa.org/page/V4/CA_certificate_renewal
> starting with "there will be a new utility"...
>
> Here's what it took to coerce everything back into working order:
>
> - 'setenforce 0', followed by a shower attempting to wash away the shame
>
> Seriously, the lack of idempotent helper scripts is a huge problem here,
> and is the underlying cause of most of this pain. certmonger can wind up
> in a state where it thinks it's replaced certs when it hasn't; various
> services (including Dogtag and the KDC proxy) can wind up unable to
> connect to the directory service; et cetera.

Yeah, I'm not sure what the best recourse is. Ideally there should be no
need to re-run things manually. Practically it may still be required,
but the current scripts aren't exactly meant to be run by end-users. The
focus for at least the next release is tightening up loose ends exactly
like this.
As for the setenforce 0: once the SELinux issues are ironed out, this
should no longer be needed. The issue wasn't caught during pre-release
testing.

>
> See https://bugzilla.redhat.com/show_bug.cgi?id=1475528 for the specific
> instance still affecting pki-tomcatd.
>
> - Modify /etc/pki/pki-tomcat/ca/CS.cfg and
>   /etc/pki/pki-tomcat/password.conf to use plain LDAP connections, based
>   in part on information found in this post:
>   https://www.redhat.com/archives/freeipa-users/2017-January/msg00216.html
>
>   This step was necessary to get pki-tomcatd to start at all, after its
>   client cert had been partially mangled by the earlier renewal attempt.
>
> - Stop certmonger, IPA, and chronyd or ntpd, as appropriate, and roll the
>   clock back to a date when the originally installed certs were valid
>
> - Really stop certmonger, violently, then remove /var/run/ipa/renewal.lock
>
> - Start IPA services via 'ipactl start', wait for everything to come up,
>   then start certmonger and wait for it to settle (which takes a while if
>   it's decided to attempt renewals with the old CA)
>
> - Run 'ipa-cacert-manage renew --external-ca' and sign the resulting CSR
>   with a validity interval that overlaps the original CA cert
>
> - Run 'ipa-cacert-manage renew --external-cert-file=/path/to/ipa-ca.pem
>   --external-cert-file=/path/to/ca.pem' to import the resulting CA chain
>
> - Stop certmonger again, clean up as above if necessary
>
> - Run 'ipa-certupdate', possibly after 'kinit admin' to get a ticket
>
> - Step clock forward to a day or two prior to original leaf certificate
>   expiration, as imposed by the original CA lifetime and within the
>   validity period of the new CA cert
>
> - Start certmonger, wait for it to renew all the leaf certificates, and
>   verify the results with 'getcert list', paying attention to the
>   expiration times across the board
>
> - Assuming this worked: stop all services again, revert the CS.cfg and
>   password.conf changes, and either manually fix the clock and
>   restart everything (including the time service) or just reboot
>
>
> Here's the catch: this worked the first time I did it, with a new CA set
> to expire 30 days after the last one and only stepping the clock forward
> enough to land in the middle of that one. I repeated the whole process
> (less the CS.cfg steps) with another externally signed CA cert for
> another 30 days, and after that pass, certmonger refused to update
> anything using the new CA, clinging instead to the one from the first
> attempt and reusing its expiration date on all renewed certs.
>
> Why this happened isn't entirely clear, but one thing I did notice after
> that attempt is that the newly replaced CA wasn't the first one listed
> in (at least) the NSSDBs for httpd and pki-tomcatd, instead coming
> second in the list when examined with certutil. I generated another CSR
> and CA with different dates and a different offset from the second
> attempt, and ran through the whole process again; the result was even
> more bizarre, with all five CAs (the original, first renewal, and three
> recent) now all appearing in the correct order in the NSSDBs, and
> certmonger happily renewing the leaf certs, pinned to the new CA
> expiration date.
>
> I'm not sure what to take away from that, other than that it worked
> eventually, and I now have a functional IPA instance which I'd thought
> was a lost cause the last time I looked at it.
> Happy to share anything
> anyone wants a look at, including the NSSDBs which now look like this:
>
> # certutil -L -d /etc/pki/pki-tomcat/alias
>
> Certificate Nickname                                         Trust Attributes
>                                                              SSL,S/MIME,JAR/XPI
>
> OU=example.com CA,O=example.com,C=US                         CT,C,C
> caSigningCert cert-pki-ca                                    CTu,Cu,Cu
> caSigningCert cert-pki-ca                                    CTu,Cu,Cu
> Server-Cert cert-pki-ca                                      u,u,u
> auditSigningCert cert-pki-ca                                 u,u,Pu
> ocspSigningCert cert-pki-ca                                  u,u,u
> caSigningCert cert-pki-ca                                    CTu,Cu,Cu
> caSigningCert cert-pki-ca                                    CTu,Cu,Cu
> caSigningCert cert-pki-ca                                    CTu,Cu,Cu
> subsystemCert cert-pki-ca                                    u,u,u
>
> (Aside: is there any sane way to clean these up?)
>
> I'll keep this image around for a while, although I don't plan on
> spending too much more time with it. Been enough "fun" already...

NSS is supposed to pick the "best" cert to use when there is an overlap
of subjects (best as in matches the usage, time is valid, etc). I don't
know that the order in the output is meaningful.

To clean up (after making one or several backups of the db files), use
certutil -L -a to export all the certs. IIRC it will dump them all into
a single PEM file. Edit that file to pull out the one you want, use
certutil -D to remove the cert(s) from the db, then certutil -A to add
in the one from the PEM file you chose.

Congratulations on most excellent troubleshooting!

rob
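
A minimal sketch of the "edit that file to pull out the one you want"
step from the cleanup described above, for anyone who would rather not
split the PEM dump by hand. The function name split_pem_bundle and the
cert-N.pem file names are made up for illustration; they are not part of
NSS or FreeIPA. The certutil and openssl invocations in the trailing
comments are an untested outline of the backup/export/delete/re-add
sequence, using the path and nickname from the listing in this thread.

```shell
#!/bin/sh
# split_pem_bundle BUNDLE OUTDIR
# Hypothetical helper: write each certificate found in a multi-certificate
# PEM file to its own numbered file (OUTDIR/cert-1.pem, cert-2.pem, ...).
# Anything outside the BEGIN/END CERTIFICATE markers is ignored.
split_pem_bundle() {
    bundle=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /-----BEGIN CERTIFICATE-----/ {
            n++
            out = dir "/cert-" n ".pem"
            writing = 1
        }
        writing { print > out }
        /-----END CERTIFICATE-----/ {
            writing = 0
            close(out)
        }
    ' "$bundle"
}

# Untested outline of the surrounding steps (assumptions: run as root with
# the services stopped; path and nickname copied from the listing above):
#
#   cp -a /etc/pki/pki-tomcat/alias /root/pki-tomcat-alias.backup
#   certutil -L -d /etc/pki/pki-tomcat/alias \
#       -n 'caSigningCert cert-pki-ca' -a > bundle.pem
#   split_pem_bundle bundle.pem ./certs
#   openssl x509 -in ./certs/cert-1.pem -noout -subject -enddate
#   certutil -D -d /etc/pki/pki-tomcat/alias -n 'caSigningCert cert-pki-ca'
#   certutil -A -d /etc/pki/pki-tomcat/alias -n 'caSigningCert cert-pki-ca' \
#       -t 'CT,C,C' -a -i ./certs/cert-N.pem
#
# The trust string is the one from the listing without the "u" flags,
# which certutil displays on its own when a matching private key exists.
```

Running openssl x509 over each cert-N.pem before deleting anything makes
it easier to confirm which file holds the CA cert with the latest
expiration date, which is presumably the one worth keeping.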