As an update, in case somebody comes across this thread in the future:
I copied the environment to a test rig and performed the surgery as
proposed, and it worked. I was able to promote a new replica.
For those really interested in the details, here's the series of steps I
performed. Some steps are slightly edited, so the working directory for
each step might not be quite right, but it's not far off.
tar -zxvf ~/ds2-alias.tgz
certutil -N -d new
cp -a /etc/pki/pki-tomcat/alias/pwdfile.txt new
for i in 'caSigningCert cert-pki-ca' 'Server-Cert cert-pki-ca' \
    'ocspSigningCert cert-pki-ca' 'subsystemCert cert-pki-ca' ; do
  echo "$i"
  pk12util -o "$i" -d /etc/pki/pki-tomcat/alias/ -n "$i" \
    -k /etc/pki/pki-tomcat/alias/pwdfile.txt
done
pk12util -d ../ds2/etc/pki/pki-tomcat/alias/ \
  -n 'auditSigningCert cert-pki-ca' \
  -k ../ds2/etc/pki/pki-tomcat/alias/pwdfile.txt \
  -w /etc/pki/pki-tomcat/alias/pwdfile.txt \
  -o 'auditSigningCert cert-pki-ca'
for i in * ; do
  pk12util -i "$i" -w ../new/pwdfile.txt -k ../new/pwdfile.txt \
    -d ../new/ -n "$i"
done
certutil -L -d .
certutil -L -d /etc/pki/pki-tomcat/alias/
certutil -d . -n 'caSigningCert cert-pki-ca' -M -t 'CTu,Cu,Cu'
certutil -d . -n 'auditSigningCert cert-pki-ca' -M -t 'u,u,uP'
certutil -L -d .
systemctl stop certmonger
mv alias alias-old
mv ~/work/new alias
chown -R pkiuser:pkiuser alias
restorecon -R alias
chcon -R -u system_u alias
systemctl start certmonger
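A quick sanity check before swapping the directory in is to look for nicknames whose trust attributes contain no "u" flag, since that was the symptom on the broken server (",,P" instead of "u,u,uP"). The following is only a sketch: the function name missing_key_nicknames is made up for illustration, and it assumes the usual two-column layout of certutil -L output, with the trust attributes as the last field on each line.

```shell
# Hypothetical helper: reads `certutil -L` output on stdin and prints the
# nickname of every certificate whose trust attributes contain no "u" flag.
# Assumption: the last whitespace-separated field on each certificate line
# is the trust attribute triple, e.g. "CTu,Cu,Cu" or ",,P".
missing_key_nicknames() {
    awk '
        # Match only lines that end in three comma-separated flag groups;
        # this skips the blank lines and the two header lines.
        NF >= 2 && $NF ~ /^[A-Za-z]*,[A-Za-z]*,[A-Za-z]*$/ {
            if ($NF !~ /u/) {
                nick = $0
                # Strip the trust-attribute column and the padding before it.
                sub(/[ \t]+[^ \t]+[ \t]*$/, "", nick)
                print nick
            }
        }
    '
}
```

It would be used as: certutil -L -d . | missing_key_nicknames
No output means every listed cert carries a "u" flag. Note that certs which legitimately have no private key in the DB (a plain trusted CA cert, for example) would also be listed, so treat any output as a prompt to look closer rather than as proof of a problem.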
I'll perform the same-ish series of steps in production in a maintenance
window in the not too distant future.
I'm still wondering how this might have happened, whether some cosmic event
has corrupted the NSSDB, or ... /shrug
Anyway, I think it's basically fixed.
On 10 July 2018 at 21:54, Andy Stubbs <andy.stu...@treatwell.com> wrote:
> So, I have what I think seems to be a slightly odd problem. And I think
> I've worked out what the solution might be - but not the root cause. In any
> case, I wanted to run it by you all and see whether you agree or have any
> insight into it.
> The background
> I'm running 6 directory servers (4.5.0-21) on CentOS 7.4.1708, 3 of which have
> the CA role. I've been running the directory blissfully uneventfully for
> 7ish months now. We have experimented a little bit with the CA features,
> but nothing that can't be done trivially with the web interface (on
> reflection I'm sure it probably is trivial to revoke your primary
> certificate authority with the web interface, but you know what I mean).
> The problem
> In the past few days I've had the occasion to try to create a new replica
> but on each attempt, the process fails around this time:
> [4/4]: configuring ipa-custodia to start on boot
> Done configuring ipa-custodia.
> The ipa-replica-install command failed, exception: HTTPError: 404 Client Error: Not Found
> 404 Client Error: Not Found
> The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
> Now, I've learned a fair amount over the past few days digging into this,
> like what ipa-custodia is, and how to poke it.
> It seems that at this point, the process is still actually actively doing
> things - it appears to be generating some kind of NSS certificate/key
> store. And that process is failing, because apparently it can't find the
> key for the entry "auditSigningCert cert-pki-ca" - specifically in
> custodiainstance.__get_keys the call to cli.fetch_key is failing for this
> nickname (but no others).
> So, more digging, and I find that yes indeed, the private key appears to
> be missing from the cert database on one of the directory servers
> (specifically the "first" directory server).
> I haven't quite joined the dots on how custodia is working here, but using
> the following command:
> sudo certutil -L -d /etc/pki/pki-tomcat/alias
> I can determine that on the first directory server, the trust attributes
> for this cert are ",,P" whereas on the other two CA directory servers, the
> trust attributes are "u,u,uP", and that indeed the key is missing from the
> first directory server in this database.
> I also note that the cert databases seem to be divergent in other ways
> between the CA servers. Which I find interesting.
> But anyway, so my next action is to copy the cert databases to another
> machine and to try to import the cert/key from a "good" CA db to the "bad"
> CA db using pk12util.
> This gives me a segmentation fault.
> So, I try with a new DB. I export all the cert/key pairs from the "bad" CA
> individually and import them into a new DB, replicating the trust
> attributes. So far so good. I also export the missing cert/key from a
> "good" CA and import that into the same new DB. Also apparently good.
> The solution?
> So, at this point, I feel relatively confident that I have constructed a
> good DB and I should be able to perform some surgery to remove the old
> "bad" DB and replace it with this "good" DB.
> My questions are:
> 1. Does this approach seem reasonable or am I oversimplifying?
> 2. If this is a reasonable approach: what's my best method for performing
> the surgery? ipactl stop, move bad db directory out of way, move "good" db
> in, don't forget the selinux stuff, then ipactl start again?
> 3. How could this even happen in the first place? Is it a known issue?
> 4. Shouldn't the CA databases basically all look the same between servers
> created at the same time? Why might they diverge?
> 5. Do you have any other comments or questions which you feel might be
> Thanks in advance for any input or insights shared.
> Best Regards
> Andrew Stubbs, PhD
> Head of Technical Operations
Andrew Stubbs, PhD
Head of Technical Operations
+44 203 770 4582
+44 7711 002930