siology.io wrote:
Hello there.

My setup is that I have five IPA servers: 2 in one location (alder,
auth-syd2), 2 in another location (auth-wlg, auth-wlg2), and one in yet
another location (waffle), which is reached over a long,
mostly-but-possibly-notably-not-entirely reliable VPN connection.

I'm having an issue with an IPA server falling over. By 'falling over'
I mean that it no longer responds to LDAP queries (although TCP port 389
still shows as open via nmap). When I run 'systemctl ipa stop'
the command never seems to return, so up to now the only fix I have is
to reboot that server.

If 389-ds is hanging, you'll want to follow these instructions to find out why:
http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-hangs

rob


All machines are CentOS 7. All are using
ipa-server-4.2.0-15.0.1.el7.centos.18.x86_64. Replication occurs
between: alder<->auth-wlg, alder<->auth-syd2, auth-wlg<->auth-wlg2, and
auth-wlg<->waffle, possibly notably *not* between alder and waffle directly.

The problem of LDAP being unavailable occurs on alder only; the other
IPA servers seem to be reliable. Unfortunately, alder is also our most
used server.

The error logs from alder look like this: http://pastebin.com/TxCVjWTe
(the reboot was done at around 19:55).

While investigating / googling the errors in this log, starting with
the attr_replace (nsslapd-referral) one, I did notice that on my
servers this LDAP query:

ldapsearch -ZZ -h alder.blah.com -D "cn=Directory Manager" -W -b "o=ipaca" \
  "(&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))" \
  | grep "nsds50ruv\|nsDS5ReplicaId"

returns results similar to this:

nsDS5ReplicaId: 96
nsds50ruv: {replicageneration} 5733d428000000600000
nsds50ruv: {replica 96 ldap://alder.blah.com:389} 5733d474000000600000 57
nsds50ruv: {replica 91 ldap://auth-syd2.blah.com:389} 576337b90004005b000
nsds50ruv: {replica 97 ldap://auth-wlg.blah.com:389} 5733d49a000000610000
nsds50ruv: {replica 1095 ldap://auth-wlg2.blah.com:389} 574fa5b0000004470
nsds50ruv: {replica 1090 ldap://waffle.bsh.blah.com:389} 576b1add00000442
nsds50ruv: {replica 1085 ldap://waffle.bsh.blah.com:389} 576b22f10000043d

i.e. waffle is listed twice. If I run that LDAP query on waffle itself,
though, I get no results at all (but the command does at least return),
so I don't know waffle's nsDS5ReplicaId at the moment. I understand that
once I know it, I can clean-ruv the other ID off the other IPA servers?
I don't *think* any of this is related to my original issue above, but
it might be a smoking gun, I don't know. Just mentioning it in case.

At the moment I've not got a lot to go on. Has anyone else seen errors
like those in the pastebin, or might know where to look for more useful
info? Possibly also worth noting that alder and auth-syd2 are AWS EC2
instances. The rest are VMs on site(s).



