Ludwig, thank you for the prompt, helpful reply. I've deleted the stale replication agreements, cleaned the dangling RUVs, and renamed the huge changelog file. The server recreated the file, but it is nowhere near as big as it was.
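
To keep the recreated file from growing again, the quoted reply below suggests enabling changelog trimming. A minimal sketch of what that change might look like follows. It assumes the changelog configuration entry is cn=changelog5,cn=config and the attribute is nsslapd-changelogmaxage, as in 389-ds-base 1.3.x; the 7d value and the LDIF_FILE and MAX_AGE names are illustrative assumptions, not recommendations from the thread. The script only writes and prints the LDIF; applying it is left as a commented-out manual step.

```shell
#!/bin/sh
# Sketch only: build an LDIF that sets a maximum age for changelog entries.
# MAX_AGE is a hypothetical value chosen for illustration.
MAX_AGE="7d"
LDIF_FILE="$(mktemp)"

cat > "$LDIF_FILE" <<EOF
dn: cn=changelog5,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: $MAX_AGE
EOF

# Show the LDIF so it can be reviewed before anything is changed.
cat "$LDIF_FILE"

# To apply it on the master after review (prompts for the Directory Manager
# password), something like the following could be run by hand:
# ldapmodify -x -D "cn=Directory Manager" -W -H ldap://localhost:389 -f "$LDIF_FILE"
```

As the quoted reply notes, trimming reduces the changelog content and stops further growth, but does not necessarily shrink the file on disk.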

Now, on the second issue: id5 doesn't appear to be listening on port 636.

The steps I'm following are, broadly:

yum install ipa-server
ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg

I did not join the replica machine as a client before initiating the replication; I understand this is correct? Presumably the directory starts on the replica during the replica-install process?

journalctl on the replica shows many of the following after I try to install:

ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=prod,dc=mydomain,dc=com: LDAP error - 1

This is the state of things after trying to install the replica:

[root@id5 ~]# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd

I note that port 389 is showing as tcp6, but I can reach it over IPv4 from the master.

What I have noticed is that the master is very, very slow. In particular, the httpd process running under the ipaapi user is sitting at 100% load most of the time. I suspect timeouts may be occurring if it's taking a long time for the master to respond to requests.

Grateful for any more guidance.

Mike

On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users <freeipa-users@lists.fedorahosted.org> wrote:
>
> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>
>> Hi
>>
>> I've got a small environment which had until recently 2 IPA servers.
>> Both CentOS 7.4.1708
>>
>> Version info:
>>
>> id1:
>> Name : ipa-server
>> Version : 4.5.0
>> Release : 21.el7.centos.2.2
>> Kernel: 3.10.0-693.5.2.el7.x86_64
>> 389-ds-base is at version 1.3.6.1
>>
>> id5:
>> Name : ipa-server
>> Version : 4.5.0
>> Release : 21.el7.centos.2.2
>> Kernel: 3.10.0-693.5.2.el7.x86_64
>> 389-ds-base is at version 1.3.6.1
>>
>> I recently had an issue with high IO/load, and noted that the following
>> file:
>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>> was huge (5GB-ish) in a very small 2-master environment. This is on
>> the master. My understanding is that the entries in this file, which
>> have timestamps from months ago, exist because of failed replication.
>> I don't understand how to clear this without breaking things.
>
> It looks like you do not have changelog trimming enabled. If you enable
> trimming now, this would reduce the content, but not necessarily reduce the
> file size, though it would prevent it from growing.
> If you stop the server and remove it, it will be recreated. What can happen
> then is that required changes to update another replica are missing and repl
> will ask you to reinit the other server.
>
> Now, the second problem should be unrelated. It looks like total init tries
> to connect to port 636 and fails; the normal repl session fails because the
> init didn't happen. Could you verify that id5 is listening on 636, or whether
> you have any errors in its error logs?
>>
>>
>> Second issue; not sure if related:
>>
>> I've since lost the replica (id2) but I've prepared a new machine
>> (id5) to be a new replica of id1. I've cleaned the RUVs and deleted
>> the replication agreements but when I join the new machine to the
>> existing one using `ipa-replica-install` then I get the following on
>> the replica:
>>
>> ################
>> Starting replication, please wait until this has completed.
>> Update in progress, 10 seconds elapsed
>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status:
>> [-11 connection error: Unknown connection error (-11) - Total update
>> aborted]
>>
>> [error] RuntimeError: Failed to start replication
>> Your system may be partly configured.
>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>
>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>> ERROR Failed to start replication
>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>> ERROR The ipa-replica-install command failed. See
>> /var/log/ipareplica-install.log for more information
>> [root@id5 ~]# ipa-replica-manage re-initialize --from
>> id1.prod.mydomain.com
>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more
>> information
>> Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
>> ################
>>
>> and the following on the master:
>>
>> ################
>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin
>> - repl5_tot_run - Beginning total update of replica
>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin -
>> repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com"
>> (id5:389): Received error -1 (Can't contact LDAP server): for total
>> update operation
>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin -
>> release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>> Unable to send endReplication extended operation (Can't contact LDAP
>> server)
>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin -
>> repl5_tot_run - Total update failed for replica
>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin
>> - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>> Replication bind with GSSAPI auth resumed
>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin
>> - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The
>> remote replica has a different database generation ID than the local
>> database. You may have to reinitialize the remote replica, or the
>> local replica.
>> ################
>>
>> I've checked the firewalls on both machines, and gone as far as to
>> flush all the iptables rules to get it to work. No luck.
>>
>> I'm also getting hundreds of the last line "different database
>> generation ID" but my understanding is that this is only logged
>> because the replica is yet to be set up.
>>
>> Would anyone please be able to provide some guidance? I've been at
>> this for a few days now!
>>
>> Thanks!
>> Mike
>> _______________________________________________
>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
>
>
> --
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
> Eric Shander