Pastebin for dirsrv/errors log file during/after failed join -- https://pastebin.com/gJR1SZWZ
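For anyone reading through a log like this, here is a minimal Python sketch that counts ERR and WARN entries per function name. It assumes the line layout seen in the master log excerpt quoted further down (`[timestamp] - LEVEL - plugin - function - message`). The names `parse_line` and `count_by_function` are illustrative only and are not part of any 389-ds or FreeIPA tooling.

```python
import re
from collections import Counter

# Layout assumed from the quoted master log excerpt:
#   [timestamp] - LEVEL - plugin - function - message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] - (?P<level>[A-Z]+) - (?P<plugin>\S+) - "
    r"(?P<func>\S+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def count_by_function(lines, levels=("ERR", "WARN")):
    """Count entries per (level, function) for the given severity levels."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in levels:
            counts[(entry["level"], entry["func"])] += 1
    return counts


# Three entries taken from the quoted master log, unwrapped to one line each.
SAMPLE = """\
[14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
[14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin - repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Received error -1 (Can't contact LDAP server): for total update operation
[14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
""".splitlines()

if __name__ == "__main__":
    for (level, func), n in count_by_function(SAMPLE).most_common():
        print(level, func, n)
```

To run it against a real file, pass the open file object instead of `SAMPLE`. Lines that do not match the assumed layout are skipped.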
On 14 November 2017 at 16:40, Mike Johnson <m.d.john...@kuub.org> wrote:
> Ludwig, thank you for the prompt, helpful reply.
>
> I've deleted the stale replication agreements, cleaned the dangling
> RUVs and renamed the huge file. It recreated the file but it's
> nowhere near as big as it was.
>
> Now, on the second issue, it doesn't appear to be listening on port 636.
>
> The steps I'm following are, broadly:
>
> yum install ipa-server
> ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg
>
> I did not join the replica machine as a client before initiating the
> replication, I understand this is correct?
>
> Presumably the directory starts on the replica during the
> replica-install process?
>
> journalctl on the replica shows many of the following after I try to install:
> ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=prod,dc=mydomain,dc=com: LDAP error - 1
>
> This is the state of things after trying to install the replica:
> [root@id5 ~]# netstat -ltnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
> tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
> tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
> tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
> tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
>
> I note that port 389 is showing as tcp6 but I can see it with v4 from the
> master
>
> What I have noticed is that the master is very, very slow. In
> particular the httpd process running under the ipaapi user is sitting
> at 100% load most of the time. I suspect timeouts may be occurring if
> it's taking a long time for the master to respond to requests.
>
> Grateful for any more guidance
> Mike
>
>
> On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
> <freeipa-users@lists.fedorahosted.org> wrote:
>>
>> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>>
>>> Hi
>>>
>>> I've got a small environment which had until recently 2 IPA servers.
>>> Both CentOS 7.4.1708
>>>
>>> Version info:
>>>
>>> id1:
>>> Name : ipa-server
>>> Version : 4.5.0
>>> Release : 21.el7.centos.2.2
>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>> 389-ds-base is at version 1.3.6.1
>>>
>>> id5:
>>> Name : ipa-server
>>> Version : 4.5.0
>>> Release : 21.el7.centos.2.2
>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>> 389-ds-base is at version 1.3.6.1
>>>
>>> I recently had an issue with high IO/load, and noted that the following
>>> file:
>>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>>> was huge (5GB-ish) in a very small 2-master environment. This is on
>>> the master. My understanding is that the entries in this file, which
>>> have timestamps from months ago, exist because of failed replication.
>>> I don't understand how to clear this without breaking things.
>>
>> It looks like you do not have changelog trimming enabled. If you enable
>> trimming now, this would reduce the content but not necessarily the file
>> size; it would, however, prevent the file from growing further.
>> If you stop the server and remove the file, it will be recreated. What can
>> happen then is that changes required to update another replica are missing,
>> and replication will ask you to reinit the other server.
>>
>> Now, the second problem should be unrelated. It looks like the total init
>> tries to connect to port 636 and fails; the normal repl session fails
>> because the init didn't happen. Could you verify that id5 is listening on
>> 636, and check whether there are any errors in its error logs?
>>>
>>>
>>> Second issue; not sure if related:
>>>
>>> I've since lost the replica (id2) but I've prepared a new machine
>>> (id5) to be a new replica of id1.
>>> I've cleaned the RUVs and deleted
>>> the replication agreements but when I join the new machine to the
>>> existing one using `ipa-replica-install` then I get the following on
>>> the replica:
>>>
>>> ################
>>> Starting replication, please wait until this has completed.
>>> Update in progress, 10 seconds elapsed
>>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status: [-11 connection error: Unknown connection error (-11) - Total update aborted]
>>>
>>> [error] RuntimeError: Failed to start replication
>>> Your system may be partly configured.
>>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>>
>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR Failed to start replication
>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
>>> [root@id5 ~]# ipa-replica-manage re-initialize --from id1.prod.mydomain.com
>>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more information
>>> Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
>>> ################
>>>
>>> and the following on the master:
>>>
>>> ################
>>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin - repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Received error -1 (Can't contact LDAP server): for total update operation
>>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Unable to send endReplication extended operation (Can't contact LDAP server)
>>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin - repl5_tot_run - Total update failed for replica "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Replication bind with GSSAPI auth resumed
>>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
>>> ################
>>>
>>> I've checked the firewalls on both machines, and gone as far as to
>>> flush all the iptables rules to get it to work. No luck.
>>>
>>> I'm also getting hundreds of the last line "different database
>>> generation ID" but my understanding is that this is only logged
>>> because the replica is yet to be set up.
>>>
>>> Would anyone please be able to provide some guidance? I've been at
>>> this for a few days now!
>>>
>>> Thanks!
>>> Mike
>>> _______________________________________________
>>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>>> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
>>
>>
>> --
>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>> Eric Shander
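
As a follow-up to the "is id5 listening on 636" question, here is a minimal Python sketch that extracts the listening TCP ports from `netstat -ltnp` output and reports which expected ports are absent. It assumes the column layout of the netstat output quoted above. The function names and the choice of 389 and 636 as the expected set are illustrative assumptions, taken only from the ports mentioned in this thread.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports in LISTEN state from `netstat -ltn` style text."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # The port is whatever follows the last colon, for IPv4 and IPv6 alike.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


def missing_ports(netstat_output, expected=(389, 636)):
    """Return the expected ports that are not listening, sorted."""
    return sorted(set(expected) - listening_ports(netstat_output))


# Output quoted from the replica (id5) after the failed install.
SAMPLE = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
"""

if __name__ == "__main__":
    print("not listening:", missing_ports(SAMPLE))
```

For the quoted output, `missing_ports(SAMPLE)` returns `[636]`, which matches the observation that ns-slapd is bound to 389 only.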