Ludwig, thank you for the prompt, helpful reply.

I've deleted the stale replication agreements, cleaned the dangling
RUVs and renamed the huge changelog file.  The server recreated the
file, but it's nowhere near as big as it was.

Now, on the second issue: id5 doesn't appear to be listening on port 636.

The steps I'm following are, broadly:

yum install ipa-server
ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg

I did not join the replica machine as a client before initiating the
replication.  I understand this is correct?

Presumably the directory starts on the replica during the
replica-install process?

journalctl on the replica shows many of the following after I try to install:
ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed
to update replication update vector for replica
dc=prod,dc=mydomain,dc=com: LDAP error - 1
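
To see which messages dominate the journal, the output could be
summarised with a small script.  This is only a sketch: it assumes each
log entry is on a single line (not wrapped as in this mail) and follows
the "LEVEL - plugin - function - message" layout seen above.  The names
LOG_RE and summarize are made up for illustration.

```python
import re
from collections import Counter

# Matches the "LEVEL - plugin - function - " part of a 389-ds log line,
# e.g. "ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - ..."
LOG_RE = re.compile(r"\b(ERR|WARN|INFO) - (\S+) - (\S+) - ")


def summarize(lines):
    """Count log lines per (level, plugin, function); skip non-matching lines."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Example (journal.txt is a hypothetical file holding saved journalctl output):
#   with open("journal.txt") as fh:
#       for key, n in summarize(fh).most_common():
#           print(n, *key)
```

Grouping by function name makes it easier to tell whether the
replica_replace_ruv_tombstone error is the only one being logged or
whether other failures are hidden among the repeats.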

This is the state of things after trying to install the replica:
[root@id5 ~]# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd

I note that port 389 is showing as tcp6, but I can reach it over IPv4
from the master.
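
On Linux a wildcard tcp6 listener such as :::389 normally accepts IPv4
connections as well (unless net.ipv6.bindv6only is set), which would be
consistent with 389 being reachable over v4.  A quick way to confirm
what is reachable from the master is a plain TCP connect test.  The
sketch below is an illustration only; port_open and check_ports are
made-up helper names, and it checks reachability, not whether TLS or
LDAP actually works on the port.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports=(389, 636), timeout=3.0):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: port_open(host, port, timeout) for port in ports}


# Example, run from the master:
#   print(check_ports("id5.prod.mydomain.com"))
```

Given the netstat output above, I would expect 389 to come back as
reachable and 636 as not reachable, since nothing is listening on 636.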

What I have noticed is that the master is very, very slow.  In
particular the httpd process running under the ipaapi user is sitting
at 100% load most of the time.  I suspect timeouts may be occurring if
it's taking a long time for the master to respond to requests.

Grateful for any more guidance
Mike



On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
<freeipa-users@lists.fedorahosted.org> wrote:
>
> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>
>> Hi
>>
>> I've got a small environment which had until recently 2 IPA servers.
>> Both CentOS 7.4.1708
>>
>> Version info:
>>
>> id1:
>> Name        : ipa-server
>> Version     : 4.5.0
>> Release     : 21.el7.centos.2.2
>> Kernel: 3.10.0-693.5.2.el7.x86_64
>> 389-ds-base is at version 1.3.6.1
>>
>> id5:
>> Name        : ipa-server
>> Version     : 4.5.0
>> Release     : 21.el7.centos.2.2
>> Kernel: 3.10.0-693.5.2.el7.x86_64
>> 389-ds-base is at version 1.3.6.1
>>
>> I recently had an issue with high IO/load, and noted that the following
>> file:
>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>> was huge (5GB-ish) in a very small 2-master environment.  This is on
>> the master.  My understanding is that the entries in this file, which
>> have timestamps from months ago, exist because of failed replication.
>> I don't understand how to clear this without breaking things.
>
> It looks like changelog trimming is not enabled. If you enable trimming
> now, it would reduce the content but not necessarily the file size;
> it would, however, prevent the file from growing.
> If you stop the server and remove the file, it will be recreated. What can
> happen then is that changes required to update another replica are missing,
> and replication will ask you to reinit the other server.
>
> Now, the second problem should be unrelated. It looks like the total init
> tries to connect to port 636 and fails; the normal repl session fails because
> the init didn't happen. Could you verify that id5 is listening on 636, or
> whether there are any errors in its error logs?
>>
>>
>> Second issue; not sure if related:
>>
>> I've since lost the replica (id2) but I've prepared a new machine
>> (id5) to be a new replica of id1.  I've cleaned the RUVs and deleted
>> the replication agreements but when I join the new machine to the
>> existing one using `ipa-replica-install` then I get the following on
>> the replica:
>>
>> ################
>> Starting replication, please wait until this has completed.
>> Update in progress, 10 seconds elapsed
>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status:
>> [-11 connection error: Unknown connection error (-11) - Total update
>> aborted]
>>
>>    [error] RuntimeError: Failed to start replication
>> Your system may be partly configured.
>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>
>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>> ERROR    Failed to start replication
>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>> ERROR    The ipa-replica-install command failed. See
>> /var/log/ipareplica-install.log for more information
>> [root@id5 ~]# ipa-replica-manage re-initialize --from
>> id1.prod.mydomain.com
>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more
>> information
>> Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
>> ################
>>
>> and the following on the master:
>>
>> ################
>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin
>> - repl5_tot_run - Beginning total update of replica
>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin -
>> repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com"
>> (id5:389): Received error -1 (Can't contact LDAP server):  for total
>> update operation
>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin -
>> release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>> Unable to send endReplication extended operation (Can't contact LDAP
>> server)
>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin -
>> repl5_tot_run - Total update failed for replica
>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin
>> - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>> Replication bind with GSSAPI auth resumed
>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin
>> - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The
>> remote replica has a different database generation ID than the local
>> database.  You may have to reinitialize the remote replica, or the
>> local replica.
>> ################
>>
>> I've checked the firewalls on both machines, and gone as far as to
>> flush all the iptables rules to get it to work.  No luck.
>>
>> I'm also getting hundreds of the last line "different database
>> generation ID" but my understanding is that this is only logged
>> because the replica is yet to be set up.
>>
>> Would anyone please be able to provide some guidance?  I've been at
>> this for a few days now!
>>
>> Thanks!
>> Mike
>> _______________________________________________
>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
>
>
> --
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
> Eric Shander