I should add that I deleted/moved the large DB file as it was on the
single remaining master, with no replication agreements left.

Is it worth asking on the 389-users list as well?
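
In case it is useful for the port check Ludwig asked about, here is a
minimal Python sketch of how one could test from id1 whether id5 accepts
TCP connections on 389 and 636 over IPv4. The function name port_open and
the 3-second default timeout are my own choices for illustration, not
anything provided by IPA or 389-ds. A successful connect only shows that
something is listening on the port, not that LDAPS is configured correctly.

```python
import socket


def port_open(host, port, timeout=3.0, family=socket.AF_INET):
    """Return True if a TCP connection to host:port succeeds.

    family defaults to AF_INET so the check goes over IPv4 only;
    pass socket.AF_INET6 to test IPv6 instead.
    """
    try:
        # Resolve the host for the requested address family only.
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name did not resolve for this address family.
        return False

    for af, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            # Refused, timed out, or unreachable: try the next address.
            continue
    return False
```

Run on id1, port_open("id5.prod.mydomain.com", 389) and
port_open("id5.prod.mydomain.com", 636) would be the two calls of interest.
Given the netstat output quoted below, I would expect the second to return
False.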

Thanks
Mike

On 14 November 2017 at 16:48, Mike Johnson <m.d.john...@kuub.org> wrote:
> Pastebin for dirsrv/errors log file during/after failed join --
> https://pastebin.com/gJR1SZWZ
>
> On 14 November 2017 at 16:40, Mike Johnson <m.d.john...@kuub.org> wrote:
>> Ludwig, thank you for the prompt, helpful reply.
>>
>> I've deleted the stale replication agreements, cleaned the dangling
>> RUVs and renamed the huge file.  It recreated the file but it's
>> nowhere near as big as it was.
>>
>> Now, on the second issue, it doesn't appear to be listening on port 636.
>>
>> The steps I'm following are, broadly:
>>
>> yum install ipa-server
>> ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg
>>
>> I did not join the replica machine as a client before initiating the
>> replication; I understand this is correct?
>>
>> Presumably the directory starts on the replica during the
>> replica-install process?
>>
>> journalctl on the replica shows many of the following after I try to install:
>> ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed
>> to update replication update vector for replica
>> dc=prod,dc=mydomain,dc=com: LDAP error - 1
>>
>> This is the state of things after trying to install the replica:
>> [root@id5 ~]# netstat -ltnp
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>> tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
>> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
>> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
>> tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
>> tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
>> tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
>> tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
>>
>> I note that port 389 is showing as tcp6, but I can reach it over IPv4
>> from the master.
>>
>> What I have noticed is that the master is very, very slow.  In
>> particular the httpd process running under the ipaapi user is sitting
>> at 100% load most of the time.  I suspect timeouts may be occurring if
>> it's taking a long time for the master to respond to requests.
>>
>> Grateful for any more guidance
>> Mike
>>
>>
>>
>> On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
>> <freeipa-users@lists.fedorahosted.org> wrote:
>>>
>>> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>>>
>>>> Hi
>>>>
>>>> I've got a small environment which until recently had two IPA
>>>> servers, both running CentOS 7.4.1708.
>>>>
>>>> Version info:
>>>>
>>>> id1:
>>>> Name        : ipa-server
>>>> Version     : 4.5.0
>>>> Release     : 21.el7.centos.2.2
>>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>>> 389-ds-base is at version 1.3.6.1
>>>>
>>>> id5:
>>>> Name        : ipa-server
>>>> Version     : 4.5.0
>>>> Release     : 21.el7.centos.2.2
>>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>>> 389-ds-base is at version 1.3.6.1
>>>>
>>>> I recently had an issue with high IO/load, and noted that the following
>>>> file:
>>>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>>>> was huge (5GB-ish) in a very small 2-master environment.  This is on
>>>> the master.  My understanding is that the entries in this file, which
>>>> have timestamps from months ago, exist because of failed replication.
>>>> I don't understand how to clear this without breaking things.
>>>
>>> It looks like changelog trimming is not enabled. If you enable trimming
>>> now, this would reduce the content but not necessarily the file size;
>>> it would, however, prevent the file from growing further.
>>> If you stop the server and remove the file, it will be recreated. What can
>>> happen then is that changes required to update another replica are missing,
>>> and replication will ask you to reinitialize the other server.
>>>
>>> Now, the second problem should be unrelated. It looks like the total init
>>> tries to connect to port 636 and fails; the normal replication session then
>>> fails because the init didn't happen. Could you verify that id5 is listening
>>> on 636, and check whether there are any errors in its error log?
>>>>
>>>>
>>>> Second issue; not sure if related:
>>>>
>>>> I've since lost the replica (id2) but I've prepared a new machine
>>>> (id5) to be a new replica of id1.  I've cleaned the RUVs and deleted
>>>> the replication agreements but when I join the new machine to the
>>>> existing one using `ipa-replica-install` then I get the following on
>>>> the replica:
>>>>
>>>> ################
>>>> Starting replication, please wait until this has completed.
>>>> Update in progress, 10 seconds elapsed
>>>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status:
>>>> [-11 connection error: Unknown connection error (-11) - Total update
>>>> aborted]
>>>>
>>>>    [error] RuntimeError: Failed to start replication
>>>> Your system may be partly configured.
>>>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>>>
>>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>>> ERROR    Failed to start replication
>>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>>> ERROR    The ipa-replica-install command failed. See
>>>> /var/log/ipareplica-install.log for more information
>>>> [root@id5 ~]# ipa-replica-manage re-initialize --from
>>>> id1.prod.mydomain.com
>>>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more
>>>> information
>>>> Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
>>>> ################
>>>>
>>>> and the following on the master:
>>>>
>>>> ################
>>>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin
>>>> - repl5_tot_run - Beginning total update of replica
>>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>>>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin -
>>>> repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com"
>>>> (id5:389): Received error -1 (Can't contact LDAP server):  for total
>>>> update operation
>>>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin -
>>>> release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>>> Unable to send endReplication extended operation (Can't contact LDAP
>>>> server)
>>>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin -
>>>> repl5_tot_run - Total update failed for replica
>>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>>>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin
>>>> - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>>> Replication bind with GSSAPI auth resumed
>>>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin
>>>> - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The
>>>> remote replica has a different database generation ID than the local
>>>> database.  You may have to reinitialize the remote replica, or the
>>>> local replica.
>>>> ################
>>>>
>>>> I've checked the firewalls on both machines, and gone as far as to
>>>> flush all the iptables rules to get it to work.  No luck.
>>>>
>>>> I'm also getting hundreds of the last line "different database
>>>> generation ID" but my understanding is that this is only logged
>>>> because the replica is yet to be set up.
>>>>
>>>> Would anyone please be able to provide some guidance?  I've been at
>>>> this for a few days now!
>>>>
>>>> Thanks!
>>>> Mike
>>>> _______________________________________________
>>>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>>>> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
>>>
>>>
>>> --
>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>>> Eric Shander