Pastebin for dirsrv/errors log file during/after failed join --
https://pastebin.com/gJR1SZWZ

On 14 November 2017 at 16:40, Mike Johnson <m.d.john...@kuub.org> wrote:
> Ludwig, thank you for the prompt, helpful reply.
>
> I've deleted the stale replication agreements, cleaned the dangling
> RUVs and renamed the huge file.  It recreated the file but it's
> nowhere near as big as it was.
>
> Now, on the second issue, it doesn't appear to be listening on port 636.
>
> The steps I'm following are, broadly:
>
> yum install ipa-server
> ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg
>
> I did not join the replica machine as a client before initiating the
> replication; I understand this is correct?
>
> Presumably the directory starts on the replica during the
> replica-install process?
>
> journalctl on the replica shows many of the following after I try to install:
> ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed
> to update replication update vector for replica
> dc=prod,dc=mydomain,dc=com: LDAP error - 1
>
> This is the state of things after trying to install the replica:
> [root@id5 ~]# netstat -ltnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
> tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
> tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
> tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
> tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
>
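
A quick way to confirm from that netstat listing which of the directory ports are bound is to pull the port numbers out of the Local Address column. This is only a rough sketch: `listening_ports` is a hypothetical helper, and it assumes the output of `netstat -ltn` (or `-ltnp`), where every tcp/tcp6 row is a listening socket.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports found in the Local Address column."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; the 4th field is the local address.
        if len(fields) >= 4 and fields[0] in ("tcp", "tcp6"):
            # The port is whatever follows the last colon (works for :::389 too).
            port = fields[3].rpartition(":")[2]
            if port.isdigit():
                ports.add(int(port))
    return ports


# Two rows copied from the listing above.
sample = """\
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
"""

for port in (389, 636):
    state = "listening" if port in listening_ports(sample) else "NOT listening"
    print(port, state)
```

Against the listing above this reports 389 as bound and 636 as missing, which matches what I see by eye.
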
> I note that port 389 is showing as tcp6, but I can reach it over IPv4
> from the master.
>
> What I have noticed is that the master is very, very slow.  In
> particular the httpd process running under the ipaapi user is sitting
> at 100% load most of the time.  I suspect timeouts may be occurring if
> it's taking a long time for the master to respond to requests.
>
> Grateful for any more guidance
> Mike
>
>
>
> On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
> <freeipa-users@lists.fedorahosted.org> wrote:
>>
>> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>>
>>> Hi
>>>
>>> I've got a small environment which had until recently 2 IPA servers.
>>> Both CentOS 7.4.1708
>>>
>>> Version info:
>>>
>>> id1:
>>> Name        : ipa-server
>>> Version     : 4.5.0
>>> Release     : 21.el7.centos.2.2
>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>> 389-ds-base is at version 1.3.6.1
>>>
>>> id5:
>>> Name        : ipa-server
>>> Version     : 4.5.0
>>> Release     : 21.el7.centos.2.2
>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>> 389-ds-base is at version 1.3.6.1
>>>
>>> I recently had an issue with high IO/load, and noted that the following
>>> file:
>>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>>> was huge (5GB-ish) in a very small 2-master environment.  This is on
>>> the master.  My understanding is that the entries in this file, which
>>> have timestamps from months ago, exist because of failed replication.
>>> I don't understand how to clear this without breaking things.
>>
>> It looks like you do not have changelog trimming enabled. If you enable
>> trimming now, this would reduce the content but not necessarily the file
>> size; it would, however, prevent the file from growing.
>> If you stop the server and remove the file, it will be recreated. What can
>> happen then is that changes required to update another replica are missing,
>> and replication will ask you to reinit the other server.
>>
>> Now, the second problem should be unrelated. It looks like the total init
>> tries to connect to port 636 and fails; the normal repl session then fails
>> because the init didn't happen. Could you verify that id5 is listening on
>> 636, and check whether there are any errors in its error logs?
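
For the "is id5 listening on 636" check, a plain TCP connect from the master is probably the simplest test. The sketch below is an assumption on my part rather than anything IPA ships: `can_connect` is a hypothetical helper, and it only shows that something accepts connections on the port, not that LDAPS/TLS is working.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn
        # (IPv4 and IPv6), so it does not matter which family the port is bound on.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def report(host, ports):
    """Return one status line per port for the given host."""
    return [
        f"{host}:{port} {'open' if can_connect(host, port) else 'unreachable'}"
        for port in ports
    ]
```

Run on id1, `print("\n".join(report("id5.prod.mydomain.com", (389, 636))))` should show whether 636 is reachable at all.
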
>>>
>>>
>>> Second issue; not sure if related:
>>>
>>> I've since lost the replica (id2) but I've prepared a new machine
>>> (id5) to be a new replica of id1.  I've cleaned the RUVs and deleted
>>> the replication agreements but when I join the new machine to the
>>> existing one using `ipa-replica-install` then I get the following on
>>> the replica:
>>>
>>> ################
>>> Starting replication, please wait until this has completed.
>>> Update in progress, 10 seconds elapsed
>>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status:
>>> [-11 connection error: Unknown connection error (-11) - Total update
>>> aborted]
>>>
>>>    [error] RuntimeError: Failed to start replication
>>> Your system may be partly configured.
>>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>>
>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>> ERROR    Failed to start replication
>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>> ERROR    The ipa-replica-install command failed. See
>>> /var/log/ipareplica-install.log for more information
>>> [root@id5 ~]# ipa-replica-manage re-initialize --from
>>> id1.prod.mydomain.com
>>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more
>>> information
>>> Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
>>> ################
>>>
>>> and the following on the master:
>>>
>>> ################
>>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin
>>> - repl5_tot_run - Beginning total update of replica
>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin -
>>> repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com"
>>> (id5:389): Received error -1 (Can't contact LDAP server):  for total
>>> update operation
>>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin -
>>> release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>> Unable to send endReplication extended operation (Can't contact LDAP
>>> server)
>>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin -
>>> repl5_tot_run - Total update failed for replica
>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin
>>> - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>> Replication bind with GSSAPI auth resumed
>>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin
>>> - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The
>>> remote replica has a different database generation ID than the local
>>> database.  You may have to reinitialize the remote replica, or the
>>> local replica.
>>> ################
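
Since the "different database generation ID" warning repeats hundreds of times, it may help to filter the errors log down to the ERR entries. This is a minimal sketch that assumes each entry is on a single line in the same "[timestamp] - LEVEL - component - function - message" layout as the excerpt above (the lines there are wrapped by the mail client). `parse_line` and `errors_only` are hypothetical names.

```python
import re

# Layout assumed from the excerpt: [timestamp] - LEVEL - component - function - message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+-\s+(?P<level>[A-Z]+)\s+-\s+"
    r"(?P<component>\S+)\s+-\s+(?P<function>\S+)\s+-\s+(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines):
    """Return the parsed entries whose level is ERR."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERR"]


# Two entries from the master's log above, rejoined onto single lines.
sample_lines = [
    '[14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin - '
    'repl5_tot_run - Beginning total update of replica '
    '"agmt="cn=meToid5.prod.mydomain.com" (id5:389)".',
    '[14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin - '
    'repl5_tot_run - Total update failed for replica '
    '"agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)',
]

for entry in errors_only(sample_lines):
    print(entry["timestamp"], entry["function"], entry["message"])
```
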
>>>
>>> I've checked the firewalls on both machines, and gone as far as to
>>> flush all the iptables rules to get it to work.  No luck.
>>>
>>> I'm also getting hundreds of the last line "different database
>>> generation ID" but my understanding is that this is only logged
>>> because the replica is yet to be set up.
>>>
>>> Would anyone please be able to provide some guidance?  I've been at
>>> this for a few days now!
>>>
>>> Thanks!
>>> Mike
>>> _______________________________________________
>>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>>> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
>>
>>
>> --
>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>> Eric Shander