Thank you Ludwig.  I did ask on #389 on freenode. The first response I
got said that lkrispen (presumably you) was the expert in this area.

I have since cleaned up some nsTombstone/nsds5ReplConflict records
according to the docs:
https://access.redhat.com/documentation/en-us/red_hat_directory_server/9.0/html/administration_guide/managing_replication-solving_common_replication_conflicts

This allowed me to raise the domain level on the master to 1.

I'll revert to a clean snapshot of the replica and capture logs from both sides.
Mike
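
For reference, the SASL limit error quoted further down carries both the
rejected packet length and the current limit, so a candidate value for
nsslapd-maxsasliosize can be read straight from the log line. The following
is only a minimal Python sketch under stated assumptions: the function names
are hypothetical, and using the reported length as the new limit is an
illustrative choice, not a recommendation from the 389-ds or IPA developers.
It prints LDIF and does not touch the server.

```python
import re

# Matches the two numbers in the sasl_io_start_packet error message.
SASL_LIMIT_RE = re.compile(r"length=(\d+),\s*limit=(\d+)")


def parse_sasl_limit_error(log_text):
    """Return (length, limit) from a sasl_io_start_packet error, or None."""
    # The message is often wrapped across lines in mail, so collapse whitespace.
    flat = " ".join(log_text.split())
    match = SASL_LIMIT_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def maxsasliosize_ldif(new_limit):
    """Build an LDIF modify for cn=config (hypothetical helper name)."""
    return (
        "dn: cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-maxsasliosize\n"
        f"nsslapd-maxsasliosize: {new_limit}\n"
    )


if __name__ == "__main__":
    sample = (
        "[14/Nov/2017:16:18:51.936433927 +0000] - ERR - sasl_io_start_packet - SASL\n"
        "encrypted packet length exceeds maximum allowed limit (length=16777279,\n"
        "limit=2097152).  Change the nsslapd-maxsasliosize attribute in cn=config to\n"
        "increase limit."
    )
    length, limit = parse_sasl_limit_error(sample)
    # Assumption: the reported length is used as the new limit.
    print(maxsasliosize_ldif(length), end="")
```

The printed LDIF could then be reviewed and applied by hand on the server
that logged the error; whether that is possible during the replica install
phase is the open question raised in the quoted reply.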

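Earlier in the thread (quoted below) the question was whether id5 is
listening on port 636. Besides reading netstat on the replica, this can be
probed from the master with a plain TCP connect. This is a rough sketch, not
part of any IPA tool: is_listening is a hypothetical helper, and a successful
connect only shows that something accepts TCP connections on the port, not
that LDAPS or TLS is working.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling is_listening("id5.prod.mydomain.com", 389) and then the same with
636 from the master would show whether each port is reachable over the
network path that replication actually uses.
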
On 15 November 2017 at 15:17, Ludwig Krispenz via FreeIPA-users
<freeipa-users@lists.fedorahosted.org> wrote:
>
> On 11/15/2017 07:40 AM, Mike Johnson via FreeIPA-users wrote:
>>
>> I should add that I deleted/moved the large DB file as it was on the
>> single remaining master, with no replication agreements left.
>
> yes, but that should be unrelated.
>
>>
>> Is it worth asking on the 389-users list as well?
>
> you can do this to get another audience, but I think you also need feedback
> from the IPA people.
>
> The basic failure seems to be the failure of the total init, and that seems
> to fail because of:
> [14/Nov/2017:16:18:51.936433927 +0000] - ERR - sasl_io_start_packet - SASL
> encrypted packet length exceeds maximum allowed limit (length=16777279,
> limit=2097152).  Change the nsslapd-maxsasliosize attribute in cn=config to
> increase limit.
>
> now you can try to increase the settings and retry the reinit, but if it is
> in the replica install phase I do not know if there is a way to change the
> default during install.
>
> For the next occurrence, could you provide access and error logs from both
> instances for the time of failure?
>
> Regards,
> Ludwig
>
>>
>> Thanks
>> Mike
>>
>> On 14 November 2017 at 16:48, Mike Johnson <m.d.john...@kuub.org> wrote:
>>>
>>> Pastebin for dirsrv/errors log file during/after failed join --
>>> https://pastebin.com/gJR1SZWZ
>>>
>>> On 14 November 2017 at 16:40, Mike Johnson <m.d.john...@kuub.org> wrote:
>>>>
>>>> Ludwig, thank you for the prompt, helpful reply.
>>>>
>>>> I've deleted the stale replication agreements, cleaned the dangling
>>>> RUVs and renamed the huge file.  It recreated the file but it's
>>>> nowhere near as big as it was.
>>>>
>>>> Now, on the second issue, it doesn't appear to be listening on port 636.
>>>>
>>>> The steps I'm following are, broadly:
>>>>
>>>> yum install ipa-server
>>>> ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg
>>>>
>>>> I did not join the replica machine as a client before initiating the
>>>> replication; I understand this is correct?
>>>>
>>>> Presumably the directory starts on the replica during the
>>>> replica-install process?
>>>>
>>>> journalctl on the replica shows many of the following after I try to
>>>> install:
>>>> ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed
>>>> to update replication update vector for replica
>>>> dc=prod,dc=mydomain,dc=com: LDAP error - 1
>>>>
>>>> This is the state of things after trying to install the replica:
>>>> [root@id5 ~]# netstat -ltnp
>>>> Active Internet connections (only servers)
>>>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>>>> tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
>>>> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
>>>> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
>>>> tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
>>>> tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
>>>> tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
>>>> tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd
>>>>
>>>> I note that port 389 is showing as tcp6, but I can reach it over IPv4
>>>> from the master.
>>>>
>>>> What I have noticed is that the master is very, very slow.  In
>>>> particular the httpd process running under the ipaapi user is sitting
>>>> at 100% load most of the time.  I suspect timeouts may be occurring if
>>>> it's taking a long time for the master to respond to requests.
>>>>
>>>> Grateful for any more guidance
>>>> Mike
>>>>
>>>>
>>>>
>>>> On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
>>>> <freeipa-users@lists.fedorahosted.org> wrote:
>>>>>
>>>>> On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I've got a small environment which had until recently 2 IPA servers.
>>>>>> Both CentOS 7.4.1708
>>>>>>
>>>>>> Version info:
>>>>>>
>>>>>> id1:
>>>>>> Name        : ipa-server
>>>>>> Version     : 4.5.0
>>>>>> Release     : 21.el7.centos.2.2
>>>>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>>>>> 389-ds-base is at version 1.3.6.1
>>>>>>
>>>>>> id5:
>>>>>> Name        : ipa-server
>>>>>> Version     : 4.5.0
>>>>>> Release     : 21.el7.centos.2.2
>>>>>> Kernel: 3.10.0-693.5.2.el7.x86_64
>>>>>> 389-ds-base is at version 1.3.6.1
>>>>>>
>>>>>> I recently had an issue with high IO/load, and noted that the following
>>>>>> file:
>>>>>> /var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
>>>>>> was huge (5GB-ish) in a very small 2-master environment.  This is on
>>>>>> the master.  My understanding is that the entries in this file, which
>>>>>> have timestamps from months ago, exist because of failed replication.
>>>>>> I don't understand how to clear this without breaking things.
>>>>>
>>>>> Looks like you do not have changelog trimming enabled. If you enable
>>>>> trimming now, this would reduce the content but not necessarily the file
>>>>> size; it would, however, prevent the file from growing further.
>>>>> If you stop the server and remove the file, it will be recreated. What can
>>>>> happen then is that changes required to update another replica are missing,
>>>>> and replication will ask you to reinit the other server.
>>>>>
>>>>> Now, the second problem should be unrelated. It looks like the total init
>>>>> tries to connect to port 636 and fails; the normal repl session then fails
>>>>> because the init didn't happen. Could you verify that id5 is listening on
>>>>> 636, and check whether there are any errors in its error logs?
>>>>>>
>>>>>>
>>>>>> Second issue; not sure if related:
>>>>>>
>>>>>> I've since lost the replica (id2) but I've prepared a new machine
>>>>>> (id5) to be a new replica of id1.  I've cleaned the RUVs and deleted
>>>>>> the replication agreements but when I join the new machine to the
>>>>>> existing one using `ipa-replica-install` then I get the following on
>>>>>> the replica:
>>>>>>
>>>>>> ################
>>>>>> Starting replication, please wait until this has completed.
>>>>>> Update in progress, 10 seconds elapsed
>>>>>> [ldap://id1.prod.mydomain.com:389] reports: Update failed! Status:
>>>>>> [-11 connection error: Unknown connection error (-11) - Total update
>>>>>> aborted]
>>>>>>
>>>>>>     [error] RuntimeError: Failed to start replication
>>>>>> Your system may be partly configured.
>>>>>> Run /usr/sbin/ipa-server-install --uninstall to clean up.
>>>>>>
>>>>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>>>>> ERROR    Failed to start replication
>>>>>> ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall):
>>>>>> ERROR    The ipa-replica-install command failed. See
>>>>>> /var/log/ipareplica-install.log for more information
>>>>>> [root@id5 ~]# ipa-replica-manage re-initialize --from id1.prod.mydomain.com
>>>>>> Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more
>>>>>> information
>>>>>> Unexpected error: cannot connect to
>>>>>> 'ldaps://id5.prod.mydomain.com:636':
>>>>>> ################
>>>>>>
>>>>>> and the following on the master:
>>>>>>
>>>>>> ################
>>>>>> [14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin
>>>>>> - repl5_tot_run - Beginning total update of replica
>>>>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
>>>>>> [14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin -
>>>>>> repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com"
>>>>>> (id5:389): Received error -1 (Can't contact LDAP server):  for total
>>>>>> update operation
>>>>>> [14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin -
>>>>>> release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>>>>> Unable to send endReplication extended operation (Can't contact LDAP
>>>>>> server)
>>>>>> [14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin -
>>>>>> repl5_tot_run - Total update failed for replica
>>>>>> "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
>>>>>> [14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin
>>>>>> - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389):
>>>>>> Replication bind with GSSAPI auth resumed
>>>>>> [14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin
>>>>>> - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The
>>>>>> remote replica has a different database generation ID than the local
>>>>>> database.  You may have to reinitialize the remote replica, or the
>>>>>> local replica.
>>>>>> ################
>>>>>>
>>>>>> I've checked the firewalls on both machines, and gone as far as to
>>>>>> flush all the iptables rules to get it to work.  No luck.
>>>>>>
>>>>>> I'm also getting hundreds of the last line "different database
>>>>>> generation ID" but my understanding is that this is only logged
>>>>>> because the replica is yet to be set up.
>>>>>>
>>>>>> Would anyone please be able to provide some guidance?  I've been at
>>>>>> this for a few days now!
>>>>>>
>>>>>> Thanks!
>>>>>> Mike
>>>>>> _______________________________________________
>>>>>> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
>>>>>> To unsubscribe send an email to
>>>>>> freeipa-users-le...@lists.fedorahosted.org
>>>>>
>>>>>
>>>>> --
>>>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>>>>> Eric Shander