On 11/15/2017 04:55 PM, Mike Johnson wrote:
Thank you Ludwig.  I did ask on #389 on freenode. The first response I
got said lkrispen (presumably you) was the expert in this area.
:-)

I have since cleaned up some nsTombstone/nsds5ReplConflict records
according to the docs:
https://access.redhat.com/documentation/en-us/red_hat_directory_server/9.0/html/administration_guide/managing_replication-solving_common_replication_conflicts

This allowed me to raise the domain level on the master to 1.

I'll revert to a clean snapshot of the replica and capture logs from both sides.
Mike
I looked into the data you sent (off-list) and it looks like you really have a problem with a large entry (or maybe more). In the consumer error log we see again:

[15/Nov/2017:18:03:47.578017800 +0000] - ERR - sasl_io_start_packet - SASL encrypted packet length exceeds maximum allowed limit (length=16777279, limit=2097152). Change the nsslapd-maxsasliosize attribute in cn=config to increase limit.

and in the corresponding access log:

[15/Nov/2017:18:03:46.868648393 +0000] conn=5 op=510 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[15/Nov/2017:18:03:46.868737845 +0000] conn=5 op=510 RESULT err=0 tag=120 nentries=0 etime=0
[15/Nov/2017:18:03:46.868854476 +0000] conn=5 op=511 EXT oid="2.16.840.1.113730.3.5.6" name="replication-multimaster-extop"
[15/Nov/2017:18:03:46.868924086 +0000] conn=5 op=511 RESULT err=0 tag=120 nentries=0 etime=0
[15/Nov/2017:18:03:47.579925711 +0000] conn=5 op=-1 fd=64 closed - The value requested is too large to be stored in the data buffer provided.

so the total init was progressing and 506 entries were successfully sent.

You can try to confirm that there is a large entry, or try to find the largest entry in the database, and then follow the suggestion and raise nsslapd-maxsasliosize. You may then also run into the maxbersize limit (nsslapd-maxbersize).
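
As a rough sketch of that suggestion: the snippet below pulls the reported packet length and the current limit out of the error-log line quoted above, then writes an LDIF that raises nsslapd-maxsasliosize and, in case that limit is hit next, nsslapd-maxbersize. The new value (32 MiB) and the file name raise-limits.ldif are assumptions for illustration only, not recommended settings; pick a value that fits your largest entry.

```shell
#!/bin/sh
# Error-log line as quoted above (timestamp omitted).
line='ERR - sasl_io_start_packet - SASL encrypted packet length exceeds maximum allowed limit (length=16777279, limit=2097152).'

# Extract the two numbers from the "(length=..., limit=...)" part.
length=$(printf '%s\n' "$line" | sed -n 's/.*length=\([0-9][0-9]*\).*/\1/p')
limit=$(printf '%s\n' "$line" | sed -n 's/.*limit=\([0-9][0-9]*\).*/\1/p')
echo "packet length: $length bytes, current limit: $limit bytes"

# Assumed new limit: 32 MiB, chosen only because it exceeds the reported length.
new_limit=$((32 * 1024 * 1024))

# Write an LDIF that changes both limits in cn=config.
cat > raise-limits.ldif <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: $new_limit
-
replace: nsslapd-maxbersize
nsslapd-maxbersize: $new_limit
EOF

# Apply it on the consumer (not run here; needs a live server):
#   ldapmodify -D "cn=directory manager" -W -f raise-limits.ldif
```

The LDIF is written to a file first so it can be reviewed before anything is changed on the server.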

To see the order in which total init sends entries, you can run the following search:
ldapsearch -D "cn=directory manager" -w ... -b "<your suffix>" "parentid>=1"
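
To hunt for an oversized entry, one option is to save the output of a search like the one above to a file and measure each entry in it. A minimal sketch, assuming LDIF input where entries are separated by blank lines; the function name largest_entry and the sample data are hypothetical, and the sizes are only approximate (they count LDIF text, not the encoded entry).

```shell
#!/bin/sh
# Print "<approx bytes> <dn line>" for the largest entry in an LDIF stream.
# Entries are assumed to be separated by blank lines, as ldapsearch prints them.
largest_entry() {
    awk '
        function finish_entry() {
            if (dn != "" && size > max) { max = size; maxdn = dn }
            dn = ""; size = 0
        }
        /^$/   { finish_entry(); next }    # blank line ends the current entry
        /^dn:/ { dn = $0 }                 # remember which entry we are in
               { size += length($0) + 1 }  # +1 for the newline
        END    { finish_entry(); if (maxdn != "") print max, maxdn }
    '
}

# Hypothetical sample data standing in for a real export.
largest_entry <<'EOF'
dn: uid=small,cn=users,dc=example,dc=com
uid: small

dn: cn=big,cn=groups,dc=example,dc=com
member: uid=a,cn=users,dc=example,dc=com
member: uid=b,cn=users,dc=example,dc=com
EOF
```

On a real export the call would be something like `largest_entry < export.ldif`, where export.ldif is whatever file the search output was saved to.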


On 15 November 2017 at 15:17, Ludwig Krispenz via FreeIPA-users
<freeipa-users@lists.fedorahosted.org> wrote:
On 11/15/2017 07:40 AM, Mike Johnson via FreeIPA-users wrote:
I should add that I deleted/moved the large DB file as it was on the
single remaining master, with no replication agreements left.
yes, but that should be unrelated.

Is it worth asking on the 389-users list as well?
You can do this to reach another audience, but I think you also need feedback
from the IPA people.

The basic failure seems to be the failure of the total init, and that seems
to fail because of:
[14/Nov/2017:16:18:51.936433927 +0000] - ERR - sasl_io_start_packet - SASL
encrypted packet length exceeds maximum allowed limit (length=16777279,
limit=2097152).  Change the nsslapd-maxsasliosize attribute in cn=config to
increase limit.

now you can try to increase the settings and retry the reinit, but if it is
in the replica install phase I do not know if there is a way to change the
default during install.

For the next occurrence, could you provide access and error logs from both
instances for the time of failure?

Regards,
Ludwig

Thanks
Mike

On 14 November 2017 at 16:48, Mike Johnson <m.d.john...@kuub.org> wrote:
Pastebin for dirsrv/errors log file during/after failed join --
https://pastebin.com/gJR1SZWZ

On 14 November 2017 at 16:40, Mike Johnson <m.d.john...@kuub.org> wrote:
Ludwig, thank you for the prompt, helpful reply.

I've deleted the stale replication agreements, cleaned the dangling
RUVs and renamed the huge file.  It recreated the file but it's
nowhere near as big as it was.

Now, on the second issue, it doesn't appear to be listening on port 636.

The steps I'm following are, broadly:

yum install ipa-server
ipa-replica-install ./replica-info-id5.prod.mydomain.com.gpg

I did not join the replica machine as a client before initiating the
replication, I understand this is correct?

Presumably the directory starts on the replica during the
replica-install process?

journalctl on the replica shows many of the following after I try to
install:
ERR - NSMMReplicationPlugin - replica_replace_ruv_tombstone - Failed to update replication update vector for replica dc=prod,dc=mydomain,dc=com: LDAP error - 1

This is the state of things after trying to install the replica:
[root@id5 ~]# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1332/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1332/master
tcp6       0      0 :::389                  :::*                    LISTEN      1964/ns-slapd

I note that port 389 is showing as tcp6 but I can see it with v4 from
the master
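
The check for a listener on 636 can be scripted. This is only a sketch: the function name is_listening is hypothetical, and it assumes the usual `netstat -ltn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The sample rows are two lines taken from the listing above.

```shell
#!/bin/sh
# Succeeds if a TCP listener on the given port appears in `netstat -ltn`
# output read from stdin.
is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Live usage on the replica (not run here):
#   netstat -ltn | is_listening 636 && echo "636 open" || echo "636 not listening"

# Two rows taken from the listing above, to show the behaviour.
sample='tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::389                  :::*                    LISTEN'

for p in 389 636; do
    if printf '%s\n' "$sample" | is_listening "$p"; then
        echo "port $p: listening"
    else
        echo "port $p: not listening"
    fi
done
```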

What I have noticed is that the master is very, very slow.  In
particular the httpd process running under the ipaapi user is sitting
at 100% load most of the time.  I suspect timeouts may be occurring if
it's taking a long time for the master to respond to requests.

Grateful for any more guidance
Mike



On 14 November 2017 at 12:23, Ludwig Krispenz via FreeIPA-users
<freeipa-users@lists.fedorahosted.org> wrote:
On 11/14/2017 11:40 AM, Mike Johnson via FreeIPA-users wrote:
Hi

I've got a small environment which had until recently 2 IPA servers.
Both CentOS 7.4.1708

Version info:

id1:
Name        : ipa-server
Version     : 4.5.0
Release     : 21.el7.centos.2.2
Kernel: 3.10.0-693.5.2.el7.x86_64
389-ds-base is at version 1.3.6.1

id5:
Name        : ipa-server
Version     : 4.5.0
Release     : 21.el7.centos.2.2
Kernel: 3.10.0-693.5.2.el7.x86_64
389-ds-base is at version 1.3.6.1

I recently had an issue with high IO/load, and noted that the following
file:
/var/lib/dirsrv/slapd-PROD-MYDOMAIN-COM/cldb/<long-filename>.db
was huge (5GB-ish) in a very small 2-master environment.  This is on
the master.  My understanding is that the entries in this file, which
have timestamps from months ago, exist because of failed replication.
I don't understand how to clear this without breaking things.
It looks like you do not have changelog trimming enabled. If you enable
trimming now, this would reduce the content but not necessarily the file
size; it would, however, prevent the file from growing further.
If you stop the server and remove the file, it will be recreated. What can
happen then is that changes required to update another replica are missing,
and replication will ask you to reinit the other server.
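
For reference, a sketch of how changelog trimming might be switched on. It assumes the changelog configuration entry is cn=changelog5,cn=config (as in 389-ds-base 1.3.x) and that trimming is driven by the nsslapd-changelogmaxage attribute; the 7-day value and the file name changelog-trim.ldif are arbitrary examples, so check them against the documentation for your version before applying anything.

```shell
#!/bin/sh
# Assumed maximum age for changelog records; 7d is only an example value.
max_age=7d

# Write an LDIF that sets a maximum age on the changelog configuration entry
# (assumed to be cn=changelog5,cn=config, as in 389-ds-base 1.3.x).
cat > changelog-trim.ldif <<EOF
dn: cn=changelog5,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: $max_age
EOF

cat changelog-trim.ldif

# Apply it on the master (not run here; needs a live server):
#   ldapmodify -D "cn=directory manager" -W -f changelog-trim.ldif
```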

Now, the second problem should be unrelated. It looks like the total init
tries to connect to port 636 and fails; the normal replication session then
fails because the init didn't happen. Could you verify that id5 is listening
on 636, and check whether there are any errors in its error log?

Second issue; not sure if related:

I've since lost the replica (id2) but I've prepared a new machine
(id5) to be a new replica of id1.  I've cleaned the RUVs and deleted
the replication agreements but when I join the new machine to the
existing one using `ipa-replica-install` then I get the following on
the replica:

################
Starting replication, please wait until this has completed.
Update in progress, 10 seconds elapsed
[ldap://id1.prod.mydomain.com:389] reports: Update failed! Status: [-11 connection error: Unknown connection error (-11) - Total update aborted]

     [error] RuntimeError: Failed to start replication
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    Failed to start replication
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
[root@id5 ~]# ipa-replica-manage re-initialize --from id1.prod.mydomain.com
Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more information
Unexpected error: cannot connect to 'ldaps://id5.prod.mydomain.com:636':
################

and the following on the master:

################
[14/Nov/2017:10:05:28.671905981 +0000] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToid5.prod.mydomain.com" (id5:389)".
[14/Nov/2017:10:05:38.031033860 +0000] - ERR - NSMMReplicationPlugin - repl5_tot_log_operation_failure - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Received error -1 (Can't contact LDAP server):  for total update operation
[14/Nov/2017:10:05:38.032272148 +0000] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Unable to send endReplication extended operation (Can't contact LDAP server)
[14/Nov/2017:10:05:38.095893236 +0000] - ERR - NSMMReplicationPlugin - repl5_tot_run - Total update failed for replica "agmt="cn=meToid5.prod.mydomain.com" (id5:389)", error (-11)
[14/Nov/2017:10:05:38.113388624 +0000] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToid5.prod.mydomain.com" (id5:389): Replication bind with GSSAPI auth resumed
[14/Nov/2017:10:05:38.425682940 +0000] - WARN - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meToid5.prod.mydomain.com" (id5:389): The remote replica has a different database generation ID than the local database.  You may have to reinitialize the remote replica, or the local replica.
################

I've checked the firewalls on both machines, and gone as far as to
flush all the iptables rules to get it to work.  No luck.

I'm also getting hundreds of the last line "different database
generation ID" but my understanding is that this is only logged
because the replica is yet to be set up.

Would anyone please be able to provide some guidance?  I've been at
this for a few days now!

Thanks!
Mike
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to
freeipa-users-le...@lists.fedorahosted.org

--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael
O'Neill,
Eric Shander