On 10/27/2014 10:12 AM, Craig White wrote:
*From:*[email protected]
[mailto:[email protected]] *On Behalf Of *Rich Megginson
*Sent:* Monday, October 27, 2014 6:42 AM
*To:* [email protected]
*Subject:* Re: [Freeipa-users] multi-master replication
On 10/25/2014 06:17 PM, Dmitri Pal wrote:
On 10/24/2014 07:15 PM, Craig White wrote:
*From:*[email protected]
<mailto:[email protected]>
[mailto:[email protected]] *On Behalf Of *Craig
White
*Sent:* Friday, October 24, 2014 4:02 PM
*To:* [email protected] <mailto:[email protected]>
*Subject:* [Freeipa-users] multi-master replication
I would have thought that changes go from replica to master
and not just master to replica.
Is there something I have to do to make the changes
bi-directional?
Replying to my own post…
Logs are my friend ;-)
[24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.domain.local " (ipa001:389): Replication
bind with GSSAPI auth resumed
[24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.domain.local " (ipa001:389): Warning:
unable to replicate schema: rc=2
[24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.domain.local " (ipa001:389): Failed to
send update operation to consumer (uniqueid
e018060f-5bb011e4-81078979-dc802980, CSN
544aa346000000030000): Can't contact LDAP server. Will retry
later.
[24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.domain.local " (ipa001:389): Consumer
failed to replay change (uniqueid (null), CSN (null)): Can't
contact LDAP server(-1). Will retry later.
These NULLs look suspicious.
I hope DS gurus will have more for you on Monday.
1) Yes, replication is fully bi-directional.
2) What are the exact versions of dirsrv? rpm -q 389-ds-base on
supplier and consumer.
3) Can you reproduce the problem using the replication log level on
both the supplier and consumer?
http://www.port389.org/docs/389ds/FAQ/faq.html#troubleshooting
----
1)I sort of expected it to be – that’s what had me worried when it
didn’t work as expected.
2)]# rpm -qa | grep 389
389-ds-base-1.2.11.15-47.el6.x86_64
389-ds-base-libs-1.2.11.15-47.el6.x86_64
3)Upped the log levels – hopefully this is a reasonable representative
from ipa002 (the replica)
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791
repl="dc=stt,dc=local": Acquired replica
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791
repl="dc=stt,dc=local": StartNSDS90ReplicationRequest: response=0 rc=0
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791
Relinquishing consumer connection extension
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
ruv_add_csn_inprogress: successfully inserted csn 544e6b02000000040000
into pending list
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - Purged state
information from entry
krbprincipalname=ldap/ipa002.FQDN@FQDN,cn=services,cn=accounts,dc=stt,dc=local
up to CSN 5445307e000000040000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program
- _cl5GetDBFileByReplicaName: found DB object f32b80 for database
/var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program
- _cl5GetDBFileByReplicaName: found DB object f32b80 for database
/var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - ruv_update_ruv:
successfully committed csn 544e6b02000000040000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Disconnected from the consumer
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Warning: unable to replicate
schema: rc=2
[27/Oct/2014:15:55:47 +0000] - csngen_adjust_time: gen state before
544e6b040001:1414425347:0:1
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program
- _cl5GetDBFile: found DB object f32b80 for database
/var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
[27/Oct/2014:15:55:47 +0000] - _cl5PositionCursorForReplay
(agmt="cn=meToipa001.FQDN" (ipa001:389)): Consumer RUV:
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replicageneration}
544aa332000000040000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 4
ldap://ipa001.FQDN:389} 544aa3a1000000040000 544e6b02000000040000 00000000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 3
ldap://ipa002.FQDN:389} 544aa339000000030000 544aa33f000400030000 00000000
[27/Oct/2014:15:55:47 +0000] - _cl5PositionCursorForReplay
(agmt="cn=meToipa001.FQDN" (ipa001:389)): Supplier RUV:
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replicageneration}
544aa332000000040000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 3
ldap://ipa002.FQDN:389} 544aa339000000030000 544e6aa4000300030000 544e6aa3
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 4
ldap://ipa001.FQDN:389} 544aa3a1000000040000 544e6b02000000040000 544e6b03
[27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) -
clcache_get_buffer: found thread private buffer cache 7fe85000ea50
[27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) -
clcache_get_buffer: _pool is f89af0 _pool->pl_busy_lists is
7fe8500197d0 _pool->pl_busy_lists->bl_buffers is 7fe85000ea50
[27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) -
session start: anchorcsn=544aa33f000400030000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program
- agmt="cn=meToipa001.FQDN" (ipa001:389): CSN 544aa33f000400030000
found, position set for replay
[27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) -
load=1 rec=1 csn=544aa346000000030000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): replay_update: Sending add
operation
(dn="cn=KDC,cn=ipa002.FQDN,cn=masters,cn=ipa,cn=etc,dc=stt,dc=local"
csn=544aa346000000030000)
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): replay_update: Consumer could
not replay operation with csn 544aa346000000030000
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Failed to send update
operation to consumer (uniqueid e018060f-5bb011e4-81078979-dc802980,
CSN 544aa346000000030000): Can't contact LDAP server. Will retry later.
[27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain starting
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Connection disconnected by
another thread
[27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: read
result for message_id 0
[27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: result 3,
-1, 2, 0, (null)
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Consumer failed to replay
change (uniqueid (null), CSN (null)): Can't contact LDAP server(-1).
Will retry later.
[27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: got op
result 205 should finish 1
[27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain exiting
[27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) -
session end: state=0 load=1 sent=1 skipped=0 skipped_new_rid=0
skipped_csn_gt_cons_maxcsn=0 skipped_up_to_date=0 skipped_csn_gt_ruv=0
skipped_csn_covered=0
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Warning: unable to send
endReplication extended operation (Can't contact LDAP server)
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): State: sending_updates ->
start_backoff
[27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): State: start_backoff ->
start_backoff
[27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793
Acquired consumer connection extension
[27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793
repl="dc=stt,dc=local": Released replica held by locking_purl=conn=4
id=229791
[27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793
Relinquishing consumer connection extension
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): State: start_backoff -> backoff
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Trying non-secure
slapi_ldap_init_ext
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): binddn = , passwd =
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Replication bind with GSSAPI
auth resumed
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): No linger to cancel on the
connection
[27/Oct/2014:15:55:51 +0000] - _csngen_adjust_local_time: gen state
before 544e6b040001:1414425347:0:1
[27/Oct/2014:15:55:51 +0000] - _csngen_adjust_local_time: gen state
after 544e6b080000:1414425351:0:1
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): Replica was successfully acquired.
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin -
agmt="cn=meToipa001.FQDN" (ipa001:389): State: backoff -> sending_updates
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - conn=4 op=229794
Acquired consumer connection extension
[27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - conn=4 op=229794
repl="dc=stt,dc=local": Begin incremental protocol
[27/Oct/2014:15:55:51 +0000] - csngen_adjust_time: gen state before
544e6b080001:1414425351:0:1
[27/Oct/2014:15:55:51 +0000] - csngen_adjust_time: gen state after
544e6b080001:1414425351:0:1
This is the log from ipa002 (which was the replica) and I am confused
by the error logged, ‘Can’t contact LDAP server’ because it surely can…
]# telnet 172.29.31.100 389
Trying 172.29.31.100...
Connected to 172.29.31.100.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
Replication uses Kerberos/GSSAPI and also possibly startTLS (I can't
remember if ipa uses startTLS for an additional layer of encryption, or
just uses Kerberos/GSSAPI for encryption). These might also cause the
directory server to report connection failures even if the tcp layer
connections are working.
Can you provided excerpts from the 172.29.31.100 directory server access
logs from around the time of the connection failures reported above?
This is a new setup (setup master on Thursday, imported users from
OpenLDAP and then setup replica on Friday). Had a few issues that I
didn’t understand which may or may not be relevant. First, after I
made some changes to SSSD to allow for sudo and hopefully HBAC, it
seemed like the directory server crashed and I had to restart it.
Crashed? If you can reproduce this problem, see
http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes
If there is a crash, we need to see the stack trace.
Seemed like the same thing happened at the same time when I likewise
configured the replica.
Again, see above.
Also, when I was trying to setup the replica, I had to resort to
–skip-conncheck as it (presume Kerberos) not only asked for the
password for admin@$DOMAIN but also the password for
admin@ipa001.$DOMAIN <mailto:admin@ipa001.$DOMAIN> for which I had
never specifically setup any password and no password already used in
Kerberos worked.
Hmm - maybe one of the ipa guys knows what the problem is here.
Thanks
Craig
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go To http://freeipa.org for more info on the project