Hi Timothy,

The changenumber counter is protected by a lock, so we should not see duplicate values... unless there is a bug :-(
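
To illustrate what "protected by a lock" is meant to guarantee, here is a minimal Python sketch. It is not the 389-ds code (the plugin is written in C); the class and function names are hypothetical and only show the general pattern: the increment and the read happen under one lock, so concurrent callers cannot be handed the same number.

```python
import threading


class ChangeNumberCounter:
    """Hypothetical stand-in for a lock-protected counter (not the 389-ds code)."""

    def __init__(self, start=0):
        self._lock = threading.Lock()
        self._last = start

    def next(self):
        # Increment and read under the same lock, so two threads
        # can never receive the same value.
        with self._lock:
            self._last += 1
            return self._last


def allocate_concurrently(counter, threads=8, per_thread=1000):
    """Allocate numbers from several threads and return all of them."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [counter.next() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```

With this pattern every allocated number is unique, which is why an "Already exists" on a changenumber entry points to something outside the normal path.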


If you retrieve the time when changenumber=112697,cn=changelog was created and the time when you first saw the error, can you see any failed operations (access log) or any messages in the error log between those two times?

Or did you disable/enable retroCL between those two times?
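
If it helps with checking the logs between two times, a small script along these lines could pull out the relevant lines. This is only a sketch: it assumes every line starts with a bracketed timestamp in the layout seen in the messages quoted below, and the function names and the time window in the example are placeholders, not values from your system.

```python
import re
from datetime import datetime

# Timestamp layout taken from the log lines quoted in this thread, e.g.
# "[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - ..."
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_ts(text):
    """Parse a timestamp such as '26/Sep/2016:15:54:54 -0500'."""
    return datetime.strptime(text, TS_FORMAT)


def lines_between(lines, start, end, needle=""):
    """Yield (timestamp, message) for lines with start <= timestamp <= end.

    `needle` optionally keeps only messages containing that text.
    Lines without a parsable leading [timestamp] are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        try:
            ts = parse_ts(m.group("ts"))
        except ValueError:
            continue
        if start <= ts <= end and needle in m.group("msg"):
            yield ts, m.group("msg")


if __name__ == "__main__":
    sample = [
        '[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - '
        'agmt="cn=meToipa03" (ipa03:389): Missing data encountered',
    ]
    # Placeholder window: replace with the entry creation time and the
    # time of the first error.
    start = parse_ts("26/Sep/2016:15:00:00 -0500")
    end = parse_ts("26/Sep/2016:16:00:00 -0500")
    for ts, msg in lines_between(sample, start, end):
        print(ts.isoformat(), msg)
```

In practice the `sample` list would be replaced by the lines of the error log (and the access log, assuming it uses the same timestamp layout).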

regards
thiery



On 09/27/2016 12:37 AM, Timothy Geier wrote:

On Sep 26, 2016, at 4:07 PM, Timothy Geier <tge...@accertify.com> wrote:


On Sep 26, 2016, at 2:17 PM, Timothy Geier <tge...@accertify.com> wrote:

This issue started when trying to remove a user; ipa user-del showed “operation failed” and the user was not removed. The same ipa user-del command was performed on a replica and completed successfully, but it was then immediately apparent that this change did not replicate anywhere else. All of the replicas were then re-initialized using "ipa-replica-manage re-initialize" and now the LDAP trees/users are consistent, though no further changes have been made.

The slapd error logs are showing repeated instances of

DSRetroclPlugin - replog: an error occured while adding change number 112697, dn = changenumber=112697,cn=changelog: Already exists.
retrocl-plugin - retrocl_postob: operation failure [68]

Package versions are
ipa-server-4.2.0-15.0.1.el7.centos.6.1.x86_64
and
389-ds-base-1.3.4.0-29.el7_2.x86_64

ipa-replica-manage list-ruv
ipa: WARNING: session memcached servers not running
unable to decode: {replica 11} 56044ef50000000b0000 56044ef50000000b0000
unable to decode: {replica 7} 561f17ba000800070000 561f17ba000800070000
unable to decode: {replica 5} 561f17bc000300050000 561f17bc000300050000
unable to decode: {replica 9} 561f17ba000a00090000 561f17ba000a00090000
unable to decode: {replica 4} 561f17ba000300040000 561f17ba000300040000
(These are likely leftovers from the previous incarnation of these servers on a RHEL6-like setup)
ipa07:389: 16
ipa02:389: 13
ipa03:389: 14
ipa01:389: 12
ipa04:389: 15
ipa05:389: 17

Thanks much,

Without any action on our part, this error has stopped but has been replaced with

[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - agmt="cn=meToipa03" (ipa03:389): Missing data encountered
[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - agmt="cn=meToipa03" (ipa03:389): Incremental update failed and requires administrator action

for all of the replicas and things are slightly out of sync everywhere.

Is the best course of action here to declare one a new master and do an ipa-replica-manage re-initialize to all of the others from that one?



After doing some testing, that’s exactly what we did and replication is now working again. It is odd that the DSRetroclPlugin errors stopped on their own (after approximately 3 hours); the only action taken there was looking at the cn=changelog base using ldapvi to see what number it was on but that has to be a sheer coincidence; absolutely no changes were made.

We’re also still unsure what caused this; our best theory at the moment is a race condition where everything that could have gone wrong at that exact moment did. Is there any validity to this?

Thanks,


