Hi Timothy,
The changenumber counter is protected by a lock, so we should not see
duplicate values... unless there is a bug :-(
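Here is a minimal Python sketch of a lock-protected counter that illustrates the point about the lock. It is a hypothetical model, not 389-ds code: `ChangeNumberAllocator` and `allocate_concurrently` are illustrative names. The read, increment and store all happen under a single lock, so concurrent callers can never receive the same number.

```python
import threading


class ChangeNumberAllocator:
    """Hypothetical model of a lock-protected change-number counter.

    This is NOT the 389-ds retro changelog code; it only illustrates why a
    counter guarded by a lock hands out each value exactly once.
    """

    def __init__(self, last_used: int = 0) -> None:
        self._lock = threading.Lock()
        self._last = last_used

    def allocate(self) -> int:
        # Read, increment and store all happen while holding the lock, so no
        # two callers can ever start from the same previous value.
        with self._lock:
            self._last += 1
            return self._last


def allocate_concurrently(allocator: ChangeNumberAllocator,
                          threads: int, per_thread: int) -> list:
    """Hypothetical helper: hammer one allocator from several threads."""
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        local = [allocator.allocate() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    alloc = ChangeNumberAllocator(last_used=112696)
    numbers = allocate_concurrently(alloc, threads=8, per_thread=1000)
    # Every value is unique and the range is contiguous.
    print(len(numbers) == len(set(numbers)), min(numbers), max(numbers))
    # prints: True 112697 120696
```

In this toy model, a duplicate could only come from outside the lock. One example would be a second allocator seeded with a stale `last_used` value. Whether anything comparable can happen in the retro changelog plugin is what the questions below are meant to narrow down.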
Between the time changenumber=112697,cn=changelog was created and the
time you saw the error, do you see any failed operations in the access
log, or any errors in the error log?
Or did you disable/enable the retroCL plugin between those two times?
Regards,
Thierry
On 09/27/2016 12:37 AM, Timothy Geier wrote:
On Sep 26, 2016, at 4:07 PM, Timothy Geier wrote:
On Sep 26, 2016, at 2:17 PM, Timothy Geier wrote:
This issue started when we tried to remove a user: ipa user-del reported
“operation failed” and the user was not removed. The same ipa
user-del command was then run on a replica and completed
successfully, but it was immediately apparent that this change
had not replicated anywhere else. All of the replicas were then
re-initialized using “ipa-replica-manage re-initialize”, and now the
LDAP trees/users are consistent, though no further changes have been
made.
The slapd error logs are showing repeated instances of:
DSRetroclPlugin - replog: an error occured while adding change number 112697, dn = changenumber=112697,cn=changelog: Already exists.
retrocl-plugin - retrocl_postob: operation failure [68]
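Thierry asked when these errors appeared relative to the creation of changenumber=112697. The sketch below scans the errors log for the "Already exists" message on a given change number and reports the first and last occurrences. It is a minimal sketch that assumes each errors-log line begins with a timestamp like "[26/Sep/2016:15:54:54 -0500]", the same format as the replication messages quoted in this thread. `duplicate_window` is a hypothetical helper, not part of any IPA or 389-ds tool.

```python
import re
from datetime import datetime

# Assumption: errors-log lines start with "[dd/Mon/yyyy:HH:MM:SS +zzzz]".
TS_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")
# Matches the retro changelog duplicate message quoted above.
DUP_RE = re.compile(
    r"adding change number (\d+), dn = changenumber=\d+,cn=changelog: Already exists"
)


def duplicate_window(lines, changenumber):
    """Return (first, last, count) of 'Already exists' errors for one changenumber."""
    first = last = None
    count = 0
    for line in lines:
        m = DUP_RE.search(line)
        if not m or int(m.group(1)) != changenumber:
            continue  # not the duplicate error for this change number
        ts = TS_RE.match(line)
        if not ts:
            continue  # line without the assumed timestamp prefix
        when = datetime.strptime(ts.group(1), "%d/%b/%Y:%H:%M:%S %z")
        first = when if first is None else min(first, when)
        last = when if last is None else max(last, when)
        count += 1
    return first, last, count
```

To use it, pass an open file object for the instance's errors log. On 389-ds that file is typically under /var/log/dirsrv/slapd-<INSTANCE>/, but adjust the path to your layout. Subtracting `first` from `last` gives the duration, which should show roughly the three hours mentioned later in the thread.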
Package versions are:
ipa-server-4.2.0-15.0.1.el7.centos.6.1.x86_64
389-ds-base-1.3.4.0-29.el7_2.x86_64
ipa-replica-manage list-ruv
ipa: WARNING: session memcached servers not running
unable to decode: {replica 11} 56044ef50000000b0000 56044ef50000000b0000
unable to decode: {replica 7} 561f17ba000800070000 561f17ba000800070000
unable to decode: {replica 5} 561f17bc000300050000 561f17bc000300050000
unable to decode: {replica 9} 561f17ba000a00090000 561f17ba000a00090000
unable to decode: {replica 4} 561f17ba000300040000 561f17ba000300040000
(These are likely leftovers from the previous incarnation of these
servers on a RHEL6-like setup)
ipa07:389: 16
ipa02:389: 13
ipa03:389: 14
ipa01:389: 12
ipa04:389: 15
ipa05:389: 17
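The "unable to decode" entries are raw CSNs, and they can be checked by hand. This is a minimal sketch that assumes 389-ds CSNs are 20 hex digits, laid out as 8 digits of Unix time followed by 4 digits each of sequence number, replica ID and sub-sequence number. `decode_csn` and `CSN` are illustrative names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int
    replica_id: int
    subseqnum: int


def decode_csn(csn: str) -> CSN:
    """Split a CSN string into its fields (assumed 8/4/4/4 hex layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return CSN(int(csn[0:8], 16), int(csn[8:12], 16),
               int(csn[12:16], 16), int(csn[16:20], 16))


if __name__ == "__main__":
    # CSNs copied from the list-ruv output above.
    for raw in ("56044ef50000000b0000", "561f17ba000800070000",
                "561f17bc000300050000", "561f17ba000a00090000",
                "561f17ba000300040000"):
        c = decode_csn(raw)
        when = datetime.fromtimestamp(c.timestamp, tz=timezone.utc)
        print(raw, "replica", c.replica_id, when.isoformat())
```

Under that assumed layout, the replica ID embedded in each CSN matches its {replica N} label. The timestamps decode to September and October 2015, about a year before this thread, which fits the theory that these are leftovers from the earlier deployment.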
Thanks much,
Without our taking any action, this error has stopped, but it has been
replaced with
[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - agmt="cn=meToipa03" (ipa03:389): Missing data encountered
[26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin - agmt="cn=meToipa03" (ipa03:389): Incremental update failed and requires administrator action
for all of the replicas, and things are slightly out of sync everywhere.
Is the best course of action here to declare one of them the new master
and run ipa-replica-manage re-initialize on all of the others from that one?
After some testing, that’s exactly what we did, and replication
is now working again. It is odd that the DSRetroclPlugin errors
stopped on their own (after approximately 3 hours). The only action
taken in the meantime was looking at the cn=changelog base with ldapvi
to see which change number it was on, but that has to be sheer
coincidence; absolutely no changes were made.
We’re also still unsure what caused this. Our best theory at the
moment is a race condition in which everything that could have gone wrong
at that exact moment did. Is there any validity to this?
Thanks,
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project