On 5/21/15 5:20 AM, thierry bordaz wrote:
On 05/21/2015 01:36 PM, Janelle wrote:
On 5/20/15 7:53 AM, Mark Reynolds wrote:
On 05/20/2015 10:17 AM, thierry bordaz wrote:
On 05/20/2015 03:46 PM, Janelle wrote:
On 5/20/15 6:01 AM, thierry bordaz wrote:
On 05/20/2015 02:57 AM, Janelle wrote:
On 5/19/15 12:04 AM, thierry bordaz wrote:
On 05/19/2015 03:42 AM, Janelle wrote:
On 5/18/15 6:23 PM, Janelle wrote:
Once again, replication/sync has been lost. I really wish the
product were more stable; it has so much potential, and yet...
The servers ran for 6 days with no issues. No new accounts or
changes (maybe a few users changing passwords), and again, 5
out of 16 servers are no longer in sync.
I can test it easily by adding an account, waiting a few minutes,
and then running "ipa user-show --all username" on all the
servers: only a few of them have the account. I have now waited
15 minutes, still no luck.
Oh well... I guess I will go look at alternatives. I had such
high hopes for this tool. Thanks so much, everyone, for all
your help in trying to get things stable, but for whatever
reason there is a random loss of sync among the servers, and
obviously this is not acceptable.
regards
~J
All the replicas are happy again. I found these again:
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 23} 5553e3a3000000170000 55543240000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
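The two hex strings after each {replica N} look like 389 Directory Server CSNs (change sequence numbers). My understanding is that they are the minimum and maximum CSNs the RUV holds for that replica ID. The sketch below assumes the usual CSN string layout: 8 hex digits of Unix timestamp, followed by 4 hex digits each for sequence number, replica ID and sub-sequence number. The data gives a quick sanity check of that assumption, because the replica-ID field decodes to 0x10 = 16, 0x17 = 23 and 0x18 = 24, which match the labels. Decoding the timestamps then shows roughly when changes from each orphaned replica ID were generated. `CSN` and `decode_csn` are hypothetical helpers, not part of any FreeIPA or 389-DS API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CSN:
    timestamp: int  # seconds since the Unix epoch (first 8 hex digits)
    seqnum: int     # sequence number within that second (next 4)
    rid: int        # replica ID that generated the change (next 4)
    subseqnum: int  # sub-sequence number (last 4)

    @property
    def when(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def decode_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields (assumed layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    # int(..., 16) also raises ValueError on non-hex characters.
    return CSN(
        timestamp=int(csn[0:8], 16),
        seqnum=int(csn[8:12], 16),
        rid=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # The CSN pairs reported above for the three undecodable RUV elements.
    orphans = {
        "replica 16": ("55356472000300100000", "55356472000300100000"),
        "replica 23": ("5553e3a3000000170000", "55543240000300170000"),
        "replica 24": ("554d53d3000000180000", "554d54a4000200180000"),
    }
    for label, pair in orphans.items():
        for csn in pair:
            d = decode_csn(csn)
            print(f"{label}: {csn} -> rid={d.rid} seq={d.seqnum} at {d.when.isoformat()}")
```

If the decoded RID of every CSN in an element matches its {replica N} label, the element is at least internally consistent. The timestamps can then be compared across servers.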
What I also found interesting is that I have not deleted any
masters at all, so it is quite perplexing where the orphaned
entries came from. However, I did find that 3 of the replicas
did not show complete RUV lists (using ipa-replica-manage
list-ruv): while most of the replicas listed all 16 servers, a
couple of them listed only 4 or 5.
I don't know about the orphaned entries. Did you get entries
below deleted parents?
AFAIK all replicas are masters, so each one has an entry {replica
<rid>} in the RUV. We should expect all servers to have the same
number of RUV elements, not 16 on some and 4 or 5 on others. The
servers with 4 or 5 may be isolated, so that they did not receive
updates from those with 16 RUV elements.
Would you copy/paste an example of a RUV with 16 elements and one with 4-5?
Now, the steps to clear this were:
Removed the "unable to decode" entries with direct ldapmodify
operations. This worked across all replicas, which was nice, and
did not have to be repeated on each one. In other words, I entered
it on a single server and it was removed on all.
Hello,
Did you do a direct ldapmodify on the RUV entry
(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,SUFFIX), or a clean RUV?
Thierry,
Janelle just manually added a cleanallruv task (that I had
recommended the other week).
Mark
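Mark mentions a CLEANALLRUV task. The sketch below generates the task entries for the three orphaned RIDs. It assumes the task format described in the 389 Directory Server documentation: an entry under cn=cleanallruv,cn=tasks,cn=config carrying replica-base-dn and replica-id. The suffix dc=example,dc=com is a placeholder, and `cleanallruv_ldif` is a hypothetical helper. FreeIPA's own `ipa-replica-manage clean-ruv <rid>` is probably the safer front end for the same operation.

```python
def cleanallruv_ldif(rid: int, suffix: str) -> str:
    """Return one LDIF entry requesting a CLEANALLRUV task for `rid` (assumed task schema)."""
    return (
        f"dn: cn=clean {rid},cn=cleanallruv,cn=tasks,cn=config\n"
        "objectclass: extensibleObject\n"
        f"replica-base-dn: {suffix}\n"
        f"replica-id: {rid}\n"
        f"cn: clean {rid}\n"
    )


if __name__ == "__main__":
    # Placeholder suffix: substitute the deployment's real base DN.
    SUFFIX = "dc=example,dc=com"
    # One task per orphaned RID seen in the "unable to decode" lines.
    print("\n".join(cleanallruv_ldif(rid, SUFFIX) for rid in (16, 23, 24)))
```

The output would be added once, as Directory Manager (for example with `ldapadd` or `ldapmodify -a`), on a single server. The task is meant to carry the cleaning to the other replicas, which is consistent with Janelle seeing the entries disappear everywhere.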
dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you do an
update on dc3-ipa1, is it replicated to dc1-ipa[12]?
Also, there are duplicate RIDs (9 and 25) for dc1-ipa2.example.com:389;
25 seems to be the new RID. You may see messages like
'attrlist_replace' in some error logs.
thanks
thierry
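Thierry's comparison can be done mechanically on a per-node dump like the one Janelle posted. It looks for RUV elements that are missing on some nodes and for a host that appears with two RIDs. This sketch assumes exactly that dump format: a `Node <name>` header, followed by `host port rid` lines and `unable to decode {replica N}` lines. "Missing" means relative to the union of RIDs seen across the nodes supplied, so a partial dump gives a partial answer. The function names and the trimmed `SAMPLE` (taken from the posted output) are illustrative only.

```python
import re
from collections import defaultdict

# Line shapes assumed from the list-ruv dump posted in this thread.
NODE_RE = re.compile(r"^Node\s+(\S+)$")
ELEMENT_RE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)$")  # host port rid
ORPHAN_RE = re.compile(r"^unable to decode \{replica (\d+)\}")


def parse_ruv_dump(text):
    """Return {node: {"hosts": {host: set of rids}, "orphans": set of rids}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := NODE_RE.match(line):
            current = nodes.setdefault(
                m.group(1), {"hosts": defaultdict(set), "orphans": set()}
            )
        elif current is None:
            continue  # ignore anything before the first "Node" header
        elif m := ORPHAN_RE.match(line):
            current["orphans"].add(int(m.group(1)))
        elif m := ELEMENT_RE.match(line):
            current["hosts"][m.group(1)].add(int(m.group(3)))
        # Separators and wrapped CSN continuation lines match nothing.
    return nodes


def compare_ruvs(nodes):
    """Flag RIDs each node lacks (vs. the union over all nodes) and hosts with >1 RID."""
    all_rids = set()
    for info in nodes.values():
        for rids in info["hosts"].values():
            all_rids |= rids
    findings = {}
    for name, info in nodes.items():
        seen = set().union(*info["hosts"].values())
        findings[name] = {
            "missing": sorted(all_rids - seen),
            "duplicates": {h: sorted(r) for h, r in info["hosts"].items() if len(r) > 1},
            "orphans": sorted(info["orphans"]),
        }
    return findings


SAMPLE = """\
-------------------------------------
Node dc1-ipa1
-------------------------------------
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa4.example.com 389 4
-------------------------------------
Node dc1-ipa3
-------------------------------------
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
"""

if __name__ == "__main__":
    for node, result in compare_ruvs(parse_ruv_dump(SAMPLE)).items():
        print(node, result)
```

On the full dump, this should flag dc1-ipa1 and dc1-ipa2 as short and show the 9/25 pair for dc1-ipa2.example.com on the other nodes.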
Re-initialized the ones with the short list (re-initialize
--from=<good server>). Waited 5 minutes to let everything settle,
then started running tests of adds/deletes, which seemed to be
just fine.
Here are 2 of the DCs:
-------------------------------------
Node dc1-ipa1
-------------------------------------
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa4.example.com 389 4
-------------------------------------
Node dc1-ipa2
-------------------------------------
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
-------------------------------------
Node dc1-ipa3
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
-------------------------------------
Node dc1-ipa4
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
-------------------------------------
Node dc2-ipa1
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 23} 5553e3a3000000170000 55543240000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
-------------------------------------
Node dc2-ipa2
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
-------------------------------------
Node dc2-ipa3
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
-------------------------------------
Node dc2-ipa4
-------------------------------------
dc3-ipa1.example.com 389 14
dc3-ipa2.example.com 389 13
dc3-ipa3.example.com 389 12
dc3-ipa4.example.com 389 11
dc2-ipa1.example.com 389 7
dc2-ipa2.example.com 389 6
dc2-ipa3.example.com 389 5
dc2-ipa4.example.com 389 3
dc4-ipa1.example.com 389 18
dc4-ipa2.example.com 389 19
dc4-ipa3.example.com 389 20
dc4-ipa4.example.com 389 21
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
dc1-ipa4.example.com 389 4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389 26
dc5-ipa2.example.com 389 15
dc5-ipa3.example.com 389 17
Happy Wednesday
~Janelle
And just like that - for no reason, they all reappeared:
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 23} 5545d61f000200170000 5552f718000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
:-(
~J
Hello Janelle,
Those 3 RIDs were already present on node dc2-ipa1, correct? Did
they reappear on other nodes as well?
Maybe dc2-ipa1 established a replication session with its peers and
sent them those RIDs.
Could you track in all the access logs when the op
csn=5552f718000300170000 was applied?
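Here is a minimal sketch for that request. It scans a set of access logs for one CSN and prints each hit with its file and line number. It assumes the logs have been copied to one machine; on a 389-DS server they typically live under /var/log/dirsrv/slapd-<INSTANCE>/access. It also assumes write results carry a `csn=...` field, as 389-DS RESULT lines usually do. `scan_lines`, `find_csn` and the script name are hypothetical.

```python
import sys


def scan_lines(lines, csn):
    """Yield (line_number, line) for every line that mentions csn=<csn>."""
    needle = f"csn={csn}"
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            yield lineno, line.rstrip("\n")


def find_csn(paths, csn):
    """Yield (path, line_number, line) hits across several log files."""
    for path in paths:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in scan_lines(fh, csn):
                yield path, lineno, line


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: find_csn.py CSN [ACCESS_LOG ...]", file=sys.stderr)
    else:
        target, *logs = sys.argv[1:]
        for path, lineno, line in find_csn(logs, target):
            print(f"{path}:{lineno}: {line}")
```

For example: `python find_csn.py 5552f718000300170000 dc1-ipa1-access dc2-ipa1-access`. The timestamps on the matching lines show when each server applied the op.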
Note that the two hex values of replica 23 changed
(5545d61f000200170000 5552f718000300170000 vs 5553e3a3000000170000
55543240000300170000). Have you recreated a replica 23?
Do you have replication logging enabled?
thanks
thierry
As I mentioned in the email I just sent, and to be clear: NOTHING
changed in the environment. No new replicas. No changes to the servers
at all other than some simple adds and deletes of users. This just
happens randomly. I am in the process of trying to clean them up to get
back into production, as it is causing issues, and I need production
to run.
Back later once I am running again.
~Janelle
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project