On 05/20/2015 03:25 PM, Janelle wrote:
On 5/20/15 12:54 AM, Ludwig Krispenz wrote:


On 05/20/2015 02:57 AM, Janelle wrote:
On 5/19/15 12:04 AM, thierry bordaz wrote:
On 05/19/2015 03:42 AM, Janelle wrote:
On 5/18/15 6:23 PM, Janelle wrote:
Once again, replication/sync has been lost. I really wish the product were more stable; it has so much potential, and yet...

The servers had been running for 6 days with no issues. There were no new accounts or changes (maybe a few users changing passwords), and again 5 out of 16 servers are no longer in sync.

I can test it easily by adding an account and then waiting a few minutes, then run "ipa user-show --all username" on all the servers, and only a few of them have the account. I have now waited 15 minutes, still no luck.

Oh well.. I guess I will go look at alternatives. I had such high hopes for this tool. Thanks so much everyone for all your help in trying to get things stable, but for whatever reason, there is a random loss of sync among the servers and obviously this is not acceptable.

regards
~J


All the replicas are happy again. I found these again:

unable to decode  {replica 16} 55356472000300100000 55356472000300100000
unable to decode  {replica 23} 5553e3a3000000170000 55543240000300170000
unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000

What I also found interesting is that I have not deleted any masters at all, so it is quite perplexing where the orphaned entries came from. However, I did find that 3 of the replicas did not show complete RUV lists. While most of the replicas had a list of all 16 servers, a couple of them listed only 4 or 5 (using ipa-replica-manage list-ruv).
So this happens "out of the blue"? Did it happen at the same time, and do you know when it started? The maxcsns in the RUV are quite old: r16: Apr 21, r23: May 14, r24: May 9. Could it be that no change was applied to these masters during that time?

Indeed yes, that is a correct statement. It seems to be incredibly random. Ok, I give up - how are you finding the date in the strings? And really, is May 14th that old?
55356472000300100000 is a CSN (Change Sequence Number). It is built from:

hex timestamp: 55356472 (seconds since the Unix epoch, in hexadecimal)
sequence number: 0003 (numbers the CSNs generated within the second of the timestamp)
replica id: 0010 (== 16), the replica where the change was received
subsequence number: 0000 (used internally if a mod consists of several sub-mods)
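
As a rough illustration of the layout described above, the following sketch splits a 20-character CSN string into its four hexadecimal fields and converts the timestamp into a date. The names parse_csn and CSN are made up for this example and are not part of any FreeIPA or 389-ds tool; the fixed field widths (8, 4, 4, 4 hex digits) are taken from the breakdown above.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded fields of a CSN (hypothetical helper type for this example)."""
    timestamp: datetime   # time the change was generated, as UTC
    seq: int              # sequence number within that second
    replica_id: int       # replica where the change was received
    subseq: int           # subsequence number for sub-mods


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into timestamp/seq/replica id/subseq."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    seconds = int(csn[0:8], 16)      # hex timestamp, seconds since the epoch
    seq = int(csn[8:12], 16)         # sequence number
    replica_id = int(csn[12:16], 16) # replica id, e.g. 0010 -> 16
    subseq = int(csn[16:20], 16)     # subsequence number
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return CSN(when, seq, replica_id, subseq)


if __name__ == "__main__":
    # The maxcsn values from the "unable to decode" lines in this thread.
    for raw in ("55356472000300100000",
                "55543240000300170000",
                "554d54a4000200180000"):
        csn = parse_csn(raw)
        print(raw, "replica", csn.replica_id, "at", csn.timestamp.isoformat())
```

The timestamps are decoded as UTC here, so the calendar day can differ by one from a date worked out in a local time zone.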

May 14 is not old, but it would mean that there was no change on that replica for a couple of days.


What is odd about the Apr 21st one, is that if you see my previous emails, I had cleaned up all of this before, so for that to "re-appear" is indeed a mystery.

As of this morning, things remain clean. What will be funny is that, now that I have extended logging enabled, they know we are on to them, so the servers won't fail again. :-)

~J






-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
