On 5/21/15 5:20 AM, thierry bordaz wrote:
On 05/21/2015 01:36 PM, Janelle wrote:
On 5/20/15 7:53 AM, Mark Reynolds wrote:


On 05/20/2015 10:17 AM, thierry bordaz wrote:
On 05/20/2015 03:46 PM, Janelle wrote:
On 5/20/15 6:01 AM, thierry bordaz wrote:
On 05/20/2015 02:57 AM, Janelle wrote:
On 5/19/15 12:04 AM, thierry bordaz wrote:
On 05/19/2015 03:42 AM, Janelle wrote:
On 5/18/15 6:23 PM, Janelle wrote:
Once again, replication/sync has been lost. I really wish the product were more stable; it has so much potential, and yet.

Servers running for 6 days no issues. No new accounts or changes (maybe a few users changing passwords) and again, 5 out of 16 servers are no longer in sync.

I can test it easily by adding an account and then waiting a few minutes, then run "ipa user-show --all username" on all the servers, and only a few of them have the account. I have now waited 15 minutes, still no luck.
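A convergence check like this can be scripted. A minimal sketch, where the per-server lookup (e.g. running `ipa user-show --all username` against each server) is abstracted as a callable so the logic itself is testable; the callable and hostnames here are illustrative assumptions, not part of any IPA API:

```python
# Sketch: report which servers have not yet received a newly added account.
# `user_exists` stands in for the real per-server check (e.g. running
# `ipa user-show --all <login>` on each server over ssh).
def unreplicated(servers, user_exists):
    """Return the servers on which the account is still missing."""
    return [s for s in servers if not user_exists(s)]

# Simulated run: two of four servers have not converged yet.
seen = {"dc1-ipa1": True, "dc1-ipa2": False, "dc2-ipa1": True, "dc2-ipa2": False}
lagging = unreplicated(sorted(seen), seen.get)
```

Running the check a few minutes after the add, and again later, distinguishes slow convergence from servers that never receive the update.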

Oh well.. I guess I will go look at alternatives. I had such high hopes for this tool. Thanks so much everyone for all your help in trying to get things stable, but for whatever reason, there is a random loss of sync among the servers and obviously this is not acceptable.

regards
~J


All the replicas are happy again. I found these again:

unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 23} 5553e3a3000000170000 55543240000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000

What I also found interesting is that I have not deleted any masters at all, so it was quite perplexing where the orphaned entries came from. However, I did find that 3 of the replicas did not show complete RUV lists: while most of the replicas had a list of all 16 servers, a couple of them listed only 4 or 5 (using ipa-replica-manage list-ruv).
I don't know about the orphaned entries. Did you get entries below deleted parents?

AFAIK all replicas are masters and so have an entry {replica <rid>} in the RUV. We should expect all servers to have the same number of RUV elements (whether 16, 4, or 5). The servers with 4 or 5 may be isolated, so that they did not receive updates from those with 16 RUV elements.
Would you copy/paste an example of a RUV with 16 elements and one with 4-5?
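Comparing the list-ruv output across servers can be scripted. A minimal sketch, assuming each data line has the form `host port rid` as in the dumps below; "unable to decode" lines are skipped, and the sample inputs are abbreviated from the output in this thread:

```python
def parse_ruv(text):
    """Extract the set of replica IDs from `ipa-replica-manage list-ruv` output."""
    rids = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():  # host port rid
            rids.add(int(parts[2]))
    return rids

def missing_rids(per_server):
    """Map each server to the RIDs it lacks, relative to the union of all RUVs."""
    full = set().union(*(parse_ruv(t) for t in per_server.values()))
    return {s: sorted(full - parse_ruv(t)) for s, t in per_server.items()}

short = ("dc4-ipa4.example.com 389  21\n"
         "dc1-ipa1.example.com 389  10\n"
         "dc1-ipa4.example.com 389  4\n")
longer = short + "dc1-ipa2.example.com 389  25\ndc1-ipa3.example.com 389  8\n"
gaps = missing_rids({"dc1-ipa1": short, "dc1-ipa2": longer})
```

A server with a non-empty gap list is a candidate for re-initialization from one with a full RUV.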

Now, the steps to clear this were:

Removed the "unable to decode" entries with direct ldapmodify's. This worked across all replicas, which was nice, and did not have to be repeated on each one. In other words, entered on a single server, it was removed on all.
Hello,

Did you do a direct ldapmodify onto the RUV entry (nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,SUFFIX) to clean the RUV?
Thierry,

Janelle just manually added a CLEANALLRUV task (as I had recommended the other week).

Mark
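For reference, a CLEANALLRUV task like the one Mark mentions is started by adding a task entry under cn=config with ldapmodify; a sketch following the 389-ds task attributes, with the suffix and RID as placeholders for your deployment:

```ldif
dn: cn=clean 16,cn=cleanallruv,cn=tasks,cn=config
objectclass: extensibleObject
replica-base-dn: dc=example,dc=com
replica-id: 16
replica-force-cleaning: no
cn: clean 16
```

The task propagates to the other replicas on its own, which matches the observation above that entering it on a single server removed the entry everywhere.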

dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you do an update on dc3-ipa1, is it replicated to dc1-ipa[12]?

Also, there are duplicate RIDs (9 and 25) for dc1-ipa2.example.com:389. You may see messages like 'attrlist_replace' in some error logs.
25 seems to be the new RID.

thanks
thierry
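Spotting a duplicate-RID pair like the 9/25 one can also be scripted; a minimal sketch over the same `host port rid` line format used by list-ruv output:

```python
from collections import defaultdict

def duplicate_rids(text):
    """Return hosts that appear in the RUV under more than one replica ID."""
    by_host = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():  # host port rid
            by_host[parts[0]].add(int(parts[2]))
    return {h: sorted(r) for h, r in by_host.items() if len(r) > 1}

ruv = ("dc1-ipa2.example.com 389  25\n"
       "dc1-ipa2.example.com 389  9\n"
       "dc1-ipa3.example.com 389  8\n")
dups = duplicate_rids(ruv)
```

Any host reported here likely has a stale RID left over from a reinstall or RID change, which is what the 'attrlist_replace' errors tend to accompany.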


Re-initialized (ipa-replica-manage re-initialize --from=<good server>) the ones with the short list.

Waited 5 minutes to let everything settle, then started running tests of adds/deletes, which seemed to be just fine.

Here are 2 of the DCs

-------------------------------------
Node dc1-ipa1
-------------------------------------
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa4.example.com 389  4
-------------------------------------
Node dc1-ipa2
-------------------------------------
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
-------------------------------------
Node dc1-ipa3
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17
-------------------------------------
Node dc1-ipa4
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17
-------------------------------------
Node dc2-ipa1
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 23} 5553e3a3000000170000 55543240000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17
-------------------------------------
Node dc2-ipa2
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17
-------------------------------------
Node dc2-ipa3
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17
-------------------------------------
Node dc2-ipa4
-------------------------------------
dc3-ipa1.example.com 389  14
dc3-ipa2.example.com 389  13
dc3-ipa3.example.com 389  12
dc3-ipa4.example.com 389  11
dc2-ipa1.example.com 389  7
dc2-ipa2.example.com 389  6
dc2-ipa3.example.com 389  5
dc2-ipa4.example.com 389  3
dc4-ipa1.example.com 389  18
dc4-ipa2.example.com 389  19
dc4-ipa3.example.com 389  20
dc4-ipa4.example.com 389  21
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
dc5-ipa1.example.com 389  26
dc5-ipa2.example.com 389  15
dc5-ipa3.example.com 389  17


Happy Wednesday
~Janelle





And just like that - for no reason, they all reappeared:

unable to decode  {replica 16} 55356472000300100000 55356472000300100000
unable to decode  {replica 23} 5545d61f000200170000 5552f718000300170000
unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000

:-(
~J

Hello Janelle,

Those 3 RIDs were already present on node dc2-ipa1, correct? Did they reappear on the other nodes as well? Maybe dc2-ipa1 established a replication session with its peers and sent those RIDs. Could you track, in all the access logs, when the op csn=5552f718000300170000 was applied?
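Tracking when a CSN was applied can be done by scanning each server's 389-ds access log for result lines carrying that csn= value. A minimal sketch; the bracketed-timestamp line format is an assumption based on typical 389-ds access logging, and the sample lines are made up for illustration:

```python
import re

def csn_timestamps(lines, csn):
    """Return the access-log timestamps of lines that carry csn=<value>."""
    pat = re.compile(r'^\[([^\]]+)\].*\bcsn=' + re.escape(csn) + r'\b')
    return [m.group(1) for line in lines if (m := pat.match(line))]

log = [
    "[21/May/2015:05:20:03 -0400] conn=12 op=4 RESULT err=0 csn=5552f718000300170000",
    "[21/May/2015:05:20:04 -0400] conn=12 op=5 RESULT err=0 csn=5552f71a000100170000",
]
hits = csn_timestamps(log, "5552f718000300170000")
```

Running this over every server's access log shows which server applied the op first, i.e. where the stale RUV element re-entered the topology.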

Note that the two hex values for replica 23 changed (5545d61f000200170000 5552f718000300170000 vs. 5553e3a3000000170000 55543240000300170000). Have you recreated replica 23?
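Those hex values are CSNs, and decoding them makes comparisons like this easier. A sketch, assuming the commonly documented 389-ds CSN layout (8 hex digits of Unix timestamp, then 4 each of sequence number, replica ID, and sub-sequence); verify the layout against your version:

```python
def decode_csn(csn):
    """Split a 20-hex-digit CSN into (timestamp, seqnum, replica_id, subseq)."""
    fields = ((0, 8), (8, 12), (12, 16), (16, 20))
    return tuple(int(csn[a:b], 16) for a, b in fields)

ts, seq, rid, sub = decode_csn("5553e3a3000000170000")
```

All four values quoted above carry replica ID 0x0017 = 23, and the leading timestamp field is what lets you order them and correlate a CSN with access-log entries.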

Do you have replication logging enabled ?
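For reference, replication logging in 389-ds is turned on by raising the error-log level on cn=config (8192 is the replication-debugging level); a sketch to apply with ldapmodify, and to revert afterwards since it is very verbose:

```ldif
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
```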

thanks
thierry


As I mentioned in the email I just sent, and to be clear: NOTHING changed in the environment. No new replicas. No changes on the servers at all, other than some simple adds and deletes of users. This just happens randomly. I am in the process of trying to clean them up to get back into production, as this is causing issues and I need production to run. Back later once I am running again.

~Janelle
-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
