On 05/26/2015 08:47 AM, Martin Kosek wrote:
On 05/26/2015 12:20 AM, Janelle wrote:
On 5/24/15 3:12 AM, Janelle wrote:
And just like that, my haunted servers have all returned.
I am going to just put a gun to my head and be done with it. :-(

Why do things run perfectly and then suddenly ???
Logs show little to nothing, mostly because the servers are so busy that the logs
have already rotated out.

unable to decode {replica 16} 55356472000300100000 55356472000300100000
unable to decode {replica 22} 55371e9e000000160000 553eec64000400160000
unable to decode {replica 23} 5545d61f000200170000 55543240000300170000
unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
unable to decode {replica 25} 554d78bf000000190000 555af302000400190000
unable to decode  {replica 9} 55402c39000300090000 55402c39000300090000
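One way to see when each bogus replica ID was last active is to decode the min/max CSNs printed after each `{replica N}` element. What follows is a minimal Python sketch. It assumes the 389-DS textual CSN layout: 8 hex digits of seconds since the epoch, then 4 hex digits each of sequence number, replica ID and sub-sequence number. The helper names `decode_csn` and `summarize` are hypothetical and are not part of any IPA or 389-DS tool. The timestamps only show when that replica ID last originated a change. They do not by themselves identify which server is re-injecting it.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Iterator, Optional, Tuple

# Matches one RUV element as printed in the error log, e.g.
#   unable to decode {replica 16} 55356472000300100000 55356472000300100000
# The min/max CSN pair is optional because some elements (replica 9) carry none.
RUV_RE = re.compile(
    r"\{replica (\d+)\}(?:\s+([0-9a-fA-F]{20})\s+([0-9a-fA-F]{20}))?"
)


def decode_csn(csn: str) -> Dict[str, object]:
    """Split a 20-hex-digit CSN into its fields (assumed 8+4+4+4 layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


def summarize(
    log_text: str,
) -> Iterator[Tuple[int, Optional[datetime], Optional[datetime]]]:
    """Yield (replica_id, min_csn_time, max_csn_time) for each RUV element.

    Both times are None when the element has no CSNs. A ValueError is raised
    if the replica ID embedded in a CSN disagrees with the {replica N} label.
    """
    for m in RUV_RE.finditer(log_text):
        rid = int(m.group(1))
        if m.group(2) is None:
            yield rid, None, None
            continue
        lo, hi = decode_csn(m.group(2)), decode_csn(m.group(3))
        for c in (lo, hi):
            if c["rid"] != rid:
                raise ValueError(f"CSN carries rid {c['rid']}, label says {rid}")
        yield rid, lo["time"], hi["time"]


if __name__ == "__main__":
    sample = (
        "unable to decode {replica 16} 55356472000300100000 55356472000300100000\n"
        "unable to decode  {replica 9}\n"
    )
    for rid, first, last in summarize(sample):
        print(rid, first, last)
```

Feeding each server's own log lines through `summarize` and comparing the last-change times may help narrow down which servers still hold the oldest copies of a cleaned ID.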

Don't know what to do anymore. I'm at my wit's end.

So things are getting more interesting. Still trying to find the "leaking server(s)". Here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is

Hello Janelle,

Thanks for the update. So you are worried that there might still be a "rogue IPA replica" injecting the wrong replica data?

In any case, I bet Ludwig and Thierry will follow up on your thread. There is just a delay caused by the various public holidays and PTOs this week, and we need to rest before digging into the fun with RUVs, as you already know yourself :-)

unable to decode  {replica 16} 55356472000300100000 55356472000300100000
unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
unable to decode  {replica 24} 554d53d3000100180000 554d54a4000200180000
unable to decode  {replica 25} 554d78bf000200190000 555af302000400190000
unable to decode  {replica 9}

Now, if I delete these from a server using the ldapmodify method, they go away
briefly, but if I restart the server, they come back.

Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete them from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though "replication" is re-adding these
bad replicas from the servers that I have NOT deleted them from.

On each replica (master or replica) there is one RUV in the database and one RUV in the changelog. When cleanallruv succeeds, it clears both of them. All replicas should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in an almost "single" operation. If some replicas are not reachable, they keep information about the cleaned RID and can later propagate that "old" RID to the rest of the replicas.
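The propagation path described above can be illustrated with a deliberately simplified toy model. It is not the actual 389-DS replication or cleanallruv algorithm. The real server tracks cleaned RIDs and is supposed to reject them, and how that protection gets bypassed is exactly the open bug. The model assumes that an RUV is just a map from replica ID to the highest change number seen, and that a consumer adopts any ID its supplier knows about. `Replica`, `clean_all_ruv` and `replicate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Replica:
    name: str
    # Toy RUV: replica ID -> max change number seen (an int stands in for a CSN).
    ruv: Dict[int, int] = field(default_factory=dict)


def clean_all_ruv(replicas, rid: int, reachable: Set[str]) -> None:
    """Toy cleanallruv: drop `rid` only on the replicas that are reachable."""
    for r in replicas:
        if r.name in reachable:
            r.ruv.pop(rid, None)


def replicate(supplier: Replica, consumer: Replica) -> None:
    """Toy merge: the consumer keeps the higher value per replica ID and
    adopts any ID it does not know about (the simplification that lets an
    old RID come back)."""
    for rid, csn in supplier.ruv.items():
        if csn > consumer.ruv.get(rid, -1):
            consumer.ruv[rid] = csn


if __name__ == "__main__":
    a, b, c = (Replica(n, {4: 100, 9: 50}) for n in "ABC")
    clean_all_ruv([a, b, c], 9, reachable={"A", "B"})  # C was unreachable
    replicate(c, a)  # C still holds RID 9 and pushes it back to A
    replicate(a, b)  # ...and A passes it on to B
    print(a.ruv, b.ruv, c.ruv)
```

In this model, every replica that missed the clean is a potential source. That matches the symptom of an ID reappearing on servers where it had already been deleted.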

Ludwig managed to reproduce the issue with a fairly complex test case (3 replicas and multiple cleanallruv runs). We have not yet identified how a cleaned replicaId can get resurrected. In parallel, we just reproduced it in a 2-replica topology, though without a clear test case.

So my question is simple: is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is replica 9, with NO
extra data, an indication of something I could look for?

I guess that if I had the answer to your question, we would have understood the bug.

I am not willing to give up easily (as you might have already guessed), and I am determined to find the cause of these. I know we need more logs, but with all
the traffic, the logs roll over within a few hours, and if the problem is
happening at 3am, for example, I am not able to track it down because the logs
have rolled.

Back to my investigations.

Manage your subscription for the Freeipa-users mailing list:
Go to http://freeipa.org for more info on the project
