Thanks,

I looked into the logs, and I think the messages are harmless, just an effect of csn adjustment due to the time difference between the two machines. I had said that the replication protocol will try to adjust the csn generator, but it looks like you have long-lasting replication connections and the adjustment is only done at the beginning of a session. Maybe we can look into this and improve it.
Here is the trace of one of these error messages:

the entry is modified on adm3
adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b76303000000160000
this mod is replicated to adm0
adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b76303000000160000
the entry is modified again on adm0
adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
but it looks like the csn generated is smaller than the one already in the entry, and it is adjusted
adm0 :[19/Aug/2016:15:47:07 -0400] - replica_generate_next_csn: opcsn=57b76301000a00040000 <= basecsn=57b76303000000160000, adjusted opcsn=57b76303000100040000
then the result is logged with the adjusted csn
adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b76303000100040000

So the mechanism works, but the messages may be confusing, and an improvement of the protocol could be investigated.
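
If it helps to read the csn values in these messages, here is a small Python sketch of how the 20-character csn strings break down and what the adjustment in the error line does. This is only an illustration of the behaviour traced above, not the actual server code; the field layout assumed is the usual 389-ds one (timestamp, sequence number, replica id, subsequence number, each as fixed-width hex).

# Sketch: decode the 20-hex-digit csn strings seen in the logs and
# reproduce the adjustment rule from the walkthrough above.
# Illustration only, not the actual 389-ds code.

def decode_csn(csn):
    # layout: 8 hex chars time (seconds since the epoch), 4 hex sequence
    # number, 4 hex replica id, 4 hex subsequence number
    return {
        "time": int(csn[0:8], 16),
        "seqnum": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }

def adjust_opcsn(opcsn, basecsn):
    # If the generated csn is not newer than the csn already on the entry,
    # bump it just past basecsn while keeping the local replica id
    # (simplified: only time and sequence number are compared here).
    op, base = decode_csn(opcsn), decode_csn(basecsn)
    if (op["time"], op["seqnum"]) <= (base["time"], base["seqnum"]):
        op["time"], op["seqnum"] = base["time"], base["seqnum"] + 1
    return "%08x%04x%04x%04x" % (op["time"], op["seqnum"], op["rid"], op["subseq"])

# Values from the adm0 example above:
print(adjust_opcsn("57b76301000a00040000", "57b76303000000160000"))
# -> 57b76303000100040000, which matches the "adjusted opcsn" in the error log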

One question I have, which someone more familiar with DNS should answer:
we have regular updates of the same entry on both replicas, about every 2 seconds. What is the reason for this?


/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:03 -0400] conn=13 op=126630 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:07 -0400] conn=13 op=126646 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:09 -0400] conn=13 op=126653 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:13 -0400] conn=13 op=126666 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:16 -0400] conn=13 op=126673 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:18 -0400] conn=13 op=126689 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:20 -0400] conn=13 op=126696 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:21 -0400] conn=13 op=126702 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:23 -0400] conn=13 op=126737 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:26 -0400] conn=13 op=126758 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:29 -0400] conn=13 op=126801 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"



On 08/19/2016 10:00 PM, John Desantis wrote:
Ludwig,

you are still only grepping the replication connection, but before being
replicated the entry has to be added by some client connection; can you get
all references to the entry?
the log snippet you provided also shows csns with tag=103, which indicates a
MOD; are these MODs for the added entries, or other mods?
.....

I can't believe I did that!

Ok, so the logs have been rotated (I didn't think to adjust
logrotate..), so there aren't any logs to peruse for the case I've
presented so far.  However, I was able to reproduce the errors by
"bulk" deleting 39 DNS entries, and only the MASTER reported
"replica_generate_next_csn" entries.

Given the size of the logs, I think it would be pointless to do any
kind of sanitization.  I'll go ahead and gzip them for you and email
you off-list.

I've labeled them as MASTER and REPLICA.

John DeSantis


2016-08-19 9:18 GMT-04:00 Ludwig Krispenz <lkris...@redhat.com>:
On 08/18/2016 05:28 PM, John Desantis wrote:
Ludwig,

unfortunately this is not enough to determine what is going on. The
interesting generated/used csn is only logged in the corresponding RESULT
message, and these are only the replication connections. It would be
necessary to see the original ADD operation: was it added once or twice by a
client?
you could pick one entry, e.g. server-6-3-sp, and grep for all references in
the access logs of both servers (maybe there are mods as well), and then also
get the RESULT line for the ops found
Here are the updated log snippets looking for ADD and RESULT:
you are still only grepping the replication connection, but before being
replicated the entry has to be added by some client connection; can you get
all references to the entry?
the log snippet you provided also shows csns with tag=103, which indicates a
MOD; are these MODs for the added entries, or other mods?

PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD
dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb000300040000
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb000400040000
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD
dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb000800040000
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc000100040000
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD
dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c1000500040000
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c1000600040000

PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b9000500160000
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD
dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc000000160000
access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c1000100160000
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD
dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4160
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c3000000160000

I'm positive that I was the only one performing DNS updates during
this time, and I was only using 1 console.

Thanks,
John DeSantis


2016-08-18 10:09 GMT-04:00 Ludwig Krispenz <lkris...@redhat.com>:
On 08/18/2016 03:15 PM, John Desantis wrote:
Ludwig,

Thank you for your response!

This is a normal scenario, but you could check if the simultaneous
updates
on 4 and 16 are intentional.
Regarding the simultaneous updates, the only items I have noted so far
are:

*  The time sync between the master (4) and replica (16) was off by
about 1-2 seconds, with the latter being ahead;
yes, this happens, but the replication protocol tries to handle it: in a
replication session the supplier and consumer exchange their RUVs, and if the
time differs, the csn state generator is updated with a local or remote
offset so that the generated time is always based on the most advanced clock,
on all servers. And even if you adjust the system time, the csn time will
never go back.
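
As a rough illustration of that behaviour (a Python sketch only, not the actual server code; the class and method names are invented for the example), the generator can be thought of as keeping an offset so that its time is at least the most advanced csn time it has seen, and the sampled time never decreases:

import time

class CsnGenerator:
    """Illustrative only: mimics 'keep an offset, never go backwards'."""
    def __init__(self, replica_id):
        self.rid = replica_id
        self.offset = 0      # how far the csn clock runs ahead of the system clock
        self.last_time = 0   # csn time of the last csn handed out
        self.seqnum = 0

    def adjust_from_peer(self, remote_csn_time):
        # called when the RUV received at the start of a replication session
        # carries a csn time ahead of our own clock
        skew = remote_csn_time - int(time.time())
        if skew > self.offset:
            self.offset = skew

    def new_csn(self):
        now = int(time.time()) + self.offset
        if now < self.last_time:     # the csn time never goes back, even if
            now = self.last_time     # the system clock is set back
        self.seqnum = self.seqnum + 1 if now == self.last_time else 0
        self.last_time = now
        return "%08x%04x%04x%04x" % (now, self.seqnum, self.rid, 0)
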
*  There are continual log entries referencing
"replication-multimaster-extop" and "Netscape Replication End Session"
in the dirsrv "access" logs, and during one of the manifestations of
"replica_generate_next_csn", I found this:

PROD:08:46:08-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD

dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD

dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD

dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

PROD:08:47:44-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD

dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD

dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

It looks like the entries for server-6-3-sp and 6-5-sp were referenced
twice.  Do you think that the time being off by 1-2 seconds between
the master and replica could be the issue?  The connection 602 is the
replication between the replica and master, and the connection 1395 is
the replication between the master and replica.
unfortunately this is not enough to determine what is going on. The
interesting generated/used csn is only logged in the corresponding RESULT
message, and these are only the replication connections. It would be
necessary to see the original ADD operation: was it added once or twice by a
client?
you could pick one entry, e.g. server-6-3-sp, and grep for all references in
the access logs of both servers (maybe there are mods as well), and then also
get the RESULT line for the ops found

Since I know these operations were performed using the console via a
for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp
--a-rec=10.250.12.$i' on one of our login nodes, do you think that
specifying an _srv_ record in the DOMAIN configuration with the
address of the master server, e.g.: ipa_server = _srv_,
MASTER.dom.dom.dom could be the issue (coupled with the time syncing)?

I know that these questions are probably leaning more towards the
389ds team, so feel free to pass me over to them if need be.
I think I can address the ds-related questions, but I don't know enough about
the console and dns to assess whether the behaviour is normal

Again, thank you very much for responding!

John DeSantis

2016-08-18 4:14 GMT-04:00 Ludwig Krispenz <lkris...@redhat.com>:
On 08/17/2016 08:54 PM, John Desantis wrote:
Hello all,

We've been re-using old host names and IP addresses for a new
deployment of nodes, and recently I've been seeing the messages pasted
below in the slapd-DC.DC.DC "error" log on our nodes.

[17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
opcsn=57b475cd001200040000 <= basecsn=57b475cf000000160000, adjusted
opcsn=57b475cf000100040000
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f00000200040000 <= basecsn=57b47f00000200160000, adjusted
opcsn=57b47f00000300040000
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f00000400040000 <= basecsn=57b47f00000400160000, adjusted
opcsn=57b47f00000500040000
[17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
opcsn=57b47f2f001000040000 <= basecsn=57b47f30000200160000, adjusted
opcsn=57b47f30000300040000
[17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
opcsn=57b4a4bb000900040000 <= basecsn=57b4a4bc000000160000, adjusted
opcsn=57b4a4bc000100040000
[17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
opcsn=57b4a53e000a00040000 <= basecsn=57b4a53f000000160000, adjusted
opcsn=57b4a53f000100040000
[17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
opcsn=57b4a552000700040000 <= basecsn=57b4a553000000160000, adjusted
opcsn=57b4a553000100040000
[17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
opcsn=57b4a562000900040000 <= basecsn=57b4a564000000160000, adjusted
opcsn=57b4a564000100040000
Each modification (add/del/mod) gets a csn assigned, which is used in
replication update resolution. And each assigned csn has to be newer than an
existing one.
The messages you see are from code that double-checks that the entry doesn't
already have a larger csn, and adjusts it.
The logs indicate that entries are more or less concurrently updated on
replicas 4 and 16, and the updates from 16 are received while processing the
updates on 4.
This is a normal scenario, but you could check if the simultaneous
updates
on 4 and 16 are intentional.
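
To make "newer" a bit more concrete (a minimal sketch, assuming the usual fixed-width hex layout of timestamp, sequence number, replica id, subsequence in the csn string): because the timestamp is the leading field, the csns in the error log can be compared as plain strings, and it is the stored csn being newer than the freshly generated one that triggers the adjustment message.

# Illustrative only: comparing two csns from the error log above.
# The fields are fixed-width hex with the timestamp first, so comparing
# the 20-character strings orders them by time, then sequence number.
def csn_is_newer(a, b):
    return a > b

# opcsn generated locally on replica 4 vs. the csn already on the entry
# (written by replica 0x16): the stored csn is newer, which is exactly
# the situation that produces the replica_generate_next_csn message.
print(csn_is_newer("57b4a4bb000900040000", "57b4a4bc000000160000"))   # False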

They seem to only occur when updating DNS entries, whether on the
console or via the GUI (tail -f'ing the log).

A search in this mailing-list returns nothing, but a message is found
on the 389-ds list [1]; it seems to suggest that the messages aren't
fatal and are purely informational, yet if they are occurring
constantly there could be a problem with the replication
algorithm and/or deployment.

We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60.  Nothing has
changed on the deployment side of things, and I don't recall seeing
this message before.

I'm wondering if it's safe to disregard these messages due to the
re-use of the entries, or if something else should be looked into.

Thank you,
John DeSantis

[1] https://fedorahosted.org/389/ticket/47959

--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
