On Wed, Jul 15, 2015 at 03:22:51PM +0200, Ludwig Krispenz wrote:
> 
> On 07/14/2015 08:59 PM, Andrew E. Bruno wrote:
> >On Tue, Jul 14, 2015 at 04:52:10PM +0200, Ludwig Krispenz wrote:
> >>hm, the stack traces show csn_str, which correspond to Jul,8th, Jul,4th, and
> >>Jul,7th - so it looks like it is iterating the changelog over and over
> >>again.
> >>Th consumer side Is "cn=meTosrv-m14-24.ccr.buffalo.edu" - is this the master
> >>?
> >>
> >>can you provide the result of the following search from
> >>m14-24.ccr.buffalo.edu adn the server with the high cpu:
> >>
> >>ldapsearch -o ldif-wrap=no -x -D ... -w  -b "cn=config"
> >>"objectclass=nsds5replica" nsds50ruv
> >
> >master is srv-m14-24.. here's the results of the ldapsearch:
> >
> >[srv-m14-24 ~]$ ldapsearch -o ldif-wrap=no -x -D "cn=directory manager" -W  
> >-b "cn=config" "objectclass=nsds5replica" nsds50ruv
> >
> ># replica, dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu, mapping tree, config
> >dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping 
> >tree,cn=config
> >nsds50ruv: {replicageneration} 5527f711000000040000
> >nsds50ruv: {replica 4 ldap://srv-m14-24.ccr.buffalo.edu:389} 
> >5527f771000000040000 55a55aed001000040000
> >nsds50ruv: {replica 5 ldap://srv-m14-26.ccr.buffalo.edu:389} 
> >5537c773000000050000 5591a3d2000700050000
> >nsds50ruv: {replica 6 ldap://srv-m14-25-02.ccr.buffalo.edu:389} 
> >55943dda000000060000 55945378000200060000
> so this is really strange, the master m14-24 has the latest change from
> replica 5(m14-26) as: 5591a3d2000700050000
> which corresponds to Mon, 29 Jun 2015 20:00:18 GMT
> so no update from 14-24 since that did arrive, or could not update the ruv.
> So m14-26 tries to replicate all the changes back from that time, but looks
> like iit has no success.
> is there anything in the logs of m14-24 ? can you see successful mods with
> csn=xxxxxxx00050000 ?

Here's what I could find from the logs on srv-m14-24:


[srv-m14-24 ~]# grep -r 00050000 /var/log/dirsrv/slapd-[domain]/*
access.20150714-014346:[14/Jul/2015:03:10:05 -0400] conn=748529 op=14732 RESULT 
err=0 tag=103 nentries=0 etime=1 csn=55a4b5f0005000040000


And here's the last few lines the error log on srv-m14-24:

[12/Jul/2015:10:11:14 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2456070,cn=changelog!!
[12/Jul/2015:10:11:48 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2498441,cn=changelog!!
[13/Jul/2015:07:41:49 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a3a406000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:11:56:50 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a3dfca000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:14:26:50 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a402f2000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:15:26:49 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a41102000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:18:26:51 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a43b32000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:18:56:51 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a4423a000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:20:41:51 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a45ad6000000600000): Operations 
error (1). Will retry later.
[13/Jul/2015:22:41:51 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a476f6000000600000): Operations 
error (1). Will retry later.
[14/Jul/2015:06:56:51 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a4eafa000000600000): Operations 
error (1). Will retry later.
[14/Jul/2015:09:56:52 -0400] NSMMReplicationPlugin - 
agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" 
(srv-m14-26:389): Consumer failed to replay change (uniqueid 
cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a5152a000000600000): Operations 
error (1). Will retry later.
[14/Jul/2015:10:11:21 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2552223,cn=changelog!!
[14/Jul/2015:10:11:21 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2552224,cn=changelog!!
[14/Jul/2015:10:11:25 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2557315,cn=changelog!!
[14/Jul/2015:10:11:25 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2557318,cn=changelog!!
[14/Jul/2015:10:11:28 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2561020,cn=changelog!!
[14/Jul/2015:10:11:28 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2561043,cn=changelog!!
[14/Jul/2015:10:11:48 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2586022,cn=changelog!!
[14/Jul/2015:10:11:59 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2598989,cn=changelog!!
[14/Jul/2015:10:11:59 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2598990,cn=changelog!!
[14/Jul/2015:10:12:01 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2600966,cn=changelog!!
[14/Jul/2015:10:12:03 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2604037,cn=changelog!!
[14/Jul/2015:10:12:04 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2604054,cn=changelog!!
[14/Jul/2015:10:12:24 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2629803,cn=changelog!!
[14/Jul/2015:10:12:24 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No 
original_tombstone for changenumber=2629804,cn=changelog!!





> >
> ># replica, o\3Dipaca, mapping tree, config
> >dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
> >nsds50ruv: {replicageneration} 5527f74b000000600000
> >nsds50ruv: {replica 96 ldap://srv-m14-24.ccr.buffalo.edu:389} 
> >5527f754000000600000 55a557f6000000600000
> >nsds50ruv: {replica 86 ldap://srv-m14-25-02.ccr.buffalo.edu:389} 
> >55943e6e000000560000 55943e6f000100560000
> >nsds50ruv: {replica 91 ldap://srv-m14-26.ccr.buffalo.edu:389} 
> >5537c7ba0000005b0000 5582c7e40004005b0000
> >
> >
> >server with high cpu load is srv-m14-26. here's the results of the ldapsearch
> >from this server:
> >
> >[srv-m14-26 ~]$ ldapsearch -o ldif-wrap=no -x -D "cn=directory manager" -W  
> >-b "cn=config" "objectclass=nsds5replica" nsds50ruv
> >
> ># replica, dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu, mapping tree, config
> >dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping 
> >tree,cn=config
> >nsds50ruv: {replicageneration} 5527f711000000040000
> >nsds50ruv: {replica 5 ldap://srv-m14-26.ccr.buffalo.edu:389} 
> >5537c773000000050000 55a55b47000300050000
> >nsds50ruv: {replica 4 ldap://srv-m14-24.ccr.buffalo.edu:389} 
> >5527f771000000040000 55a53eb0000a00040000
> >nsds50ruv: {replica 6 ldap://srv-m14-25-02.ccr.buffalo.edu:389} 
> >55943dda000000060000 55945378000200060000
> >
> ># replica, o\3Dipaca, mapping tree, config
> >dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
> >nsds50ruv: {replicageneration} 5527f74b000000600000
> >nsds50ruv: {replica 91 ldap://srv-m14-26.ccr.buffalo.edu:389} 
> >5537c7ba0000005b0000 5582c7e40004005b0000
> >nsds50ruv: {replica 96 ldap://srv-m14-24.ccr.buffalo.edu:389} 
> >5527f754000000600000 55a557f6000000600000
> >nsds50ruv: {replica 86 ldap://srv-m14-25-02.ccr.buffalo.edu:389} 
> >55943e6e000000560000 55943e6f000100560000
> >
> >
> >srv-m14-25-02 is our 3rd replicate which we recently added back in after it
> >failed (was added back in 7/1).
> >
> >Let me know if you need anything else. Thanks for the help.
> >
> >--Andrew
> >
> >>On 07/14/2015 02:35 PM, Andrew E. Bruno wrote:
> >>>On Tue, Jul 14, 2015 at 01:41:57PM +0200, Ludwig Krispenz wrote:
> >>>>On 07/13/2015 06:36 PM, Andrew E. Bruno wrote:
> >>>>>On Mon, Jul 13, 2015 at 05:29:13PM +0200, Ludwig Krispenz wrote:
> >>>>>>On 07/13/2015 05:05 PM, Andrew E. Bruno wrote:
> >>>>>>>On Mon, Jul 13, 2015 at 04:58:46PM +0200, Ludwig Krispenz wrote:
> >>>>>>>>can you get a pstack of the slapd process along with a top -H to find 
> >>>>>>>>th
> >>>>>>>>ethread with high cpu usage
> >>>>>>>Attached is the full stacktrace of the running ns-slapd proccess. top 
> >>>>>>>-H
> >>>>>>>shows this thread (2879) with high cpu usage:
> >>>>>>>
> >>>>>>>2879 dirsrv    20   0 3819252 1.962g  11680 R 99.9  3.1   8822:10 
> >>>>>>>ns-slapd
> >>>>>>this thread is a replication thread sending updates, what is strange is 
> >>>>>>that
> >>>>>>the current csn_str is quite old (july, 7th), I can't tell which 
> >>>>>>agreeement
> >>>>>>this thread is handling, but looks like it is heavily reading the 
> >>>>>>changeglog
> >>>>>>and sending updates. anything changed recently in replication setup ?
> >>>>>Yes, we had one replica fail on (6/19) which we removed (not this one
> >>>>>showing high CPU load). Had to perform some manual cleanup of the ipa-ca
> >>>>>RUVs. Then we added the replica back in on 7/1. Since then, replication
> >>>>>appears to have been running normally between the 3 replicas. We've been
> >>>>>monitoring utilization since 7/1 and only recently seen this spike (past
> >>>>>24 hours or so).
> >>>>is it still in this state ? or was it a spike.
> >>>Yes same state.
> >>>
> >>>>if it still is high cpu consuming, could you
> >>>>- get a few pstack like the one before with some time in between, I would
> >>>>like to see if it is progressing with the csns or looping on the same one
> >>>Attached are a few stacktraces. The thread pegging the cpu is:
> >>>
> >>>PID  USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+  COMMAND
> >>>2879 dirsrv    20   0 3819252 1.978g  11684 R  99.9  3.2  10148:26  
> >>>ns-slapd
> >>>
> >>>>- check the consumer side. is there anything in the error log ? does the
> >>>>access log show replication activity from this server
> >>>Here's some errors showing up on the first master server rep1 (rep2 is the
> >>>server with pegged cpu):
> >>>
> >>>[13/Jul/2015:20:41:51 -0400] NSMMReplicationPlugin - 
> >>>agmt="cn=masterAgreement1-rep2-pki-tomcat" (rep2:389): Consumer failed to 
> >>>replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 
> >>>55a45ad6000000600000): Operations error (1). Will retry later.
> >>>[13/Jul/2015:22:41:51 -0400] NSMMReplicationPlugin - 
> >>>agmt="cn=masterAgreement1-rep2-pki-tomcat" (rep2:389): Consumer failed to 
> >>>replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 
> >>>55a476f6000000600000): Operations error (1). Will retry later.
> >>>[14/Jul/2015:06:56:51 -0400] NSMMReplicationPlugin - 
> >>>agmt="cn=masterAgreement1-rep2-tomcat" (rep2:389): Consumer failed to 
> >>>replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 
> >>>55a4eafa000000600000): Operations error (1). Will retry later.
> >>>
> >>>
> >>>Here's some snips from the access log of the rep2:
> >>>
> >>>
> >>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9794 EXT 
> >>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
> >>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9794 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9795 EXT 
> >>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> >>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9795 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:22:33 -0400] conn=777787 op=9796 EXT 
> >>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
> >>>..
> >>>[14/Jul/2015:08:23:38 -0400] conn=782341 op=129 RESULT err=0 tag=103 
> >>>nentries=0 etime=0 csn=55a4ff6c000000050000
> >>>..
> >>>[14/Jul/2015:08:24:02 -0400] conn=781901 op=1745 RESULT err=0 tag=101 
> >>>nentries=1 etime=0
> >>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9810 EXT 
> >>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
> >>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9810 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9811 EXT 
> >>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> >>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9811 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:24:05 -0400] conn=777787 op=9812 EXT 
> >>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
> >>>[14/Jul/2015:08:24:05 -0400] conn=777787 op=9812 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:24:08 -0400] conn=777787 op=9813 EXT 
> >>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> >>>[14/Jul/2015:08:24:08 -0400] conn=777787 op=9813 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9814 EXT 
> >>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
> >>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9814 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9815 EXT 
> >>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> >>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9815 RESULT err=0 tag=120 
> >>>nentries=0 etime=0
> >>>
> >>>and here's some from the error log:
> >>>
> >>>[13/Jul/2015:22:41:49 -0400] DSRetroclPlugin - replog: an error occured 
> >>>while adding change number 2294859, dn = 
> >>>changenumber=2294859,cn=changelog: Operations error.
> >>>[13/Jul/2015:22:41:49 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>[13/Jul/2015:23:56:50 -0400] - dn2entry_ext: Failed to get id for 
> >>>changenumber=2296384,cn=changelog from entryrdn index (-30993)
> >>>[13/Jul/2015:23:56:50 -0400] - Operation error fetching 
> >>>changenumber=2296384,cn=changelog (null), error -30993.
> >>>[13/Jul/2015:23:56:50 -0400] DSRetroclPlugin - replog: an error occured 
> >>>while adding change number 2296384, dn = 
> >>>changenumber=2296384,cn=changelog: Operations error.
> >>>[13/Jul/2015:23:56:50 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>[14/Jul/2015:06:56:50 -0400] - dn2entry_ext: Failed to get id for 
> >>>changenumber=2304418,cn=changelog from entryrdn index (-30993)
> >>>[14/Jul/2015:06:56:50 -0400] - Operation error fetching 
> >>>changenumber=2304418,cn=changelog (null), error -30993.
> >>>[14/Jul/2015:06:56:50 -0400] DSRetroclPlugin - replog: an error occured 
> >>>while adding change number 2304418, dn = 
> >>>changenumber=2304418,cn=changelog: Operations error.
> >>>[14/Jul/2015:06:56:50 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>
> >>>
> >>>>- eventually enable replication logging: nsslapd-errorlog-level: 8192
> >>>>>On a side note, we get hit with this bug often:
> >>>>>
> >>>>>https://www.redhat.com/archives/freeipa-users/2015-July/msg00018.html
> >>>>>
> >>>>>(rouge sssd_be processing hammering a replica).
> >>>>>
> >>>>>This causes high ns-slapd utilization on the replica and restarting sssd
> >>>>>on the client host immediately fixes the issue. However, in this
> >>>>>case, we're not seeing this behavior.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>On 07/13/2015 04:46 PM, Andrew E. Bruno wrote:
> >>>>>>>>>We have 3 freeipa-replicas. Centos 7.1.1503, ipa-server 4.1.0-18, and
> >>>>>>>>>389-ds 1.3.3.1-16.
> >>>>>>>>>
> >>>>>>>>>Recently, the ns-slapd process on one of our replicas started 
> >>>>>>>>>showing higher
> >>>>>>>>>than normal CPU usage. ns-slapd is pegged at high CPU usage more or 
> >>>>>>>>>less
> >>>>>>>>>constantly.
> >>>>>>>>>
> >>>>>>>>>Seems very similar to this thread:
> >>>>>>>>>https://www.redhat.com/archives/freeipa-users/2015-February/msg00281.html
> >>>>>>>>>
> >>>>>>>>>There are a few errors showing in 
> >>>>>>>>>/var/log/dirsrv/slapd-[domain]/errors (not
> >>>>>>>>>sure if these are related):
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>[13/Jul/2015:02:56:49 -0400] retrocl-plugin - retrocl_postob: 
> >>>>>>>>>operation failure [1]
> >>>>>>>>>[13/Jul/2015:04:11:50 -0400] - dn2entry_ext: Failed to get id for 
> >>>>>>>>>changenumber=2277387,cn=changelog from entryrdn index (-30993)
> >>>>>>>>>[13/Jul/2015:04:11:50 -0400] - Operation error fetching 
> >>>>>>>>>changenumber=2277387,cn=changelog (null), error -30993.
> >>>>>>>>>[13/Jul/2015:04:11:50 -0400] DSRetroclPlugin - replog: an error 
> >>>>>>>>>occured while adding change number 2277387, dn = 
> >>>>>>>>>changenumber=2277387,cn=changelog: Operations error.
> >>>>>>>>>[13/Jul/2015:04:11:50 -0400] retrocl-plugin - retrocl_postob: 
> >>>>>>>>>operation failure [1]
> >>>>>>>>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching Null DN 
> >>>>>>>>>(01de36ac-295411e5-b94db2ab-07afbca6), error -30993.
> >>>>>>>>>[13/Jul/2015:07:41:49 -0400] - dn2entry_ext: Failed to get id for 
> >>>>>>>>>changenumber=2281464,cn=changelog from entryrdn index (-30993)
> >>>>>>>>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching 
> >>>>>>>>>changenumber=2281464,cn=changelog (null), error -30993.
> >>>>>>>>>[13/Jul/2015:07:41:49 -0400] DSRetroclPlugin - replog: an error 
> >>>>>>>>>occured while adding change number 2281464, dn = 
> >>>>>>>>>changenumber=2281464,cn=changelog: Operations error.
> >>>>>>>>>[13/Jul/2015:07:41:49 -0400] retrocl-plugin - retrocl_postob: 
> >>>>>>>>>operation failure [1]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>access logs seem to be showing normal activity. Here's the number of 
> >>>>>>>>>open
> >>>>>>>>>connections:
> >>>>>>>>>
> >>>>>>>>># ls -al /proc/`cat /var/run/dirsrv/slapd-[domain].pid`/fd|grep 
> >>>>>>>>>socket|wc -l
> >>>>>>>>>62
> >>>>>>>>>
> >>>>>>>>>Note: the other two replicas have much higher open connections 
> >>>>>>>>>(>250) and low
> >>>>>>>>>cpu load avgs.
> >>>>>>>>>
> >>>>>>>>>Here's some output of logconv.pl from our most recent access log on 
> >>>>>>>>>the replica
> >>>>>>>>>with high cpu load:
> >>>>>>>>>
> >>>>>>>>>Start of Logs:    13/Jul/2015:04:49:18
> >>>>>>>>>End of Logs:      13/Jul/2015:10:06:11
> >>>>>>>>>
> >>>>>>>>>Processed Log Time:  5 Hours, 16 Minutes, 53 Seconds
> >>>>>>>>>
> >>>>>>>>>Restarts:                     0
> >>>>>>>>>Total Connections:            2343
> >>>>>>>>>  - LDAP Connections:          2120
> >>>>>>>>>  - LDAPI Connections:         223
> >>>>>>>>>  - LDAPS Connections:         0
> >>>>>>>>>  - StartTLS Extended Ops:     45
> >>>>>>>>>  Secure Protocol Versions:
> >>>>>>>>>   - TLS1.2 128-bit AES - 45
> >>>>>>>>>
> >>>>>>>>>Peak Concurrent Connections:  22
> >>>>>>>>>Total Operations:             111865
> >>>>>>>>>Total Results:                111034
> >>>>>>>>>Overall Performance:          99.3%
> >>>>>>>>>
> >>>>>>>>>Searches:                     95585         (5.03/sec)  (301.64/min)
> >>>>>>>>>Modifications:                3369          (0.18/sec)  (10.63/min)
> >>>>>>>>>Adds:                         0             (0.00/sec)  (0.00/min)
> >>>>>>>>>Deletes:                      0             (0.00/sec)  (0.00/min)
> >>>>>>>>>Mod RDNs:                     0             (0.00/sec)  (0.00/min)
> >>>>>>>>>Compares:                     0             (0.00/sec)  (0.00/min)
> >>>>>>>>>Binds:                        7082          (0.37/sec)  (22.35/min)
> >>>>>>>>>
> >>>>>>>>>Proxied Auth Operations:      0
> >>>>>>>>>Persistent Searches:          0
> >>>>>>>>>Internal Operations:          0
> >>>>>>>>>Entry Operations:             0
> >>>>>>>>>Extended Operations:          5317
> >>>>>>>>>Abandoned Requests:           416
> >>>>>>>>>Smart Referrals Received:     0
> >>>>>>>>>
> >>>>>>>>>VLV Operations:               96
> >>>>>>>>>VLV Unindexed Searches:       0
> >>>>>>>>>VLV Unindexed Components:     32
> >>>>>>>>>SORT Operations:              64
> >>>>>>>>>
> >>>>>>>>>Entire Search Base Queries:   0
> >>>>>>>>>Paged Searches:               3882
> >>>>>>>>>Unindexed Searches:           0
> >>>>>>>>>Unindexed Components:         5
> >>>>>>>>>
> >>>>>>>>>FDs Taken:                    2566
> >>>>>>>>>FDs Returned:                 2643
> >>>>>>>>>Highest FD Taken:             249
> >>>>>>>>>
> >>>>>>>>>Broken Pipes:                 0
> >>>>>>>>>Connections Reset By Peer:    0
> >>>>>>>>>Resource Unavailable:         0
> >>>>>>>>>Max BER Size Exceeded:        0
> >>>>>>>>>
> >>>>>>>>>Binds:                        7082
> >>>>>>>>>Unbinds:                      2443
> >>>>>>>>>  - LDAP v2 Binds:             0
> >>>>>>>>>  - LDAP v3 Binds:             6859
> >>>>>>>>>  - AUTOBINDs:                 223
> >>>>>>>>>  - SSL Client Binds:          0
> >>>>>>>>>  - Failed SSL Client Binds:   0
> >>>>>>>>>  - SASL Binds:                6814
> >>>>>>>>>     GSSAPI - 6591
> >>>>>>>>>     EXTERNAL - 223
> >>>>>>>>>  - Directory Manager Binds:   0
> >>>>>>>>>  - Anonymous Binds:           6591
> >>>>>>>>>  - Other Binds:               491
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>strace timing on the ns-slapd process:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>% time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>>------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>  94.40    0.346659        5977        58           poll
> >>>>>>>>>   4.10    0.015057       15057         1           restart_syscall
> >>>>>>>>>   0.91    0.003353          57        59        59 getpeername
> >>>>>>>>>   0.49    0.001796         150        12           futex
> >>>>>>>>>   0.10    0.000364          73         5           read
> >>>>>>>>>------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>100.00    0.367229                   135        59 total
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>top output (with threads 'H'):
> >>>>>>>>>
> >>>>>>>>>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ 
> >>>>>>>>> COMMAND
> >>>>>>>>>  2879 dirsrv    20   0 3819252 1.962g  11680 R 99.9  3.1   8822:10 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2895 dirsrv    20   0 3819252 1.962g  11680 R 34.1  3.1 115:10.62 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2889 dirsrv    20   0 3819252 1.962g  11680 R  2.4  3.1 115:34.42 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2917 dirsrv    20   0 3819252 1.962g  11680 S  2.4  3.1 115:26.87 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2898 dirsrv    20   0 3819252 1.962g  11680 S  2.1  3.1 116:33.12 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2904 dirsrv    20   0 3819252 1.962g  11680 S  2.1  3.1 115:08.56 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2892 dirsrv    20   0 3819252 1.962g  11680 S  1.8  3.1 115:33.04 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2897 dirsrv    20   0 3819252 1.962g  11680 R  1.8  3.1 114:54.28 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2914 dirsrv    20   0 3819252 1.962g  11680 R  1.8  3.1 116:03.35 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2907 dirsrv    20   0 3819252 1.962g  11680 S  1.5  3.1 115:42.25 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2910 dirsrv    20   0 3819252 1.962g  11680 R  1.5  3.1 116:01.99 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2870 dirsrv    20   0 3819252 1.962g  11680 R  1.2  3.1 611:30.22 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2890 dirsrv    20   0 3819252 1.962g  11680 S  1.2  3.1 115:18.25 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2891 dirsrv    20   0 3819252 1.962g  11680 S  1.2  3.1 115:22.24 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2899 dirsrv    20   0 3819252 1.962g  11680 R  1.2  3.1 116:11.85 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2888 dirsrv    20   0 3819252 1.962g  11680 S  0.9  3.1 114:51.19 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2896 dirsrv    20   0 3819252 1.962g  11680 R  0.9  3.1 115:46.84 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2915 dirsrv    20   0 3819252 1.962g  11680 S  0.9  3.1 115:49.34 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2887 dirsrv    20   0 3819252 1.962g  11680 R  0.6  3.1 115:49.85 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2894 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 115:58.02 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2911 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 116:22.84 
> >>>>>>>>> ns-slapd
> >>>>>>>>>  2913 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 114:43.56 
> >>>>>>>>> ns-slapd
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>ns-slapd stays pegged >99%.  Trying to figure out what ns-slapd is 
> >>>>>>>>>doing? Any
> >>>>>>>>>pointers on where else to look?
> >>>>>>>>>
> >>>>>>>>>Thanks in advance.
> >>>>>>>>>
> >>>>>>>>>--Andrew
> >>>>>>>>>
> >>>>>>>>-- 
> >>>>>>>>Manage your subscription for the Freeipa-users mailing list:
> >>>>>>>>https://www.redhat.com/mailman/listinfo/freeipa-users
> >>>>>>>>Go to http://freeipa.org for more info on the project
> >>>>>>>>
> >>>>>>>>
> >>
> 
> 

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Reply via email to