On Wed, Jul 15, 2015 at 04:58:23PM +0200, Ludwig Krispenz wrote: > > On 07/15/2015 04:10 PM, Andrew E. Bruno wrote: > >On Wed, Jul 15, 2015 at 03:22:51PM +0200, Ludwig Krispenz wrote: > >>On 07/14/2015 08:59 PM, Andrew E. Bruno wrote: > >>>On Tue, Jul 14, 2015 at 04:52:10PM +0200, Ludwig Krispenz wrote: > >>>>hm, the stack traces show csn_str, which correspond to Jul,8th, Jul,4th, > >>>>and > >>>>Jul,7th - so it looks like it is iterating the changelog over and over > >>>>again. > >>>>Th consumer side Is "cn=meTosrv-m14-24.ccr.buffalo.edu" - is this the > >>>>master > >>>>? > >>>> > >>>>can you provide the result of the following search from > >>>>m14-24.ccr.buffalo.edu adn the server with the high cpu: > >>>> > >>>>ldapsearch -o ldif-wrap=no -x -D ... -w -b "cn=config" > >>>>"objectclass=nsds5replica" nsds50ruv > >>>master is srv-m14-24.. here's the results of the ldapsearch: > >>> > >>>[srv-m14-24 ~]$ ldapsearch -o ldif-wrap=no -x -D "cn=directory manager" -W > >>> -b "cn=config" "objectclass=nsds5replica" nsds50ruv > >>> > >>># replica, dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu, mapping tree, config > >>>dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping > >>>tree,cn=config > >>>nsds50ruv: {replicageneration} 5527f711000000040000 > >>>nsds50ruv: {replica 4 ldap://srv-m14-24.ccr.buffalo.edu:389} > >>>5527f771000000040000 55a55aed001000040000 > >>>nsds50ruv: {replica 5 ldap://srv-m14-26.ccr.buffalo.edu:389} > >>>5537c773000000050000 5591a3d2000700050000 > >>>nsds50ruv: {replica 6 ldap://srv-m14-25-02.ccr.buffalo.edu:389} > >>>55943dda000000060000 55945378000200060000 > >>so this is really strange, the master m14-24 has the latest change from > >>replica 5(m14-26) as: 5591a3d2000700050000 > >>which corresponds to Mon, 29 Jun 2015 20:00:18 GMT > >>so no update from 14-24 since that did arrive, or could not update the ruv. > >>So m14-26 tries to replicate all the changes back from that time, but looks > >>like iit has no success. > >>is there anything in the logs of m14-24 ? can you see successful mods with > >>csn=xxxxxxx00050000 ? > >Here's what I could find from the logs on srv-m14-24: > > > > > >[srv-m14-24 ~]# grep -r 00050000 /var/log/dirsrv/slapd-[domain]/* > >access.20150714-014346:[14/Jul/2015:03:10:05 -0400] conn=748529 op=14732 > >RESULT err=0 tag=103 nentries=0 etime=1 csn=55a4b5f0005000040000 > ok, so no update originating at replica 5 has been replicated (probably > since June,29) did you experience data inconsistency between the servers ? > > > > > >And here's the last few lines the error log on srv-m14-24: > one set of messages refers to the o=ipaca backend and seem to be transient, > replication continues later. > the other set of msg "No original tombstone .." is annoying (and it is fixed > in ticket https://fedorahosted.org/389/ticket/47912) > > the next thing we can do to try to understand what is going on is to enable > replication logging on m14-26, it will then not only consume all cpu, but > write tons of messages to the error log. > But it can be turned on and off: > > ldapmodify ... > dn: cn=config > replace: nsslapd-errorlog-level > nsslapd-errorlog-level: 8192 > > and let it run for a while, then set it back to: 0
I enabled replication logging and it's running now. I noticed the default value for nsslapd-errorlog-level was set to 16384 (not 0). OK to send you the logs off list? Looks like they contain quite a bit of sensitive data. Thanks again for all the help looking into this. Best, --Andrew > > > > >[12/Jul/2015:10:11:14 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2456070,cn=changelog!! > >[12/Jul/2015:10:11:48 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2498441,cn=changelog!! > >[13/Jul/2015:07:41:49 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a3a406000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:11:56:50 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a3dfca000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:14:26:50 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a402f2000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:15:26:49 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a41102000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:18:26:51 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a43b32000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:18:56:51 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a4423a000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:20:41:51 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a45ad6000000600000): Operations > >error (1). Will retry later. > >[13/Jul/2015:22:41:51 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a476f6000000600000): Operations > >error (1). Will retry later. > >[14/Jul/2015:06:56:51 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a4eafa000000600000): Operations > >error (1). Will retry later. > >[14/Jul/2015:09:56:52 -0400] NSMMReplicationPlugin - > >agmt="cn=masterAgreement1-srv-m14-26.ccr.buffalo.edu-pki-tomcat" > >(srv-m14-26:389): Consumer failed to replay change (uniqueid > >cb7acfc1-df9211e4-a351aa45-2e06257b, CSN 55a5152a000000600000): Operations > >error (1). Will retry later. > >[14/Jul/2015:10:11:21 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2552223,cn=changelog!! > >[14/Jul/2015:10:11:21 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2552224,cn=changelog!! > >[14/Jul/2015:10:11:25 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2557315,cn=changelog!! > >[14/Jul/2015:10:11:25 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2557318,cn=changelog!! > >[14/Jul/2015:10:11:28 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2561020,cn=changelog!! > >[14/Jul/2015:10:11:28 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2561043,cn=changelog!! > >[14/Jul/2015:10:11:48 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2586022,cn=changelog!! > >[14/Jul/2015:10:11:59 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2598989,cn=changelog!! > >[14/Jul/2015:10:11:59 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2598990,cn=changelog!! > >[14/Jul/2015:10:12:01 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2600966,cn=changelog!! > >[14/Jul/2015:10:12:03 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2604037,cn=changelog!! > >[14/Jul/2015:10:12:04 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2604054,cn=changelog!! > >[14/Jul/2015:10:12:24 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2629803,cn=changelog!! > >[14/Jul/2015:10:12:24 -0400] ldbm_back_delete - conn=0 op=0 [retry: 1] No > >original_tombstone for changenumber=2629804,cn=changelog!! > > > > > > > > > > > >>># replica, o\3Dipaca, mapping tree, config > >>>dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config > >>>nsds50ruv: {replicageneration} 5527f74b000000600000 > >>>nsds50ruv: {replica 96 ldap://srv-m14-24.ccr.buffalo.edu:389} > >>>5527f754000000600000 55a557f6000000600000 > >>>nsds50ruv: {replica 86 ldap://srv-m14-25-02.ccr.buffalo.edu:389} > >>>55943e6e000000560000 55943e6f000100560000 > >>>nsds50ruv: {replica 91 ldap://srv-m14-26.ccr.buffalo.edu:389} > >>>5537c7ba0000005b0000 5582c7e40004005b0000 > >>> > >>> > >>>server with high cpu load is srv-m14-26. here's the results of the > >>>ldapsearch > >>>from this server: > >>>[srv-m14-26 ~]$ ldapsearch -o ldif-wrap=no -x -D "cn=directory manager" -W > >>> -b "cn=config" "objectclass=nsds5replica" nsds50ruv > >>> > >>># replica, dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu, mapping tree, config > >>>dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping > >>>tree,cn=config > >>>nsds50ruv: {replicageneration} 5527f711000000040000 > >>>nsds50ruv: {replica 5 ldap://srv-m14-26.ccr.buffalo.edu:389} > >>>5537c773000000050000 55a55b47000300050000 > >>>nsds50ruv: {replica 4 ldap://srv-m14-24.ccr.buffalo.edu:389} > >>>5527f771000000040000 55a53eb0000a00040000 > >>>nsds50ruv: {replica 6 ldap://srv-m14-25-02.ccr.buffalo.edu:389} > >>>55943dda000000060000 55945378000200060000 > >>> > >>># replica, o\3Dipaca, mapping tree, config > >>>dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config > >>>nsds50ruv: {replicageneration} 5527f74b000000600000 > >>>nsds50ruv: {replica 91 ldap://srv-m14-26.ccr.buffalo.edu:389} > >>>5537c7ba0000005b0000 5582c7e40004005b0000 > >>>nsds50ruv: {replica 96 ldap://srv-m14-24.ccr.buffalo.edu:389} > >>>5527f754000000600000 55a557f6000000600000 > >>>nsds50ruv: {replica 86 ldap://srv-m14-25-02.ccr.buffalo.edu:389} > >>>55943e6e000000560000 55943e6f000100560000 > >>> > >>> > >>>srv-m14-25-02 is our 3rd replicate which we recently added back in after it > >>>failed (was added back in 7/1). > >>> > >>>Let me know if you need anything else. Thanks for the help. > >>> > >>>--Andrew > >>> > >>>>On 07/14/2015 02:35 PM, Andrew E. Bruno wrote: > >>>>>On Tue, Jul 14, 2015 at 01:41:57PM +0200, Ludwig Krispenz wrote: > >>>>>>On 07/13/2015 06:36 PM, Andrew E. Bruno wrote: > >>>>>>>On Mon, Jul 13, 2015 at 05:29:13PM +0200, Ludwig Krispenz wrote: > >>>>>>>>On 07/13/2015 05:05 PM, Andrew E. Bruno wrote: > >>>>>>>>>On Mon, Jul 13, 2015 at 04:58:46PM +0200, Ludwig Krispenz wrote: > >>>>>>>>>>can you get a pstack of the slapd process along with a top -H to > >>>>>>>>>>find th > >>>>>>>>>>ethread with high cpu usage > >>>>>>>>>Attached is the full stacktrace of the running ns-slapd proccess. > >>>>>>>>>top -H > >>>>>>>>>shows this thread (2879) with high cpu usage: > >>>>>>>>> > >>>>>>>>>2879 dirsrv 20 0 3819252 1.962g 11680 R 99.9 3.1 8822:10 > >>>>>>>>>ns-slapd > >>>>>>>>this thread is a replication thread sending updates, what is strange > >>>>>>>>is that > >>>>>>>>the current csn_str is quite old (july, 7th), I can't tell which > >>>>>>>>agreeement > >>>>>>>>this thread is handling, but looks like it is heavily reading the > >>>>>>>>changeglog > >>>>>>>>and sending updates. anything changed recently in replication setup ? > >>>>>>>Yes, we had one replica fail on (6/19) which we removed (not this one > >>>>>>>showing high CPU load). Had to perform some manual cleanup of the > >>>>>>>ipa-ca > >>>>>>>RUVs. Then we added the replica back in on 7/1. Since then, replication > >>>>>>>appears to have been running normally between the 3 replicas. We've > >>>>>>>been > >>>>>>>monitoring utilization since 7/1 and only recently seen this spike > >>>>>>>(past > >>>>>>>24 hours or so). > >>>>>>is it still in this state ? or was it a spike. > >>>>>Yes same state. > >>>>> > >>>>>>if it still is high cpu consuming, could you > >>>>>>- get a few pstack like the one before with some time in between, I > >>>>>>would > >>>>>>like to see if it is progressing with the csns or looping on the same > >>>>>>one > >>>>>Attached are a few stacktraces. The thread pegging the cpu is: > >>>>> > >>>>>PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > >>>>>COMMAND > >>>>>2879 dirsrv 20 0 3819252 1.978g 11684 R 99.9 3.2 10148:26 > >>>>>ns-slapd > >>>>> > >>>>>>- check the consumer side. is there anything in the error log ? does the > >>>>>>access log show replication activity from this server > >>>>>Here's some errors showing up on the first master server rep1 (rep2 is > >>>>>the > >>>>>server with pegged cpu): > >>>>> > >>>>>[13/Jul/2015:20:41:51 -0400] NSMMReplicationPlugin - > >>>>>agmt="cn=masterAgreement1-rep2-pki-tomcat" (rep2:389): Consumer failed > >>>>>to replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN > >>>>>55a45ad6000000600000): Operations error (1). Will retry later. > >>>>>[13/Jul/2015:22:41:51 -0400] NSMMReplicationPlugin - > >>>>>agmt="cn=masterAgreement1-rep2-pki-tomcat" (rep2:389): Consumer failed > >>>>>to replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN > >>>>>55a476f6000000600000): Operations error (1). Will retry later. > >>>>>[14/Jul/2015:06:56:51 -0400] NSMMReplicationPlugin - > >>>>>agmt="cn=masterAgreement1-rep2-tomcat" (rep2:389): Consumer failed to > >>>>>replay change (uniqueid cb7acfc1-df9211e4-a351aa45-2e06257b, CSN > >>>>>55a4eafa000000600000): Operations error (1). Will retry later. > >>>>> > >>>>> > >>>>>Here's some snips from the access log of the rep2: > >>>>> > >>>>> > >>>>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9794 EXT > >>>>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" > >>>>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9794 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9795 EXT > >>>>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" > >>>>>[14/Jul/2015:08:22:31 -0400] conn=777787 op=9795 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:22:33 -0400] conn=777787 op=9796 EXT > >>>>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" > >>>>>.. > >>>>>[14/Jul/2015:08:23:38 -0400] conn=782341 op=129 RESULT err=0 tag=103 > >>>>>nentries=0 etime=0 csn=55a4ff6c000000050000 > >>>>>.. > >>>>>[14/Jul/2015:08:24:02 -0400] conn=781901 op=1745 RESULT err=0 tag=101 > >>>>>nentries=1 etime=0 > >>>>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9810 EXT > >>>>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" > >>>>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9810 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9811 EXT > >>>>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" > >>>>>[14/Jul/2015:08:24:03 -0400] conn=777787 op=9811 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:24:05 -0400] conn=777787 op=9812 EXT > >>>>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" > >>>>>[14/Jul/2015:08:24:05 -0400] conn=777787 op=9812 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:24:08 -0400] conn=777787 op=9813 EXT > >>>>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" > >>>>>[14/Jul/2015:08:24:08 -0400] conn=777787 op=9813 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9814 EXT > >>>>>oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" > >>>>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9814 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9815 EXT > >>>>>oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" > >>>>>[14/Jul/2015:08:24:10 -0400] conn=777787 op=9815 RESULT err=0 tag=120 > >>>>>nentries=0 etime=0 > >>>>> > >>>>>and here's some from the error log: > >>>>> > >>>>>[13/Jul/2015:22:41:49 -0400] DSRetroclPlugin - replog: an error occured > >>>>>while adding change number 2294859, dn = > >>>>>changenumber=2294859,cn=changelog: Operations error. > >>>>>[13/Jul/2015:22:41:49 -0400] retrocl-plugin - retrocl_postob: operation > >>>>>failure [1] > >>>>>[13/Jul/2015:23:56:50 -0400] - dn2entry_ext: Failed to get id for > >>>>>changenumber=2296384,cn=changelog from entryrdn index (-30993) > >>>>>[13/Jul/2015:23:56:50 -0400] - Operation error fetching > >>>>>changenumber=2296384,cn=changelog (null), error -30993. > >>>>>[13/Jul/2015:23:56:50 -0400] DSRetroclPlugin - replog: an error occured > >>>>>while adding change number 2296384, dn = > >>>>>changenumber=2296384,cn=changelog: Operations error. > >>>>>[13/Jul/2015:23:56:50 -0400] retrocl-plugin - retrocl_postob: operation > >>>>>failure [1] > >>>>>[14/Jul/2015:06:56:50 -0400] - dn2entry_ext: Failed to get id for > >>>>>changenumber=2304418,cn=changelog from entryrdn index (-30993) > >>>>>[14/Jul/2015:06:56:50 -0400] - Operation error fetching > >>>>>changenumber=2304418,cn=changelog (null), error -30993. > >>>>>[14/Jul/2015:06:56:50 -0400] DSRetroclPlugin - replog: an error occured > >>>>>while adding change number 2304418, dn = > >>>>>changenumber=2304418,cn=changelog: Operations error. > >>>>>[14/Jul/2015:06:56:50 -0400] retrocl-plugin - retrocl_postob: operation > >>>>>failure [1] > >>>>> > >>>>> > >>>>>>- eventually enable replication logging: nsslapd-errorlog-level: 8192 > >>>>>>>On a side note, we get hit with this bug often: > >>>>>>> > >>>>>>>https://www.redhat.com/archives/freeipa-users/2015-July/msg00018.html > >>>>>>> > >>>>>>>(rouge sssd_be processing hammering a replica). > >>>>>>> > >>>>>>>This causes high ns-slapd utilization on the replica and restarting > >>>>>>>sssd > >>>>>>>on the client host immediately fixes the issue. However, in this > >>>>>>>case, we're not seeing this behavior. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>> > >>>>>>>>>>On 07/13/2015 04:46 PM, Andrew E. Bruno wrote: > >>>>>>>>>>>We have 3 freeipa-replicas. Centos 7.1.1503, ipa-server 4.1.0-18, > >>>>>>>>>>>and > >>>>>>>>>>>389-ds 1.3.3.1-16. > >>>>>>>>>>> > >>>>>>>>>>>Recently, the ns-slapd process on one of our replicas started > >>>>>>>>>>>showing higher > >>>>>>>>>>>than normal CPU usage. ns-slapd is pegged at high CPU usage more > >>>>>>>>>>>or less > >>>>>>>>>>>constantly. > >>>>>>>>>>> > >>>>>>>>>>>Seems very similar to this thread: > >>>>>>>>>>>https://www.redhat.com/archives/freeipa-users/2015-February/msg00281.html > >>>>>>>>>>> > >>>>>>>>>>>There are a few errors showing in > >>>>>>>>>>>/var/log/dirsrv/slapd-[domain]/errors (not > >>>>>>>>>>>sure if these are related): > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>[13/Jul/2015:02:56:49 -0400] retrocl-plugin - retrocl_postob: > >>>>>>>>>>>operation failure [1] > >>>>>>>>>>>[13/Jul/2015:04:11:50 -0400] - dn2entry_ext: Failed to get id for > >>>>>>>>>>>changenumber=2277387,cn=changelog from entryrdn index (-30993) > >>>>>>>>>>>[13/Jul/2015:04:11:50 -0400] - Operation error fetching > >>>>>>>>>>>changenumber=2277387,cn=changelog (null), error -30993. > >>>>>>>>>>>[13/Jul/2015:04:11:50 -0400] DSRetroclPlugin - replog: an error > >>>>>>>>>>>occured while adding change number 2277387, dn = > >>>>>>>>>>>changenumber=2277387,cn=changelog: Operations error. > >>>>>>>>>>>[13/Jul/2015:04:11:50 -0400] retrocl-plugin - retrocl_postob: > >>>>>>>>>>>operation failure [1] > >>>>>>>>>>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching Null DN > >>>>>>>>>>>(01de36ac-295411e5-b94db2ab-07afbca6), error -30993. > >>>>>>>>>>>[13/Jul/2015:07:41:49 -0400] - dn2entry_ext: Failed to get id for > >>>>>>>>>>>changenumber=2281464,cn=changelog from entryrdn index (-30993) > >>>>>>>>>>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching > >>>>>>>>>>>changenumber=2281464,cn=changelog (null), error -30993. > >>>>>>>>>>>[13/Jul/2015:07:41:49 -0400] DSRetroclPlugin - replog: an error > >>>>>>>>>>>occured while adding change number 2281464, dn = > >>>>>>>>>>>changenumber=2281464,cn=changelog: Operations error. > >>>>>>>>>>>[13/Jul/2015:07:41:49 -0400] retrocl-plugin - retrocl_postob: > >>>>>>>>>>>operation failure [1] > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>access logs seem to be showing normal activity. Here's the number > >>>>>>>>>>>of open > >>>>>>>>>>>connections: > >>>>>>>>>>> > >>>>>>>>>>># ls -al /proc/`cat /var/run/dirsrv/slapd-[domain].pid`/fd|grep > >>>>>>>>>>>socket|wc -l > >>>>>>>>>>>62 > >>>>>>>>>>> > >>>>>>>>>>>Note: the other two replicas have much higher open connections > >>>>>>>>>>>(>250) and low > >>>>>>>>>>>cpu load avgs. > >>>>>>>>>>> > >>>>>>>>>>>Here's some output of logconv.pl from our most recent access log > >>>>>>>>>>>on the replica > >>>>>>>>>>>with high cpu load: > >>>>>>>>>>> > >>>>>>>>>>>Start of Logs: 13/Jul/2015:04:49:18 > >>>>>>>>>>>End of Logs: 13/Jul/2015:10:06:11 > >>>>>>>>>>> > >>>>>>>>>>>Processed Log Time: 5 Hours, 16 Minutes, 53 Seconds > >>>>>>>>>>> > >>>>>>>>>>>Restarts: 0 > >>>>>>>>>>>Total Connections: 2343 > >>>>>>>>>>> - LDAP Connections: 2120 > >>>>>>>>>>> - LDAPI Connections: 223 > >>>>>>>>>>> - LDAPS Connections: 0 > >>>>>>>>>>> - StartTLS Extended Ops: 45 > >>>>>>>>>>> Secure Protocol Versions: > >>>>>>>>>>> - TLS1.2 128-bit AES - 45 > >>>>>>>>>>> > >>>>>>>>>>>Peak Concurrent Connections: 22 > >>>>>>>>>>>Total Operations: 111865 > >>>>>>>>>>>Total Results: 111034 > >>>>>>>>>>>Overall Performance: 99.3% > >>>>>>>>>>> > >>>>>>>>>>>Searches: 95585 (5.03/sec) > >>>>>>>>>>>(301.64/min) > >>>>>>>>>>>Modifications: 3369 (0.18/sec) (10.63/min) > >>>>>>>>>>>Adds: 0 (0.00/sec) (0.00/min) > >>>>>>>>>>>Deletes: 0 (0.00/sec) (0.00/min) > >>>>>>>>>>>Mod RDNs: 0 (0.00/sec) (0.00/min) > >>>>>>>>>>>Compares: 0 (0.00/sec) (0.00/min) > >>>>>>>>>>>Binds: 7082 (0.37/sec) (22.35/min) > >>>>>>>>>>> > >>>>>>>>>>>Proxied Auth Operations: 0 > >>>>>>>>>>>Persistent Searches: 0 > >>>>>>>>>>>Internal Operations: 0 > >>>>>>>>>>>Entry Operations: 0 > >>>>>>>>>>>Extended Operations: 5317 > >>>>>>>>>>>Abandoned Requests: 416 > >>>>>>>>>>>Smart Referrals Received: 0 > >>>>>>>>>>> > >>>>>>>>>>>VLV Operations: 96 > >>>>>>>>>>>VLV Unindexed Searches: 0 > >>>>>>>>>>>VLV Unindexed Components: 32 > >>>>>>>>>>>SORT Operations: 64 > >>>>>>>>>>> > >>>>>>>>>>>Entire Search Base Queries: 0 > >>>>>>>>>>>Paged Searches: 3882 > >>>>>>>>>>>Unindexed Searches: 0 > >>>>>>>>>>>Unindexed Components: 5 > >>>>>>>>>>> > >>>>>>>>>>>FDs Taken: 2566 > >>>>>>>>>>>FDs Returned: 2643 > >>>>>>>>>>>Highest FD Taken: 249 > >>>>>>>>>>> > >>>>>>>>>>>Broken Pipes: 0 > >>>>>>>>>>>Connections Reset By Peer: 0 > >>>>>>>>>>>Resource Unavailable: 0 > >>>>>>>>>>>Max BER Size Exceeded: 0 > >>>>>>>>>>> > >>>>>>>>>>>Binds: 7082 > >>>>>>>>>>>Unbinds: 2443 > >>>>>>>>>>> - LDAP v2 Binds: 0 > >>>>>>>>>>> - LDAP v3 Binds: 6859 > >>>>>>>>>>> - AUTOBINDs: 223 > >>>>>>>>>>> - SSL Client Binds: 0 > >>>>>>>>>>> - Failed SSL Client Binds: 0 > >>>>>>>>>>> - SASL Binds: 6814 > >>>>>>>>>>> GSSAPI - 6591 > >>>>>>>>>>> EXTERNAL - 223 > >>>>>>>>>>> - Directory Manager Binds: 0 > >>>>>>>>>>> - Anonymous Binds: 6591 > >>>>>>>>>>> - Other Binds: 491 > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>strace timing on the ns-slapd process: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>% time seconds usecs/call calls errors syscall > >>>>>>>>>>>------ ----------- ----------- --------- --------- ---------------- > >>>>>>>>>>> 94.40 0.346659 5977 58 poll > >>>>>>>>>>> 4.10 0.015057 15057 1 restart_syscall > >>>>>>>>>>> 0.91 0.003353 57 59 59 getpeername > >>>>>>>>>>> 0.49 0.001796 150 12 futex > >>>>>>>>>>> 0.10 0.000364 73 5 read > >>>>>>>>>>>------ ----------- ----------- --------- --------- ---------------- > >>>>>>>>>>>100.00 0.367229 135 59 total > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>top output (with threads 'H'): > >>>>>>>>>>> > >>>>>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM > >>>>>>>>>>> TIME+ COMMAND > >>>>>>>>>>> 2879 dirsrv 20 0 3819252 1.962g 11680 R 99.9 3.1 > >>>>>>>>>>> 8822:10 ns-slapd > >>>>>>>>>>> 2895 dirsrv 20 0 3819252 1.962g 11680 R 34.1 3.1 > >>>>>>>>>>> 115:10.62 ns-slapd > >>>>>>>>>>> 2889 dirsrv 20 0 3819252 1.962g 11680 R 2.4 3.1 > >>>>>>>>>>> 115:34.42 ns-slapd > >>>>>>>>>>> 2917 dirsrv 20 0 3819252 1.962g 11680 S 2.4 3.1 > >>>>>>>>>>> 115:26.87 ns-slapd > >>>>>>>>>>> 2898 dirsrv 20 0 3819252 1.962g 11680 S 2.1 3.1 > >>>>>>>>>>> 116:33.12 ns-slapd > >>>>>>>>>>> 2904 dirsrv 20 0 3819252 1.962g 11680 S 2.1 3.1 > >>>>>>>>>>> 115:08.56 ns-slapd > >>>>>>>>>>> 2892 dirsrv 20 0 3819252 1.962g 11680 S 1.8 3.1 > >>>>>>>>>>> 115:33.04 ns-slapd > >>>>>>>>>>> 2897 dirsrv 20 0 3819252 1.962g 11680 R 1.8 3.1 > >>>>>>>>>>> 114:54.28 ns-slapd > >>>>>>>>>>> 2914 dirsrv 20 0 3819252 1.962g 11680 R 1.8 3.1 > >>>>>>>>>>> 116:03.35 ns-slapd > >>>>>>>>>>> 2907 dirsrv 20 0 3819252 1.962g 11680 S 1.5 3.1 > >>>>>>>>>>> 115:42.25 ns-slapd > >>>>>>>>>>> 2910 dirsrv 20 0 3819252 1.962g 11680 R 1.5 3.1 > >>>>>>>>>>> 116:01.99 ns-slapd > >>>>>>>>>>> 2870 dirsrv 20 0 3819252 1.962g 11680 R 1.2 3.1 > >>>>>>>>>>> 611:30.22 ns-slapd > >>>>>>>>>>> 2890 dirsrv 20 0 3819252 1.962g 11680 S 1.2 3.1 > >>>>>>>>>>> 115:18.25 ns-slapd > >>>>>>>>>>> 2891 dirsrv 20 0 3819252 1.962g 11680 S 1.2 3.1 > >>>>>>>>>>> 115:22.24 ns-slapd > >>>>>>>>>>> 2899 dirsrv 20 0 3819252 1.962g 11680 R 1.2 3.1 > >>>>>>>>>>> 116:11.85 ns-slapd > >>>>>>>>>>> 2888 dirsrv 20 0 3819252 1.962g 11680 S 0.9 3.1 > >>>>>>>>>>> 114:51.19 ns-slapd > >>>>>>>>>>> 2896 dirsrv 20 0 3819252 1.962g 11680 R 0.9 3.1 > >>>>>>>>>>> 115:46.84 ns-slapd > >>>>>>>>>>> 2915 dirsrv 20 0 3819252 1.962g 11680 S 0.9 3.1 > >>>>>>>>>>> 115:49.34 ns-slapd > >>>>>>>>>>> 2887 dirsrv 20 0 3819252 1.962g 11680 R 0.6 3.1 > >>>>>>>>>>> 115:49.85 ns-slapd > >>>>>>>>>>> 2894 dirsrv 20 0 3819252 1.962g 11680 S 0.6 3.1 > >>>>>>>>>>> 115:58.02 ns-slapd > >>>>>>>>>>> 2911 dirsrv 20 0 3819252 1.962g 11680 S 0.6 3.1 > >>>>>>>>>>> 116:22.84 ns-slapd > >>>>>>>>>>> 2913 dirsrv 20 0 3819252 1.962g 11680 S 0.6 3.1 > >>>>>>>>>>> 114:43.56 ns-slapd > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>ns-slapd stays pegged >99%. Trying to figure out what ns-slapd is > >>>>>>>>>>>doing? Any > >>>>>>>>>>>pointers on where else to look? > >>>>>>>>>>> > >>>>>>>>>>>Thanks in advance. > >>>>>>>>>>> > >>>>>>>>>>>--Andrew > >>>>>>>>>>> > >>>>>>>>>>-- > >>>>>>>>>>Manage your subscription for the Freeipa-users mailing list: > >>>>>>>>>>https://www.redhat.com/mailman/listinfo/freeipa-users > >>>>>>>>>>Go to http://freeipa.org for more info on the project > >>>>>>>>>> > >>>>>>>>>> > >> > > -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project