On Mon, Jul 13, 2015 at 05:29:13PM +0200, Ludwig Krispenz wrote:
> 
> On 07/13/2015 05:05 PM, Andrew E. Bruno wrote:
> >On Mon, Jul 13, 2015 at 04:58:46PM +0200, Ludwig Krispenz wrote:
> >>can you get a pstack of the slapd process along with a top -H to find th
> >>ethread with high cpu usage
> >Attached is the full stacktrace of the running ns-slapd proccess. top -H
> >shows this thread (2879) with high cpu usage:
> >
> >2879 dirsrv    20   0 3819252 1.962g  11680 R 99.9  3.1   8822:10 ns-slapd
> this thread is a replication thread sending updates, what is strange is that
> the current csn_str is quite old (july, 7th), I can't tell which agreeement
> this thread is handling, but looks like it is heavily reading the changeglog
> and sending updates. anything changed recently in replication setup ?


Yes, we had one replica fail on (6/19) which we removed (not this one
showing high CPU load). Had to perform some manual cleanup of the ipa-ca
RUVs. Then we added the replica back in on 7/1. Since then, replication
appears to have been running normally between the 3 replicas. We've been
monitoring utilization since 7/1 and only recently seen this spike (past
24 hours or so). 

On a side note, we get hit with this bug often:

https://www.redhat.com/archives/freeipa-users/2015-July/msg00018.html

(rouge sssd_be processing hammering a replica). 

This causes high ns-slapd utilization on the replica and restarting sssd
on the client host immediately fixes the issue. However, in this
case, we're not seeing this behavior.




> >
> >
> >
> >
> >
> >>On 07/13/2015 04:46 PM, Andrew E. Bruno wrote:
> >>>We have 3 freeipa-replicas. Centos 7.1.1503, ipa-server 4.1.0-18, and
> >>>389-ds 1.3.3.1-16.
> >>>
> >>>Recently, the ns-slapd process on one of our replicas started showing 
> >>>higher
> >>>than normal CPU usage. ns-slapd is pegged at high CPU usage more or less
> >>>constantly.
> >>>
> >>>Seems very similar to this thread:
> >>>https://www.redhat.com/archives/freeipa-users/2015-February/msg00281.html
> >>>
> >>>There are a few errors showing in /var/log/dirsrv/slapd-[domain]/errors 
> >>>(not
> >>>sure if these are related):
> >>>
> >>>
> >>>[13/Jul/2015:02:56:49 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>[13/Jul/2015:04:11:50 -0400] - dn2entry_ext: Failed to get id for 
> >>>changenumber=2277387,cn=changelog from entryrdn index (-30993)
> >>>[13/Jul/2015:04:11:50 -0400] - Operation error fetching 
> >>>changenumber=2277387,cn=changelog (null), error -30993.
> >>>[13/Jul/2015:04:11:50 -0400] DSRetroclPlugin - replog: an error occured 
> >>>while adding change number 2277387, dn = 
> >>>changenumber=2277387,cn=changelog: Operations error.
> >>>[13/Jul/2015:04:11:50 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching Null DN 
> >>>(01de36ac-295411e5-b94db2ab-07afbca6), error -30993.
> >>>[13/Jul/2015:07:41:49 -0400] - dn2entry_ext: Failed to get id for 
> >>>changenumber=2281464,cn=changelog from entryrdn index (-30993)
> >>>[13/Jul/2015:07:41:49 -0400] - Operation error fetching 
> >>>changenumber=2281464,cn=changelog (null), error -30993.
> >>>[13/Jul/2015:07:41:49 -0400] DSRetroclPlugin - replog: an error occured 
> >>>while adding change number 2281464, dn = 
> >>>changenumber=2281464,cn=changelog: Operations error.
> >>>[13/Jul/2015:07:41:49 -0400] retrocl-plugin - retrocl_postob: operation 
> >>>failure [1]
> >>>
> >>>
> >>>access logs seem to be showing normal activity. Here's the number of open
> >>>connections:
> >>>
> >>># ls -al /proc/`cat /var/run/dirsrv/slapd-[domain].pid`/fd|grep socket|wc 
> >>>-l
> >>>62
> >>>
> >>>Note: the other two replicas have much higher open connections (>250) and 
> >>>low
> >>>cpu load avgs.
> >>>
> >>>Here's some output of logconv.pl from our most recent access log on the 
> >>>replica
> >>>with high cpu load:
> >>>
> >>>Start of Logs:    13/Jul/2015:04:49:18
> >>>End of Logs:      13/Jul/2015:10:06:11
> >>>
> >>>Processed Log Time:  5 Hours, 16 Minutes, 53 Seconds
> >>>
> >>>Restarts:                     0
> >>>Total Connections:            2343
> >>>  - LDAP Connections:          2120
> >>>  - LDAPI Connections:         223
> >>>  - LDAPS Connections:         0
> >>>  - StartTLS Extended Ops:     45
> >>>  Secure Protocol Versions:
> >>>   - TLS1.2 128-bit AES - 45
> >>>
> >>>Peak Concurrent Connections:  22
> >>>Total Operations:             111865
> >>>Total Results:                111034
> >>>Overall Performance:          99.3%
> >>>
> >>>Searches:                     95585         (5.03/sec)  (301.64/min)
> >>>Modifications:                3369          (0.18/sec)  (10.63/min)
> >>>Adds:                         0             (0.00/sec)  (0.00/min)
> >>>Deletes:                      0             (0.00/sec)  (0.00/min)
> >>>Mod RDNs:                     0             (0.00/sec)  (0.00/min)
> >>>Compares:                     0             (0.00/sec)  (0.00/min)
> >>>Binds:                        7082          (0.37/sec)  (22.35/min)
> >>>
> >>>Proxied Auth Operations:      0
> >>>Persistent Searches:          0
> >>>Internal Operations:          0
> >>>Entry Operations:             0
> >>>Extended Operations:          5317
> >>>Abandoned Requests:           416
> >>>Smart Referrals Received:     0
> >>>
> >>>VLV Operations:               96
> >>>VLV Unindexed Searches:       0
> >>>VLV Unindexed Components:     32
> >>>SORT Operations:              64
> >>>
> >>>Entire Search Base Queries:   0
> >>>Paged Searches:               3882
> >>>Unindexed Searches:           0
> >>>Unindexed Components:         5
> >>>
> >>>FDs Taken:                    2566
> >>>FDs Returned:                 2643
> >>>Highest FD Taken:             249
> >>>
> >>>Broken Pipes:                 0
> >>>Connections Reset By Peer:    0
> >>>Resource Unavailable:         0
> >>>Max BER Size Exceeded:        0
> >>>
> >>>Binds:                        7082
> >>>Unbinds:                      2443
> >>>  - LDAP v2 Binds:             0
> >>>  - LDAP v3 Binds:             6859
> >>>  - AUTOBINDs:                 223
> >>>  - SSL Client Binds:          0
> >>>  - Failed SSL Client Binds:   0
> >>>  - SASL Binds:                6814
> >>>     GSSAPI - 6591
> >>>     EXTERNAL - 223
> >>>  - Directory Manager Binds:   0
> >>>  - Anonymous Binds:           6591
> >>>  - Other Binds:               491
> >>>
> >>>
> >>>
> >>>
> >>>strace timing on the ns-slapd process:
> >>>
> >>>
> >>>% time     seconds  usecs/call     calls    errors syscall
> >>>------ ----------- ----------- --------- --------- ----------------
> >>>  94.40    0.346659        5977        58           poll
> >>>   4.10    0.015057       15057         1           restart_syscall
> >>>   0.91    0.003353          57        59        59 getpeername
> >>>   0.49    0.001796         150        12           futex
> >>>   0.10    0.000364          73         5           read
> >>>------ ----------- ----------- --------- --------- ----------------
> >>>100.00    0.367229                   135        59 total
> >>>
> >>>
> >>>top output (with threads 'H'):
> >>>
> >>>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
> >>>  2879 dirsrv    20   0 3819252 1.962g  11680 R 99.9  3.1   8822:10 
> >>> ns-slapd
> >>>  2895 dirsrv    20   0 3819252 1.962g  11680 R 34.1  3.1 115:10.62 
> >>> ns-slapd
> >>>  2889 dirsrv    20   0 3819252 1.962g  11680 R  2.4  3.1 115:34.42 
> >>> ns-slapd
> >>>  2917 dirsrv    20   0 3819252 1.962g  11680 S  2.4  3.1 115:26.87 
> >>> ns-slapd
> >>>  2898 dirsrv    20   0 3819252 1.962g  11680 S  2.1  3.1 116:33.12 
> >>> ns-slapd
> >>>  2904 dirsrv    20   0 3819252 1.962g  11680 S  2.1  3.1 115:08.56 
> >>> ns-slapd
> >>>  2892 dirsrv    20   0 3819252 1.962g  11680 S  1.8  3.1 115:33.04 
> >>> ns-slapd
> >>>  2897 dirsrv    20   0 3819252 1.962g  11680 R  1.8  3.1 114:54.28 
> >>> ns-slapd
> >>>  2914 dirsrv    20   0 3819252 1.962g  11680 R  1.8  3.1 116:03.35 
> >>> ns-slapd
> >>>  2907 dirsrv    20   0 3819252 1.962g  11680 S  1.5  3.1 115:42.25 
> >>> ns-slapd
> >>>  2910 dirsrv    20   0 3819252 1.962g  11680 R  1.5  3.1 116:01.99 
> >>> ns-slapd
> >>>  2870 dirsrv    20   0 3819252 1.962g  11680 R  1.2  3.1 611:30.22 
> >>> ns-slapd
> >>>  2890 dirsrv    20   0 3819252 1.962g  11680 S  1.2  3.1 115:18.25 
> >>> ns-slapd
> >>>  2891 dirsrv    20   0 3819252 1.962g  11680 S  1.2  3.1 115:22.24 
> >>> ns-slapd
> >>>  2899 dirsrv    20   0 3819252 1.962g  11680 R  1.2  3.1 116:11.85 
> >>> ns-slapd
> >>>  2888 dirsrv    20   0 3819252 1.962g  11680 S  0.9  3.1 114:51.19 
> >>> ns-slapd
> >>>  2896 dirsrv    20   0 3819252 1.962g  11680 R  0.9  3.1 115:46.84 
> >>> ns-slapd
> >>>  2915 dirsrv    20   0 3819252 1.962g  11680 S  0.9  3.1 115:49.34 
> >>> ns-slapd
> >>>  2887 dirsrv    20   0 3819252 1.962g  11680 R  0.6  3.1 115:49.85 
> >>> ns-slapd
> >>>  2894 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 115:58.02 
> >>> ns-slapd
> >>>  2911 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 116:22.84 
> >>> ns-slapd
> >>>  2913 dirsrv    20   0 3819252 1.962g  11680 S  0.6  3.1 114:43.56 
> >>> ns-slapd
> >>>
> >>>
> >>>ns-slapd stays pegged >99%.  Trying to figure out what ns-slapd is doing? 
> >>>Any
> >>>pointers on where else to look?
> >>>
> >>>Thanks in advance.
> >>>
> >>>--Andrew
> >>>
> >>-- 
> >>Manage your subscription for the Freeipa-users mailing list:
> >>https://www.redhat.com/mailman/listinfo/freeipa-users
> >>Go to http://freeipa.org for more info on the project
> >>
> >>
> 
> 

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Reply via email to