On 20-04-2020 09:58, Kees Bakker via FreeIPA-users wrote: > On 20-04-2020 09:09, Florence Blanc-Renaud wrote: >> On 4/20/20 8:28 AM, Kees Bakker via FreeIPA-users wrote: >>> Hey, >>> >>> I'm looking for advice how to analyse/debug this. >>> >>> On one of the masters the dirsrv is unresponsive. It runs, but every >>> attempt to connect it hangs. >>> >>> The command "systemctl status" does not show anything alarming >>> >>> ● dirsrv@EXAMPLE-COM.service - 389 Directory Server EXAMPLE-COM. >>> Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; >>> vendor preset: disabled) >>> Active: active (running) since vr 2020-04-17 13:46:25 CEST; 1h 33min ago >>> Process: 3123 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl >>> /etc/dirsrv/slapd-%i/dse.ldif (code=exited, status=0/SUCCESS) >>> Main PID: 3134 (ns-slapd) >>> Status: "slapd started: Ready to process requests" >>> CGroup: /system.slice/system-dirsrv.slice/dirsrv@EXAMPLE-COM.service >>> └─3134 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i >>> /var/run/dirsrv/slapd-EXAMPLE-COM.pid >>> >>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 2 >>> apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 2 >>> >>> However, an ldapsearch command hangs forever >>> >>> [root@rotte ~]# ldapsearch -H ldaps://linge.example.com -D >>> uid=keesbtest,cn=users,cn=accounts,dc=example,dc=com -W -LLL -o >>> ldif-wrap=no -b cn=users,cn=accounts,dc=example,dc=com >>> '(&(objectClass=person)(memberOf=cn=admins,cn=groups,cn=accounts,dc=example,dc=com))' >>> uid >>> Enter LDAP Password: >>> >>> Even if I use the socket (ldapi://%2fvar%2frun%2fslapd-EXAMPLE-COM.socket) >>> the ldapsearch >>> command hangs. >>> >>> "ipactl status" hangs >>> >>> "kinit" hangs >>> >>> >> Hi, >> you can start by having a look at dirsrv error log in >> /var/log/dirsrv-slapd-YOUR_DOMAIN/errors, and the journal. >> >> The FAQ page of 389 also explains a few troubleshooting steps: >> http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting > I did exactly that, look at the "errors" log, but there was no clue, at least > not for me. Strange enough it kept running for a few hours and then it > was hanging again. > > I tried the command "ipctl restart", but that was hanging forever. > However "systemctl restart dirsrv@MY-DOMAIN" was able to restart > it after several minutes. Meanwhile the sn-slapd process was using 100% > CPU. > > Another remark I want to make. Every ldap connection (ldapsearch, whatever) > hangs for ever. No timeout, nothing. > > When it rains, it pours, they say. There is another master with the same > symptom. > I'm getting nervous now. > > Thanks for the Troubleshooting link. I'll have to dive into the deep, I guess.
Could it be a deadlock? [root@linge ~]# grep -a1 '^#0.*lock' slapd-stacktrace.1587374239.txt Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so -- Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so -- Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so -- Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): #0 0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0 #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so Without debuginfo the trace of these threads look like this: Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff9190b8745 in backend_search_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #3 0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #4 0x00007ff92bcd92e3 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #5 0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x0000562454427bbe in do_search () #7 0x000056245441595a in connection_threadmain () #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff9190b8745 in backend_search_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #3 0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #4 0x00007ff92bcd92e3 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #5 0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x0000562454427bbe in do_search () #7 0x000056245441595a in connection_threadmain () #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)): #0 0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0 #1 0x00007ff9190cc49c in map_rdlock () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff9190b8745 in backend_search_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #3 0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #4 0x00007ff92bcd92e3 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #5 0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x0000562454427bbe in do_search () #7 0x000056245441595a in connection_threadmain () #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): #0 0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0 #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0 #3 0x00007ff92bcd92e3 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0 #4 0x00007ff91b08ac88 in ldbm_back_delete () at /usr/lib64/dirsrv/plugins/libback-ldbm.so #5 0x00007ff92bc89feb in op_shared_delete () at /usr/lib64/dirsrv/libslapd.so.0 #6 0x00007ff92bc8a1c8 in delete_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0 #7 0x00007ff9196f131e in delete_changerecord () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #8 0x00007ff9196f1605 in changelog_trim_thread_fn () at /usr/lib64/dirsrv/plugins/libretrocl-plugin.so #9 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so #10 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 #11 0x00007ff928ad888d in clone () at /lib64/libc.so.6 With the following (stripped) debuginfo you can see that it is hanging while deleting a changerecord. Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): #0 0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0 #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so #2 0x00007ff92bcd9028 in plugin_call_func (list=0x5624565a3340, operation=operation@entry=453, pb=pb@entry=0x5624629c28 40, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028 n = <optimized out> func = 0x7ff9190b65a0 <backend_be_pre_write_cb> rc = <optimized out> return_value = 0 count = 0 #3 0x00007ff92bcd92e3 in plugin_call_list (pb=0x5624629c2840, operation=453, list=<optimized out>) at ldap/servers/slap d/plugin.c:1972 p = 0x5624564ea2c0 locked = <optimized out> plugin_list_number = 3 rc = 0 do_op = <optimized out> #4 0x00007ff92bcd92e3 in plugin_call_plugins (pb=pb@entry=0x5624629c2840, whichfunction=whichfunction@entry=453) at lda p/servers/slapd/plugin.c:442 p = 0x5624564ea2c0 locked = <optimized out> plugin_list_number = 3 rc = 0 do_op = <optimized out> #5 0x00007ff91b08ac88 in ldbm_back_delete (pb=0x5624629c2840) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:373 be = 0x5624564fc000 inst = 0x56245635c680 li = 0x5624562d3a40 e = 0x562462377810 tombstone = 0x0 original_tombstone = 0x0 tmptombstone = 0x0 dn = 0x56245d90fa70 "changenumber=879530,cn=changelog" txn = {back_txn_txn = 0x562459c2cc60} parent_txn = 0x0 retval = 0 msg = <optimized out> errbuf = 0x0 retry_count = <optimized out> disk_full = 0 parent_found = <optimized out> ruv_c_init = 0 parent_modify_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, attr_encrypt = 0} ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, attr_encrypt = 0} ... -- Kees _______________________________________________ FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org