On 4/20/20 3:02 PM, Kees Bakker wrote:
On 20-04-2020 14:51, Rob Crittenden wrote:
*** EXTERNAL E-MAIL ***


Kees Bakker via FreeIPA-users wrote:
On 20-04-2020 09:58, Kees Bakker via FreeIPA-users wrote:
On 20-04-2020 09:09, Florence Blanc-Renaud wrote:
On 4/20/20 8:28 AM, Kees Bakker via FreeIPA-users wrote:
Hey,

I'm looking for advice how to analyse/debug this.

On one of the masters the dirsrv is unresponsive. It runs, but every
attempt to connect it hangs.

The command "systemctl status" does not show anything alarming

● dirsrv@EXAMPLE-COM.service - 389 Directory Server EXAMPLE-COM.
     Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor 
preset: disabled)
     Active: active (running) since vr 2020-04-17 13:46:25 CEST; 1h 33min ago
    Process: 3123 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl 
/etc/dirsrv/slapd-%i/dse.ldif (code=exited, status=0/SUCCESS)
   Main PID: 3134 (ns-slapd)
     Status: "slapd started: Ready to process requests"
     CGroup: /system.slice/system-dirsrv.slice/dirsrv@EXAMPLE-COM.service
             └─3134 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i 
/var/run/dirsrv/slapd-EXAMPLE-COM.pid

apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 2
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 2

However, an ldapsearch command hangs forever

[root@rotte ~]# ldapsearch -H ldaps://linge.example.com -D 
uid=keesbtest,cn=users,cn=accounts,dc=example,dc=com -W -LLL -o ldif-wrap=no -b 
cn=users,cn=accounts,dc=example,dc=com 
'(&(objectClass=person)(memberOf=cn=admins,cn=groups,cn=accounts,dc=example,dc=com))'
 uid
Enter LDAP Password:

Even if I use the socket (ldapi://%2fvar%2frun%2fslapd-EXAMPLE-COM.socket) the 
ldapsearch
command hangs.

"ipactl status" hangs

"kinit" hangs


Hi,
you can start by having a look at dirsrv error log in
/var/log/dirsrv-slapd-YOUR_DOMAIN/errors, and the journal.

The FAQ page of 389 also explains a few troubleshooting steps:
http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting
I did exactly that, look at the "errors" log, but there was no clue, at least
not for me. Strange enough it kept running for a few hours and then it
was hanging again.

I tried the command "ipctl restart", but that was hanging forever.
However "systemctl restart dirsrv@MY-DOMAIN" was able to restart
it after several minutes. Meanwhile the sn-slapd process was using 100%
CPU.

Another remark I want to make. Every ldap connection (ldapsearch, whatever)
hangs for ever. No timeout, nothing.

When it rains, it pours, they say. There is another master with the same 
symptom.
I'm getting nervous now.

Thanks for the Troubleshooting link. I'll have to dive into the deep, I guess.
Could it be a deadlock?

[root@linge ~]# grep -a1 '^#0.*lock' slapd-stacktrace.1587374239.txt
Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 2 (Thread 0x7ff92c355700 (LWP 15679)):
#0  0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x00007ff9190b6639 in backend_be_pre_write_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so

Without debuginfo the trace of these threads look like this:

Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff9190b8745 in backend_search_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#5  0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x0000562454427bbe in do_search ()
#7  0x000056245441595a in connection_threadmain ()
#8  0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so
#9  0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0
#10 0x00007ff928ad888d in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff9190b8745 in backend_search_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#5  0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x0000562454427bbe in do_search ()
#7  0x000056245441595a in connection_threadmain ()
#8  0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so
#9  0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0
#10 0x00007ff928ad888d in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)):
#0  0x00007ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff9190b8745 in backend_search_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#5  0x00007ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x0000562454427bbe in do_search ()
#7  0x000056245441595a in connection_threadmain ()
#8  0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so
#9  0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0
#10 0x00007ff928ad888d in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7ff92c355700 (LWP 15679)):
#0  0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x00007ff9190b6639 in backend_be_pre_write_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#3  0x00007ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#4  0x00007ff91b08ac88 in ldbm_back_delete () at 
/usr/lib64/dirsrv/plugins/libback-ldbm.so
#5  0x00007ff92bc89feb in op_shared_delete () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007ff92bc8a1c8 in delete_internal_pb () at 
/usr/lib64/dirsrv/libslapd.so.0
#7  0x00007ff9196f131e in delete_changerecord () at 
/usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#8  0x00007ff9196f1605 in changelog_trim_thread_fn () at 
/usr/lib64/dirsrv/plugins/libretrocl-plugin.so
#9  0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so
#10 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0
#11 0x00007ff928ad888d in clone () at /lib64/libc.so.6

With the following (stripped) debuginfo you can see that it is hanging while 
deleting
a changerecord.

Thread 2 (Thread 0x7ff92c355700 (LWP 15679)):
#0  0x00007ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x00007ff9190b6639 in backend_be_pre_write_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007ff92bcd9028 in plugin_call_func (list=0x5624565a3340, 
operation=operation@entry=453, pb=pb@entry=0x5624629c28
40, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028
         n = <optimized out>
         func = 0x7ff9190b65a0 <backend_be_pre_write_cb>
         rc = <optimized out>
         return_value = 0
         count = 0
#3  0x00007ff92bcd92e3 in plugin_call_list (pb=0x5624629c2840, operation=453, 
list=<optimized out>) at ldap/servers/slap
d/plugin.c:1972
         p = 0x5624564ea2c0
         locked = <optimized out>
         plugin_list_number = 3
         rc = 0
         do_op = <optimized out>
#4  0x00007ff92bcd92e3 in plugin_call_plugins (pb=pb@entry=0x5624629c2840, 
whichfunction=whichfunction@entry=453) at lda
p/servers/slapd/plugin.c:442
         p = 0x5624564ea2c0
         locked = <optimized out>
         plugin_list_number = 3
         rc = 0
         do_op = <optimized out>
#5  0x00007ff91b08ac88 in ldbm_back_delete (pb=0x5624629c2840) at 
ldap/servers/slapd/back-ldbm/ldbm_delete.c:373
         be = 0x5624564fc000
         inst = 0x56245635c680
         li = 0x5624562d3a40
         e = 0x562462377810
         tombstone = 0x0
         original_tombstone = 0x0
         tmptombstone = 0x0
         dn = 0x56245d90fa70 "changenumber=879530,cn=changelog"
         txn = {back_txn_txn = 0x562459c2cc60}
         parent_txn = 0x0
         retval = 0
         msg = <optimized out>
         errbuf = 0x0
         retry_count = <optimized out>
         disk_full = 0
         parent_found = <optimized out>
         ruv_c_init = 0
         parent_modify_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, 
attr_encrypt = 0}
         ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, attr_encrypt = 
0}
...

What distribution are you using and what version of 389-ds?

Centos7

[root@linge ~]# rpm -qa 389\*
389-ds-base-debuginfo-1.3.9.1-13.el7_7.x86_64
389-ds-base-libs-1.3.9.1-13.el7_7.x86_64
389-ds-base-1.3.9.1-13.el7_7.x86_64

[root@linge ~]# rpm -qa centos-release
centos-release-7-7.1908.0.el7.centos.x86_64


This is a known bug [1].
With the same bug there are two deadlock scenario but only one is fixed (for example in  slapi-nis-0.56.4-1 [2]).
A fix for the second one is under tests.

At the moment I would recommend the workaround [3]. The drawback is a growth of the retroCL database but unless you have a very high rate of update it should not be a concern.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1751295
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=1457771
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1751295#c5

best regards
thierry
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org

Reply via email to