Hi.

(I am sending this to the "stable" list because it may be kernel-related.)

On 9.1-RELEASE I am seeing lockups of the OpenLDAP slapd daemon.

slapd runs for some days and then hangs, consuming large amounts of CPU.
In this state it can only be stopped (and then restarted) with SIGKILL.

 # procstat -kk 71195
  PID    TID COMM             TDNAME           KSTACK                       
71195 149271 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 
__umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 194998 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e 
seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546 
Xfast_syscall+0xf7 
71195 195544 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 196183 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e 
doreti_ast+0x1f 
71195 197966 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 198446 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 198453 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 198563 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 199520 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 200038 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 200670 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 200674 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 200675 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 201179 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 201180 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 201181 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 201183 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7 
71195 201189 slapd            -                mi_switch+0x186 
sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d 
_do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 
amd64_syscall+0x546 Xfast_syscall+0xf7
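
Most of the threads are sitting in __umtx_op_wait_umutex, i.e. blocked on a
userland mutex. I suppose the next step would be to attach a debugger and dump
the userland backtraces to see which pthread mutexes are involved, something
like (slapd path as installed by the port, PID as above):

 # gdb /usr/local/libexec/slapd 71195
 (gdb) thread apply all bt
 (gdb) detach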

When I try to stop slapd through the rc script, I can see in the logs that the
process waits indefinitely for a thread to terminate.
Other multithreaded server processes (apache-worker, mysqld, bind, etc.) run on
the same machine without problems.
On UFS2, slapd runs fine and never shows this behaviour.


Things I have already tried to stop the lockups:

- running openldap-server23 and openldap24, each with different BDB backend
  versions
- tuning the BDB init file (DB_CONFIG)
- reducing the number of threads used by slapd through slapd.conf
(rough sketches of the last two follow after this list)
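
To give an idea, the settings I experimented with look roughly like this (the
values below are placeholders, not my exact configuration):

 DB_CONFIG (in the BDB database directory), e.g.:
   set_cachesize   0 268435456 1
   set_lg_bsize    2097152

 slapd.conf, e.g.:
   threads 4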

What I haven't tried yet:

Mounting a UFS-formatted ZFS zvol into the jail, so that BDB stores its data
on UFS (I don't like the idea; a rough sketch of what I mean follows below).
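
If I were to try it, I imagine it would look roughly like this (pool, volume
and jail path names are made up):

 # zfs create -V 4g tank/ldapdata
 # newfs /dev/zvol/tank/ldapdata
 # mount /dev/zvol/tank/ldapdata /jails/ldap1/var/db/openldap-data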


Environment:

- FreeBSD 9.1-RELEASE amd64 multi-jail server with the CPU resource limit
  patch [1], which didn't make it into 9.1-RELEASE
- filesystem: ZFS-only, swap on ZFS
- active jail limits through rctl.conf (memory, maxproc, open files; an
  example is sketched below)
- a handful of openldap-server jails, all showing the same slapd lockup
  tendency
- slapd started through daemontools (supervise)
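
The rctl rules have this form (jail name and limits are examples, not my
exact values):

 jail:ldap1:memoryuse:deny=2g
 jail:ldap1:maxproc:deny=200
 jail:ldap1:openfiles:deny=4096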

Some ideas:
- openldap-server with the BDB backend uses sparse files to store its data,
  here on top of ZFS (a quick way to check this is sketched below).
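
A quick way to see the sparseness is to compare the apparent file size with
the blocks actually allocated, e.g. in the port's default database directory
(adjust the path as needed):

 # ls -l /var/db/openldap-data/id2entry.bdb
 # du -h /var/db/openldap-data/id2entry.bdb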

Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen 
similar problems?
How can I debug this further?

Any hints appreciated :-)

Regards.


[1] https://wiki.freebsd.org/JailResourceLimits