Hey there, thanks to the recent lock debugging tool[1] and a good deal of luck, I was able to spot the mysterious Cyrus 2.4 (and earlier) deadlock.
Here's the output from the lock debugger:

  /usr/cyrus/bin/imapd (pid 3301) holding WRITE lock for /datastore/imap-mails/user/projects/cyrus.index
  /usr/cyrus/bin/imapd (pid 21130) ++WAITING++ for WRITE lock on /datastore/imap-mails/user/projects/cyrus.index
  /usr/cyrus/bin/imapd (pid 20536) ++WAITING++ for WRITE lock on /datastore/imap-mails/user/projects/cyrus.index
  ..

Backtrace of process 3301:

  #0  0xb77c9428 in __kernel_vsyscall ()
  #1  0xb735af91 in __lll_lock_wait_private () from /lib/libc.so.6
  #2  0xb72c88fe in _L_lock_9705 () from /lib/libc.so.6
  #3  0xb72c66f0 in malloc () from /lib/libc.so.6
  #4  0x080b7557 in xzmalloc (size=32) at xmalloc.c:68
  #5  0x080a27b6 in seqset_init (maxval=0, flags=1) at sequence.c:59
  #6  0x0806d152 in index_tellexpunge (state=0x9421ca8) at index.c:2319
  #7  index_tellchanges (state=0x9421ca8, canexpunge=1, printuid=0) at index.c:2370
  #8  0x08071041 in index_check (state=0x9421ca8, usinguid=1, printuid=0) at index.c:682
  #9  0x080515ae in idle_update (flags=(IDLE_MAILBOX | IDLE_ALERT)) at imapd.c:2833
  #10 0x0809abc5 in idle_handler (sig=14) at idle.c:197
  #11 <signal handler called>
  #12 0xb72c52d4 in _int_malloc () from /lib/libc.so.6
  #13 0xb72c66fa in malloc () from /lib/libc.so.6
  #14 0xb74bb21c in ?? () from /usr/lib/libcrypto.so.1.0.0
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Tadaaa! We are in the middle of a malloc() call (frames #12 to #14, called from libcrypto) when SIGALRM fires for IMAP IDLE. The signal handler (idle_handler -> idle_update -> index_check) calls malloc() again, and that nested call blocks forever on malloc's internal lock, which the interrupted malloc() still holds. Since process 3301 also holds the WRITE lock on cyrus.index, every other imapd waiting for that mailbox hangs along with it.

-> Never, ever put complex code in signal handlers. Only set a volatile flag and be done with it.

After I killed process 3301, all the other processes resumed normal operation.
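To illustrate the "only set a flag" rule, here is a minimal sketch, assuming a POSIX system. All names in it (idle_alarm_pending, idle_alarm_handler, install_idle_alarm, poll_idle_alarm, hypothetical_idle_update) are hypothetical and are not Cyrus functions; the real work, such as index_check(), would sit where hypothetical_idle_update() is. The handler only writes a volatile sig_atomic_t, and anything that may call malloc() runs later in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Set by the SIGALRM handler, read and cleared by the main loop.
 * volatile sig_atomic_t is the object type a handler may safely write. */
static volatile sig_atomic_t idle_alarm_pending = 0;

/* Async-signal-safe handler: no malloc(), no stdio, no locks. */
static void idle_alarm_handler(int sig)
{
    (void)sig;
    idle_alarm_pending = 1;
}

/* Hypothetical stand-in for the deferred IDLE work. It runs in normal
 * context, so calling malloc() here cannot deadlock on malloc's lock. */
static void hypothetical_idle_update(void)
{
    char *buf = malloc(32);
    if (buf == NULL)
        return;
    memset(buf, 0, 32);
    free(buf);
}

/* Install the handler. sa_flags is 0 (no SA_RESTART) on purpose, so a
 * blocking read()/select() returns EINTR and the loop can check the flag.
 * Returns 0 on success, -1 on error. */
static int install_idle_alarm(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = idle_alarm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Call from the main loop after each blocking call returns.
 * Returns 1 if the deferred work ran, 0 if no alarm was pending.
 * An alarm arriving between the check and the clear is not lost in
 * practice: the update it asks for is about to run anyway. */
static int poll_idle_alarm(void)
{
    if (!idle_alarm_pending)
        return 0;
    idle_alarm_pending = 0;
    hypothetical_idle_update();
    return 1;
}
```

A caller would run install_idle_alarm() once, arm the timer with alarm(), and call poll_idle_alarm() whenever its blocking wait returns. Note that this only illustrates the rule of thumb above; according to the commit quoted below, Cyrus 2.5 goes further and drops signals for IDLE entirely in favour of an AF_UNIX socket and a select() loop.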
The good news: this specific deadlock shouldn't happen anymore in 2.5+, as the idle code was refactored a few years ago:

------------------------------
commit 17eb391b918c394319e4d1fe5985de10128f34d7
Author: Greg Banks <g...@fastmail.fm>
Date:   Fri Mar 23 17:27:32 2012 +1100

    idle: don't use signals, use AF_UNIX dgrams

    Communications back from idled to imapds are via a message sent on
    the AF_UNIX socket. The IDLE command is implemented as a select()
    loop, and there's absolutely nothing that needs to be done in signal
    handler context. Best of all, no more unexpected delivery of SIGUSR1
    or SIGUSER2, assassinating innocent bystander processes.
------------------------------

@Ken: The keep_alive() function in httpd.c (CalDAV) probably suffers from the same signal handler issue.

Cheers,
Thomas

[1] http://lists.andrew.cmu.edu/pipermail/cyrus-devel/2015-July/003378.html