OK, had another crash today, but unfortunately the symbol table info for the lib files was not loading properly so I could not get full details. I did a make clean in the lib directory, recompiled cyrus and tested gdb and now the symbol table is loading properly. Next time I should be able to get full details.
Here is what I had so far. The problem process appeared to be updating the seen file when doing a message copy. More details next crash Thanks, John Full details ------------------------------------------------------------------- When I checked, I had three blocked imap processes for the user and seven lmtpd processes. cyrus 27898 15692 0 Nov09 ? 00:00:00 imapd cyrus 21991 15692 0 Nov09 ? 00:00:00 imapd cyrus 5332 15692 0 Nov09 ? 00:00:00 imapd cyrus 17070 15692 0 Nov09 ? 00:00:00 lmtpd cyrus 28341 15692 0 06:55 ? 00:00:00 lmtpd cyrus 29862 15692 0 13:42 ? 00:00:00 lmtpd cyrus 9286 15692 0 16:01 ? 00:00:00 lmtpd cyrus 2003 15692 0 18:52 ? 00:00:01 lmtpd cyrus 10304 15692 0 19:17 ? 00:00:01 lmtpd cyrus 23400 15692 0 19:56 ? 00:00:00 lmtpd A quick check of the lmtpd's and two of the imap processes indicated that they were all blocked trying to access files locked by process 21991 for example: -------------------------------------------------------------------------------------------------- [root@borg /root]# gdb /usr/cyrus/bin/imapd 5332 <snip> 0x4028a8a1 in __flock () from /lib/libc.so.6 (gdb) bt #0 0x4028a8a1 in __flock () from /lib/libc.so.6 #1 0x8079d27 in lock_reopen () #2 0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812 #3 0x805f65d in append_setup (as=0xbfffedb0, name=0xbffff2c0 "user.lraven.Trash", format=0, userid=0x810fdd8 "lraven", auth_state=0x8110280, aclcheck=16, quotacheck=3061) at append.c:113 #4 0x80599ff in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6194", usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc) at index.c:1220 #5 0x80531eb in cmd_copy (tag=0x810fc08 "27", sequence=0x810fce8 "6194", name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192 #6 0x804d878 in cmdloop () at imapd.c:791 #7 0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c) at imapd.c:546 #8 0x804bd00 in main () #9 0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1, ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>, rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c) at ../sysdeps/generic/libc-start.c:129 (gdb) up #1 0x8079d27 in lock_reopen () (gdb) up #2 0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812 812 r = lock_reopen(mailbox->header_fd, fnamebuf, &sbuf, &lockfailaction ); (gdb) print fnamebuf $1 = "/var/spool/imap/3/user/lraven/Trash/cyrus.header\000*-@", '\000' <repeats 12 times>, " Þÿ¿\000\000\000\000\002\000\000\000\000\000\000\0000Þÿ¿\000\000\000 \000\000\000\000\000L®-@\000\000\000\000\000\000\000\000|\027\000@|\n\000@\005\b \000\000\000\000\000\000\000\000\000\000\016\0000\000\200\201\000\000\001\000\00 0\000ÿ\001\000\000\000\000\000\000=6\000\000\000\000\000\000\000\000\000\000ì\00 6\000\000\000\000\000\000\000\020\000\000\b\000\000\000\000\000\000\000Ílì;\000\ 000\000\000ITì;L®-@ITì;\000\000\000\000\016\000"... (gdb) quit (gdb) quit The program is running. Quit anyway (and detach it)? (y or n) y Detaching from program: /usr/cyrus/bin/imapd, Pid 5332 [root@borg /root]# lsof -p 5332 | grep cyrus.header imapd 5332 cyrus mem REG 8,5 174 3145792 /var/spool/imap/3/use r/lraven/cyrus.header imapd 5332 cyrus mem REG 8,5 174 2998385 /var/spool/imap/3/use r/lraven/Trash/cyrus.header imapd 5332 cyrus 14u REG 8,5 174 3145792 /var/spool/imap/3/use r/lraven/cyrus.header imapd 5332 cyrus 19u REG 8,5 174 2998385 /var/spool/imap/3/use r/lraven/Trash/cyrus.header [root@borg /root]# grep 2998385 /proc/locks 6: FLOCK ADVISORY WRITE 21991 08:05:2998385 0 9223372036854775807 e03b23c0 e03 b2be0 e03b28c0 00000000 f57a4d60 6: -> FLOCK ADVISORY WRITE 27898 08:05:2998385 0 9223372036854775807 f57a4d60 00000000 00000000 00000000 f68f5aa0 6: -> FLOCK ADVISORY WRITE 5332 08:05:2998385 0 9223372036854775807 f68f5aa0 0 0000000 00000000 00000000 e03b23c0 -------------------------------------------------------------------------------------------------- looking at the errant process that had the lock(21991): [root@borg /rosof |gdb /usr/cyrus/bin/imapd 21991 <snip> 0x4028a8a1 in __flock () from /lib/libc.so.6 (gdb) bt #0 0x4028a8a1 in __flock () from /lib/libc.so.6 #1 0x8079d27 in lock_reopen () #2 0x807b9f7 in starttxn_or_refetch () #3 0x807bb69 in myfetch () #4 0x807bc09 in fetchlock () #5 0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38, lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44, rw=1) at seen_db.c:268 #6 0x806fab8 in seen_lockread (seendb=0x810ea30, lastreadptr=0xbfffed38, lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44) at seen_db.c:332 #7 0x8060dc6 in append_addseen (mailbox=0xbfffedb0, userid=0xbffff05c "lraven", msgrange=0x810fa50 "1817") at append.c:896 #8 0x805f945 in append_commit (as=0xbfffedb0, uidvalidity=0xbfffeda4, start=0xbfffeda8, num=0xbfffedac) at append.c:240 #9 0x8059a4b in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6193", usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc) at index.c:1226 #10 0x80531eb in cmd_copy (tag=0x810fc08 "24", sequence=0x810fce8 "6193", name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192 #11 0x804d878 in cmdloop () at imapd.c:791 #12 0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c) at imapd.c:546 ---Type <return> to continue, or q <return> to quit--- #13 0x804bd00 in main () #14 0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1, ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>, rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c) at ../sysdeps/generic/libc-start.c:129 (gdb) up #1 0x8079d27 in lock_reopen () (gdb) up #2 0x807b9f7 in starttxn_or_refetch () (gdb) up #3 0x807bb69 in myfetch () (gdb) up #4 0x807bc09 in fetchlock () (gdb) up #5 0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38, lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44, rw=1) at seen_db.c:268 268 r = DB->fetch(seendb->db, (gdb) print seendb->db $1 = (struct db *) 0x810ea50 (gdb) print seendb->path $2 = 0x8104468 "/var/spool/imap/3/user/lraven/Trash" (gdb) print seendb->uniqueid $3 = 0x810dd38 "7ce7c4853b5b293a" (gdb) print seendb->tid $4 = (struct txn *) 0x0 (gdb) print seendb->converting $5 = 0 (gdb) print seendb->user $6 = 0x810e1e8 "lraven" At this point I got stuck because I could not find the file descriptor and inode from the db structure without the appropriate symbols loaded. I tried to go at it the other way from ls, lsof and /proc/locks -------------------------------------------------- [root@borg /root]# ls -i /var/imap/user/l/lraven.seen 64244 /var/imap/user/l/lraven.seen [root@borg /root]# lsof -p 21991 | grep seen imapd 21991 cyrus mem REG 8,10 175 64244 /var/imap/user/l/lra ven.seen imapd 21991 cyrus 18u REG 8,10 175 64140 /var/imap/user/l/lra ven.seen imapd 21991 cyrus 23u REG 8,10 180 64168 /var/imap/user/l/lra ven.seen [root@borg/root]# grep 64244 /proc/locks [root@borg /root]# grep 64140 /proc/locks [root@borg /root]# grep 64168 /proc/locks 7: FLOCK ADVISORY WRITE 21991 08:0a:64168 0 9223372036854775807 e03b28c0 e03b2 3c0 f57a4680 00000000 f57a4220 7: -> FLOCK ADVISORY WRITE 21991 08:0a:64168 0 9223372036854775807 f57a4220 00 000000 00000000 00000000 e03b28c0 [root@borg /root]# lsof | grep lraven.seen | less imapd 5332 cyrus mem REG 8,10 175 64244 /var/imap/user /l/lraven.seen imapd 5332 cyrus 17u REG 8,10 175 64652 /var/imap/user /l/lraven.seen imapd 21991 cyrus mem REG 8,10 175 64244 /var/imap/user /l/lraven.seen imapd 21991 cyrus 18u REG 8,10 175 64140 /var/imap/user /l/lraven.seen imapd 21991 cyrus 23u REG 8,10 180 64168 /var/imap/user /l/lraven.seen imapd 27898 cyrus mem REG 8,10 175 64244 /var/imap/user /l/lraven.seen imapd 27898 cyrus 18u REG 8,10 175 64244 /var/imap/user /l/lraven.seen -------------------------------------------------------------------- I was suprised to see multiple inode numbers for the file /var/imap/user/l/lraven.seen This seemed really strange since the process in question has a lock on an inode that no longer seems to exists (64168) It would seem that another process 27898 has replaced the seen file even though it is locked. Since I could not verify which inode process 21991 was hanging trying to lock, we can't tell for sure, Next time I should be able to get full details. Killing this process (21991) did allow all other processes to complete normally. > Lawrence Greenfield wrote: > > > It's really crucial to understand what process is holding this lock. > > It might not be reused---it's probably blocked doing something else > > and never releasing the lock. imapds should never do something that > > will block for a long time while holding a lock but evidentally > > something is wrong. > > > > Finding the process that holds the lock, doing a backtrace on it and > > (even better) figuring out how it gets into that state is really the > > crucial issue for fixing this problem. >