OK,  had another crash today, but unfortunately the symbol table info for the lib
files was not loading properly so I could not get full details.    I did a make
clean in the lib directory, recompiled cyrus and tested gdb and now the symbol
table is loading properly. Next time I should be able to get full details.

Here is what I had so far.   The problem process appeared to be updating the seen
file when doing a message copy.

More details next crash

Thanks,
John

Full details
-------------------------------------------------------------------
When I checked, I had three blocked imap processes for the user and seven lmtpd
processes.

cyrus    27898 15692  0 Nov09 ?        00:00:00 imapd
cyrus    21991 15692  0 Nov09 ?        00:00:00 imapd
cyrus     5332 15692  0 Nov09 ?        00:00:00 imapd

cyrus    17070 15692  0 Nov09 ?        00:00:00 lmtpd
cyrus    28341 15692  0 06:55 ?        00:00:00 lmtpd
cyrus    29862 15692  0 13:42 ?        00:00:00 lmtpd
cyrus     9286 15692  0 16:01 ?        00:00:00 lmtpd
cyrus     2003 15692  0 18:52 ?        00:00:01 lmtpd
cyrus    10304 15692  0 19:17 ?        00:00:01 lmtpd
cyrus    23400 15692  0 19:56 ?        00:00:00 lmtpd

A quick check of the lmtpd's and two of the imap processes indicated that they were
all blocked trying to access files locked by process 21991

for example:
--------------------------------------------------------------------------------------------------

[root@borg /root]# gdb /usr/cyrus/bin/imapd 5332
<snip>
0x4028a8a1 in __flock () from /lib/libc.so.6
(gdb) bt
#0  0x4028a8a1 in __flock () from /lib/libc.so.6
#1  0x8079d27 in lock_reopen ()
#2  0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812
#3  0x805f65d in append_setup (as=0xbfffedb0,
    name=0xbffff2c0 "user.lraven.Trash", format=0, userid=0x810fdd8 "lraven",
    auth_state=0x8110280, aclcheck=16, quotacheck=3061) at append.c:113
#4  0x80599ff in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6194",
    usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc)
    at index.c:1220
#5  0x80531eb in cmd_copy (tag=0x810fc08 "27", sequence=0x810fce8 "6194",
    name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192
#6  0x804d878 in cmdloop () at imapd.c:791
#7  0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
    at imapd.c:546
#8  0x804bd00 in main ()
#9  0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1,
    ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c)
    at ../sysdeps/generic/libc-start.c:129
(gdb) up
#1  0x8079d27 in lock_reopen ()
(gdb) up
#2  0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812
812         r = lock_reopen(mailbox->header_fd, fnamebuf, &sbuf, &lockfailaction
);
(gdb) print fnamebuf
$1 = "/var/spool/imap/3/user/lraven/Trash/cyrus.header\000*-@", '\000' <repeats
12 times>, " Þÿ¿\000\000\000\000\002\000\000\000\000\000\000\0000Þÿ¿\000\000\000
\000\000\000\000\000L®-@\000\000\000\000\000\000\000\000|\027\000@|\n\000@\005\b
\000\000\000\000\000\000\000\000\000\000\016\0000\000\200\201\000\000\001\000\00
0\000ÿ\001\000\000\000\000\000\000=6\000\000\000\000\000\000\000\000\000\000ì\00
6\000\000\000\000\000\000\000\020\000\000\b\000\000\000\000\000\000\000Ílì;\000\
000\000\000ITì;L®-@ITì;\000\000\000\000\016\000"...
(gdb) quit
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/cyrus/bin/imapd, Pid 5332

[root@borg /root]# lsof -p 5332 | grep cyrus.header
imapd   5332 cyrus  mem    REG        8,5      174 3145792 /var/spool/imap/3/use
r/lraven/cyrus.header
imapd   5332 cyrus  mem    REG        8,5      174 2998385 /var/spool/imap/3/use
r/lraven/Trash/cyrus.header
imapd   5332 cyrus   14u   REG        8,5      174 3145792 /var/spool/imap/3/use
r/lraven/cyrus.header
imapd   5332 cyrus   19u   REG        8,5      174 2998385 /var/spool/imap/3/use
r/lraven/Trash/cyrus.header

[root@borg /root]# grep 2998385 /proc/locks
6: FLOCK  ADVISORY  WRITE 21991 08:05:2998385 0 9223372036854775807 e03b23c0 e03
b2be0 e03b28c0 00000000 f57a4d60
6: -> FLOCK  ADVISORY  WRITE 27898 08:05:2998385 0 9223372036854775807 f57a4d60
00000000 00000000 00000000 f68f5aa0
6: -> FLOCK  ADVISORY  WRITE 5332 08:05:2998385 0 9223372036854775807 f68f5aa0 0
0000000 00000000 00000000 e03b23c0
--------------------------------------------------------------------------------------------------

looking at the errant process that had the lock(21991):

[root@borg /rosof |gdb /usr/cyrus/bin/imapd 21991
<snip>
0x4028a8a1 in __flock () from /lib/libc.so.6
(gdb) bt
#0  0x4028a8a1 in __flock () from /lib/libc.so.6
#1  0x8079d27 in lock_reopen ()
#2  0x807b9f7 in starttxn_or_refetch ()
#3  0x807bb69 in myfetch ()
#4  0x807bc09 in fetchlock ()
#5  0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44,
    rw=1) at seen_db.c:268
#6  0x806fab8 in seen_lockread (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44)
    at seen_db.c:332
#7  0x8060dc6 in append_addseen (mailbox=0xbfffedb0,
    userid=0xbffff05c "lraven", msgrange=0x810fa50 "1817") at append.c:896
#8  0x805f945 in append_commit (as=0xbfffedb0, uidvalidity=0xbfffeda4,
    start=0xbfffeda8, num=0xbfffedac) at append.c:240
#9  0x8059a4b in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6193",
    usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc)
    at index.c:1226
#10 0x80531eb in cmd_copy (tag=0x810fc08 "24", sequence=0x810fce8 "6193",
    name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192
#11 0x804d878 in cmdloop () at imapd.c:791
#12 0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
    at imapd.c:546
---Type <return> to continue, or q <return> to quit---
#13 0x804bd00 in main ()
#14 0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1,
    ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c)
    at ../sysdeps/generic/libc-start.c:129
(gdb) up
#1  0x8079d27 in lock_reopen ()
(gdb) up
#2  0x807b9f7 in starttxn_or_refetch ()
(gdb) up
#3  0x807bb69 in myfetch ()
(gdb) up
#4  0x807bc09 in fetchlock ()
(gdb) up
#5  0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44,
    rw=1) at seen_db.c:268
268             r = DB->fetch(seendb->db,
(gdb) print seendb->db
$1 = (struct db *) 0x810ea50
(gdb) print seendb->path
$2 = 0x8104468 "/var/spool/imap/3/user/lraven/Trash"
(gdb) print seendb->uniqueid
$3 = 0x810dd38 "7ce7c4853b5b293a"
(gdb) print seendb->tid
$4 = (struct txn *) 0x0
(gdb) print seendb->converting
$5 = 0
(gdb) print seendb->user
$6 = 0x810e1e8 "lraven"

At this point I got stuck because I could not find the file descriptor and inode
from the db structure without the appropriate symbols loaded.

I tried to go at it the other way from ls, lsof and /proc/locks
--------------------------------------------------
[root@borg /root]# ls -i /var/imap/user/l/lraven.seen
  64244 /var/imap/user/l/lraven.seen

[root@borg /root]# lsof -p 21991 | grep seen
imapd   21991 cyrus  mem    REG       8,10      175   64244 /var/imap/user/l/lra
ven.seen
imapd   21991 cyrus   18u   REG       8,10      175   64140 /var/imap/user/l/lra
ven.seen
imapd   21991 cyrus   23u   REG       8,10      180   64168 /var/imap/user/l/lra
ven.seen
[root@borg/root]# grep 64244 /proc/locks
[root@borg /root]# grep 64140 /proc/locks
[root@borg /root]# grep 64168 /proc/locks
7: FLOCK  ADVISORY  WRITE 21991 08:0a:64168 0 9223372036854775807 e03b28c0 e03b2
3c0 f57a4680 00000000 f57a4220
7: -> FLOCK  ADVISORY  WRITE 21991 08:0a:64168 0 9223372036854775807 f57a4220 00
000000 00000000 00000000 e03b28c0

[root@borg /root]# lsof | grep lraven.seen | less
imapd      5332  cyrus  mem    REG       8,10      175      64244 /var/imap/user
/l/lraven.seen
imapd      5332  cyrus   17u   REG       8,10      175      64652 /var/imap/user
/l/lraven.seen
imapd     21991  cyrus  mem    REG       8,10      175      64244 /var/imap/user
/l/lraven.seen
imapd     21991  cyrus   18u   REG       8,10      175      64140 /var/imap/user
/l/lraven.seen
imapd     21991  cyrus   23u   REG       8,10      180      64168 /var/imap/user
/l/lraven.seen
imapd     27898  cyrus  mem    REG       8,10      175      64244 /var/imap/user
/l/lraven.seen
imapd     27898  cyrus   18u   REG       8,10      175      64244 /var/imap/user
/l/lraven.seen

--------------------------------------------------------------------
I was suprised to see multiple inode numbers for the file
/var/imap/user/l/lraven.seen       This seemed really strange since the process in
question has a lock on an inode that no longer seems to exists (64168)   It would
seem that another  process 27898 has replaced the seen file even though it is
locked.    Since I could not verify which inode process 21991 was hanging trying to
lock, we can't tell for sure,  Next time I should be able to get full details.
Killing this process (21991) did allow all other processes to complete normally.


> Lawrence Greenfield wrote:
>
> > It's really crucial to understand what process is holding this lock.
> > It might not be reused---it's probably blocked doing something else
> > and never releasing the lock.  imapds should never do something that
> > will block for a long time while holding a lock but evidentally
> > something is wrong.
> >
> > Finding the process that holds the lock, doing a backtrace on it and
> > (even better) figuring out how it gets into that state is really the
> > crucial issue for fixing this problem.
>

Reply via email to