Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Timo, I reinstalled all our servers with -O0 and full debugging on -gdwarf-2 -g3. I should be able to look up anything you need done now. Here's some more examples with the data you asked for last time, now without optimization. I also noticed that there are a few users that seem to be regulars with this problem. #4 0x004d8fd7 in i_panic (format=0x501688 file %s: line %d (%s): assertion failed: (%s)) at failures.c:207 args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7fffcf52b5a0, reg_save_area = 0x7fffcf52b4e0}} #5 0x0044f34e in maildir_uidlist_records_array_delete (uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405 recs = (struct maildir_uidlist_rec * const *) 0x1b607f60 pos = (struct maildir_uidlist_rec * const *) 0x0 idx = 0 count = 2 __PRETTY_FUNCTION__ = maildir_uidlist_records_array_delete #6 0x0044f8fc in maildir_uidlist_next (uidlist=0x1b612330, line=0x1b60426a 1249562340.M302085P7316V04240006I00DB2CF1_0.gehenna7.rutgers.edu,S=19277:2,S) at maildir-uidlist.c:552 rec = (struct maildir_uidlist_rec *) 0x1b6209f8 old_rec = (struct maildir_uidlist_rec *) 0x1b61f1f8 recs = (struct maildir_uidlist_rec * const *) 0x1b604264 count = 0 uid = 2807 (gdb) fr 5 #5 0x0044f34e in maildir_uidlist_records_array_delete (uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405 405i_assert(pos != NULL); (gdb) p *recs $1 = (struct maildir_uidlist_rec * const) 0x1b61fa60 (gdb) p count $2 = 2 (gdb) p *recs[0] $3 = {uid = 2735, flags = 6, filename = 0x1b61fa78 1262739815.M913182P4280V04240006I014D0F92_0.gehenna7.rutgers.edu,S=11791:2,, extensions = 0x1b620980 W12036} (gdb) p *recs[1] $4 = {uid = 2736, flags = 0, filename = 0x1b6209a0 1262739645.M307617P13458V04240006I014D0F91_0.gehenna10.rutgers.edu,S=5005:2,S, extensions = 0x1b6209f0 W5124} (gdb) Another user for comparison: (gdb) fr 5 #5 0x0044f34e in maildir_uidlist_records_array_delete (uidlist=0x6bd7680, rec=0x6bc93d0) at maildir-uidlist.c:405 405i_assert(pos != NULL); (gdb) p *recs $1 = (struct maildir_uidlist_rec * const) 0x6d5c418 (gdb) p count $2 = 7 (gdb) p *recs[0] $3 = {uid = 52566, flags = 4, filename = 0x6d5c438 1262781360.M836147P9323V04240007I0148DC53_0.gehenna5.rutgers.edu,S=2157:2,a, extensions = 0x6d5c430 W2201} (gdb) p *recs[1] $4 = {uid = 52568, flags = 0, filename = 0x6cd4590 126262.M94168P6001V04240006I003C41C2_0.gehenna10.rutgers.edu,S=22946:2,Sa, extensions = 0x6cd4588 W23481} (gdb) p *recs[2] $5 = {uid = 52569, flags = 0, filename = 0x6bc9328 1041233872.06.mbox:2,S, extensions = 0x6bc9318 W6334} (gdb) p *recs[3] $6 = {uid = 52570, flags = 0, filename = 0x6bc9370 1041233872.13.mbox:2,S, extensions = 0x6bc9360 W1193} (gdb) p *recs[4] $7 = {uid = 52571, flags = 0, filename = 0x6bc92e0 1041233872.14.mbox:2,RS, extensions = 0x6bc92d0 W3095} (gdb) p *recs[5] $8 = {uid = 52572, flags = 4, filename = 0x6d58ec8 1262782141.M730129P15749V03E80006I0148DC54_0.gehenna9.rutgers.edu,S=7454:2,, extensions = 0x6d62160 W7626} (gdb) p *recs[6] $9 = {uid = 52573, flags = 0, filename = 0x6d62140 1041233872.16.mbox:2,S, extensions = 0x0} (gdb) And one more user: (gdb) fr 5 #5 0x0044f34e in maildir_uidlist_records_array_delete (uidlist=0x10f44680, rec=0x10f67120) at maildir-uidlist.c:405 405i_assert(pos != NULL); (gdb) p *recs $1 = (struct maildir_uidlist_rec * const) 0x11093760 (gdb) p count $2 = 2 (gdb) p *recs[0] $3 = {uid = 55108, flags = 0, filename = 0x11093778 1262788462.M201250P7938V04240006I01EF76E0_0.gehenna10.rutgers.edu,S=1731278:2,, extensions = 0x1109d240 W1753846} (gdb) p *recs[1] $4 = {uid = 55109, flags = 4, filename = 0x1109d268 1262788793.M851477P3866V045C0007I01EF76E3_0.gehenna8.rutgers.edu,S=19990, extensions = 0x0} On 12/30/2009 12:10 PM, David Halik wrote: Ok, I think I've got some more info and a more accurate time line for you. I tried this on two different dumps from two different users. The count was 4 in the first example and 0 in the second. I'm guessing that's considered small? The links to my gdb sessions for both are below and have some of the info you were looking for. The corresponding logs are also there so you can see how each failed. I put everything on pastebin so it's a little easier to see. By the way, I also found that the stale NFS file handle message does appear first in each instance, it was just farther back in the logs. Lowering uid message also appears immediately after every stale NFS message, which in turn causes all of this n amount of time later (sometimes 5 minutes, sometimes 20) when a user does a new action. The file reappeared message only occurs some of the time. Here's the chain of events in every case so far that I can see: 1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle 2)
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Ok, I think I've got some more info and a more accurate time line for you. I tried this on two different dumps from two different users. The count was 4 in the first example and 0 in the second. I'm guessing that's considered small? The links to my gdb sessions for both are below and have some of the info you were looking for. The corresponding logs are also there so you can see how each failed. I put everything on pastebin so it's a little easier to see. By the way, I also found that the stale NFS file handle message does appear first in each instance, it was just farther back in the logs. Lowering uid message also appears immediately after every stale NFS message, which in turn causes all of this n amount of time later (sometimes 5 minutes, sometimes 20) when a user does a new action. The file reappeared message only occurs some of the time. Here's the chain of events in every case so far that I can see: 1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle 2) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (n - n-1, hdr=n-1) ...a few minutes later... (may or may not be a message reappeared warning at this point) 3) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 3: 1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS (uid i - n+1,2,3 ) 4) Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) One thing to note, after the Expunged message reappeared, giving a new UID he died quickly and one more than one server simultaneously. The gdb output is from server gehenna11 of that log file. The uid in *recs[0] is also the number that you can see in the logs being lowered from 719 - 718. First user log: http://pastebin.com/m1718f07b First user gdb: http://pastebin.com/m40088dc8 The second user also died on more than one server. The output is also from gehenna11 Second user log: http://pastebin.com/f3a1756f2 Second user gdb: http://pastebin.com/m59aacde4 On 12/29/2009 7:50 PM, Timo Sirainen wrote: On 29.12.2009, at 19.09, David Halik wrote: I'll definitely get back to you on this. Right now we're closed until after New Years and I don't want to go updating the dovecot package on all of our servers until we're all back at work. I did do some quick poking around and the count is optimized out, so I'll have the package rebuilt without optimization and let you what the values are at the beginning of next week. Thanks again. well, you can probably also get the values in a bit more difficult way: p count = p uidlist.records.arr.buffer.used / uidlist.records.arr.element_size p recs[n] = p *(*uidlist.records.v)[n]
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On Wed, 2009-12-23 at 10:37 -0800, Brandon Davidson wrote: We too have set dotlock_use_excl = no. I'm not seeing the Stale NFS file handle message any more, but I am still seeing a crash. The crashes seem to be leaving the indexes in a bad state: Dec 23 09:07:44 oh-popmap3p dovecot: imap: user=cnisser, rip=x.x.x.x, pid=30101: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) You mean this doesn't get fixed by itself? It's not indexes, but dovecot-uidlist file that's the problem.. pid=7641: read(/home6/joet/Maildir/dovecot-uidlist) failed: Input/output error These just shouldn't be happening.. Or perhaps that's the same problem as ESTALE. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Hey Timo, hope you enjoyed your vacation. :) I just wanted to point out that I'm still seeing the (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) dump even after your patch. The (match == IMAP_MATCH_YES) dump is definitely fixed, but the other dump still remains... On 12/24/2009 11:39 AM, David Halik wrote: I should probably also post the messages leading up for reference. Note that I did not see any stale NFS messages this time, but did get the usual duplicate file messages: Dec 24 10:43:07 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 514 - 721) Dec 24 10:43:08 gehenna11 dovecot: IMAP(user): Maildir /rci/u1/user/Maildir: Expunged message reappeared, giving a new UID (old uid=516, file=1231952516.M164875P24494V03E80009I01B6DCAC_0.gehenna9.rutgers.edu,S=4355:2,S) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 6: 1233524267.M97899P9543V04240006I01B6E908_0.gehenna5.rutgers.edu,S=3039:2,S (uid 517 - 723) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Raw backtrace: /usr/libexec/dovecot/imap [0x49edd0] - /usr/libexec/dovecot/imap [0x49ee33] - /usr/libexec/dovecot/imap [0x49e496] - /usr/libexec/dovecot/imap [0x43f064] - /usr/libexec/dovecot/imap(maildir_uidlist_refresh+0x2d0) [0x43fbf0] - /usr/libexec/dovecot/imap [0x43c91f] - /usr/libexec/dovecot/imap(maildir_storage_sync_init+0x14e) [0x43cefe] - /usr/libexec/dovecot/imap(imap_sync_init+0x67) [0x4283d7] - /usr/libexec/dovecot/imap [0x41bf5e] - /usr/libexec/dovecot/imap [0x458ecd] - /usr/libexec/dovecot/imap(io_loop_handle_timeouts+0x8b) [0x4a5f6b] - /usr/libexec/dovecot/imap(io_loop_handler_run+0x73) [0x4a6bd3] - /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5d6d] - /usr/libexec/dovecot/imap(main+0x620) [0x428f20] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x354301d994] - /usr/libexec/dovecot/imap [0x419ac9] Dec 24 10:44:38 gehenna11 dovecot: dovecot: child 19032 (imap) killed with signal 6 (core dumped) On 12/24/2009 11:26 AM, David Halik wrote: Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) This is hopefully fixed by: http://hg.dovecot.org/dovecot-1.2/rev/d00abdaf9df2 Timo, I came into work this morning and found three new core dumps from maildir_uidlist_records_array_delete while running the patch you gave me yesterday, so I guess it's not fixed after all. I double checked that it is in fact running your patch, and it is. You can tell also that maildir-uidlist.c: line 403 changed to 405. Here's the short backtrace: #0 0x003543030265 in raise (sig=value optimized out) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x003543031d10 in abort () at abort.c:88 #2 0x0049eddd in default_fatal_finish (type=value optimized out, status=0) at failures.c:160 #3 0x0049ee33 in i_internal_fatal_handler (type=LOG_TYPE_PANIC, status=0, fmt=value optimized out, args=value optimized out) at failures.c:443 #4 0x0049e496 in i_panic (format=0x6 Address 0x6 out of bounds) at failures.c:207 #5 0x0043f064 in maildir_uidlist_records_array_delete (uidlist=value optimized out, rec=0x16dbd48) at maildir-uidlist.c:405 #6 0x0043fbf0 in maildir_uidlist_refresh (uidlist=0x16c75b0) at maildir-uidlist.c:552 #7 0x0043c91f in maildir_sync_context (ctx=0x16a7188, forced=false, find_uid=0x0, lost_files_r=0x7fff86f83d87) at maildir-sync.c:659 #8 0x0043cefe in maildir_storage_sync_init (box=0x16b64f8, flags=0) at maildir-sync.c:924 #9 0x004283d7 in imap_sync_init (client=0x16b27a0, box=0x16b64f8, imap_flags=0, flags=0) at imap-sync.c:146 #10 0x0041bf5e in idle_sync_now (box=0x4a58, ctx=0x16b3db8) at cmd-idle.c:127 #11 0x00458ecd in check_timeout (ibox=0x16b64f8) at index-mailbox-check.c:43 #12 0x004a5f6b in io_loop_handle_timeouts (ioloop=0x16afb60) at ioloop.c:313 #13 0x004a6bd3 in io_loop_handler_run (ioloop=0x16afb60) at ioloop-epoll.c:180 #14 0x004a5d6d in io_loop_run (ioloop=0x16afb60) at ioloop.c:335 #15 0x00428f20 in main (argc=value optimized out, argv=value optimized out, envp=0x7fff86f840b8) at main.c:327 And the full backtrace: http://pastebin.com/f651f649e
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On Thu, 2009-12-24 at 11:39 -0500, David Halik wrote: Dec 24 10:43:07 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 514 - 721) Dec 24 10:43:08 gehenna11 dovecot: IMAP(user): Maildir /rci/u1/user/Maildir: Expunged message reappeared, giving a new UID (old uid=516, file=1231952516.M164875P24494V03E80009I01B6DCAC_0.gehenna9.rutgers.edu,S=4355:2,S) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 6: 1233524267.M97899P9543V04240006I01B6E908_0.gehenna5.rutgers.edu,S=3039:2,S (uid 517 - 723) Wonder if there's a corresponding Expunged message reappeared, giving a new UID (old uid=x) having Dupliate file entry .. (uid x - for each log line? Meaning that the duplicate file entries are caused by those reappearing messages? (And the reappearing messages are probably caused by dentry caching issues.) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) I couldn't previously really figure out how this could happen, even when manually causing the expunged message reappeared error. Could you do in gdb something like: fr 6 (or whatever frame gives usable results) p *rec p count p *recs[0] p *recs[1] p *recs[..up until count-1] If count is large, the main things I want to know are: 1) Is the array sorted (recs[n]-uid recs[n+1]-uid always)? 2) is rec-uid anywhere in the recs array? If it complains about values being optimized away, recompile without -O2. I usually do that by just removing it from src/lib-storage/index/maildir/Makefile and touch dovecot-uidlist.c and make + make install. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On 12/29/2009 6:18 PM, Timo Sirainen wrote: Wonder if there's a corresponding Expunged message reappeared, giving a new UID (old uid=x) having Dupliate file entry .. (uid x - for each log line? Meaning that the duplicate file entries are caused by those reappearing messages? (And the reappearing messages are probably caused by dentry caching issues.) I'm wondering there are multiple causes going on that all spawn the same error. Is some instances, yes I do see the Duplicate file... preceding the error, in some instances I don't, but I do get the NFS stale message then. The output above just happens to be from during the duplicate instance. Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) If it complains about values being optimized away, recompile without -O2. I usually do that by just removing it from src/lib-storage/index/maildir/Makefile and touch dovecot-uidlist.c and make + make install. I'll definitely get back to you on this. Right now we're closed until after New Years and I don't want to go updating the dovecot package on all of our servers until we're all back at work. I did do some quick poking around and the count is optimized out, so I'll have the package rebuilt without optimization and let you what the values are at the beginning of next week. Thanks again.
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On 29.12.2009, at 19.09, David Halik wrote: I'll definitely get back to you on this. Right now we're closed until after New Years and I don't want to go updating the dovecot package on all of our servers until we're all back at work. I did do some quick poking around and the count is optimized out, so I'll have the package rebuilt without optimization and let you what the values are at the beginning of next week. Thanks again. well, you can probably also get the values in a bit more difficult way: p count = p uidlist.records.arr.buffer.used / uidlist.records.arr.element_size p recs[n] = p *(*uidlist.records.v)[n]
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) This is hopefully fixed by: http://hg.dovecot.org/dovecot-1.2/rev/d00abdaf9df2 Timo, I came into work this morning and found three new core dumps from maildir_uidlist_records_array_delete while running the patch you gave me yesterday, so I guess it's not fixed after all. I double checked that it is in fact running your patch, and it is. You can tell also that maildir-uidlist.c: line 403 changed to 405. Here's the short backtrace: #0 0x003543030265 in raise (sig=value optimized out) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x003543031d10 in abort () at abort.c:88 #2 0x0049eddd in default_fatal_finish (type=value optimized out, status=0) at failures.c:160 #3 0x0049ee33 in i_internal_fatal_handler (type=LOG_TYPE_PANIC, status=0, fmt=value optimized out, args=value optimized out) at failures.c:443 #4 0x0049e496 in i_panic (format=0x6 Address 0x6 out of bounds) at failures.c:207 #5 0x0043f064 in maildir_uidlist_records_array_delete (uidlist=value optimized out, rec=0x16dbd48) at maildir-uidlist.c:405 #6 0x0043fbf0 in maildir_uidlist_refresh (uidlist=0x16c75b0) at maildir-uidlist.c:552 #7 0x0043c91f in maildir_sync_context (ctx=0x16a7188, forced=false, find_uid=0x0, lost_files_r=0x7fff86f83d87) at maildir-sync.c:659 #8 0x0043cefe in maildir_storage_sync_init (box=0x16b64f8, flags=0) at maildir-sync.c:924 #9 0x004283d7 in imap_sync_init (client=0x16b27a0, box=0x16b64f8, imap_flags=0, flags=0) at imap-sync.c:146 #10 0x0041bf5e in idle_sync_now (box=0x4a58, ctx=0x16b3db8) at cmd-idle.c:127 #11 0x00458ecd in check_timeout (ibox=0x16b64f8) at index-mailbox-check.c:43 #12 0x004a5f6b in io_loop_handle_timeouts (ioloop=0x16afb60) at ioloop.c:313 #13 0x004a6bd3 in io_loop_handler_run (ioloop=0x16afb60) at ioloop-epoll.c:180 #14 0x004a5d6d in io_loop_run (ioloop=0x16afb60) at ioloop.c:335 #15 0x00428f20 in main (argc=value optimized out, argv=value optimized out, envp=0x7fff86f840b8) at main.c:327 And the full backtrace: http://pastebin.com/f651f649e -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I should probably also post the messages leading up for reference. Note that I did not see any stale NFS messages this time, but did get the usual duplicate file messages: Dec 24 10:43:07 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 514 - 721) Dec 24 10:43:08 gehenna11 dovecot: IMAP(user): Maildir /rci/u1/user/Maildir: Expunged message reappeared, giving a new UID (old uid=516, file=1231952516.M164875P24494V03E80009I01B6DCAC_0.gehenna9.rutgers.edu,S=4355:2,S) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): /rci/nqu/rci/u1/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 6: 1233524267.M97899P9543V04240006I01B6E908_0.gehenna5.rutgers.edu,S=3039:2,S (uid 517 - 723) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 24 10:44:38 gehenna11 dovecot: IMAP(user): Raw backtrace: /usr/libexec/dovecot/imap [0x49edd0] - /usr/libexec/dovecot/imap [0x49ee33] - /usr/libexec/dovecot/imap [0x49e496] - /usr/libexec/dovecot/imap [0x43f064] - /usr/libexec/dovecot/imap(maildir_uidlist_refresh+0x2d0) [0x43fbf0] - /usr/libexec/dovecot/imap [0x43c91f] - /usr/libexec/dovecot/imap(maildir_storage_sync_init+0x14e) [0x43cefe] - /usr/libexec/dovecot/imap(imap_sync_init+0x67) [0x4283d7] - /usr/libexec/dovecot/imap [0x41bf5e] - /usr/libexec/dovecot/imap [0x458ecd] - /usr/libexec/dovecot/imap(io_loop_handle_timeouts+0x8b) [0x4a5f6b] - /usr/libexec/dovecot/imap(io_loop_handler_run+0x73) [0x4a6bd3] - /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5d6d] - /usr/libexec/dovecot/imap(main+0x620) [0x428f20] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x354301d994] - /usr/libexec/dovecot/imap [0x419ac9] Dec 24 10:44:38 gehenna11 dovecot: dovecot: child 19032 (imap) killed with signal 6 (core dumped) On 12/24/2009 11:26 AM, David Halik wrote: Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) This is hopefully fixed by: http://hg.dovecot.org/dovecot-1.2/rev/d00abdaf9df2 Timo, I came into work this morning and found three new core dumps from maildir_uidlist_records_array_delete while running the patch you gave me yesterday, so I guess it's not fixed after all. I double checked that it is in fact running your patch, and it is. You can tell also that maildir-uidlist.c: line 403 changed to 405. Here's the short backtrace: #0 0x003543030265 in raise (sig=value optimized out) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x003543031d10 in abort () at abort.c:88 #2 0x0049eddd in default_fatal_finish (type=value optimized out, status=0) at failures.c:160 #3 0x0049ee33 in i_internal_fatal_handler (type=LOG_TYPE_PANIC, status=0, fmt=value optimized out, args=value optimized out) at failures.c:443 #4 0x0049e496 in i_panic (format=0x6 Address 0x6 out of bounds) at failures.c:207 #5 0x0043f064 in maildir_uidlist_records_array_delete (uidlist=value optimized out, rec=0x16dbd48) at maildir-uidlist.c:405 #6 0x0043fbf0 in maildir_uidlist_refresh (uidlist=0x16c75b0) at maildir-uidlist.c:552 #7 0x0043c91f in maildir_sync_context (ctx=0x16a7188, forced=false, find_uid=0x0, lost_files_r=0x7fff86f83d87) at maildir-sync.c:659 #8 0x0043cefe in maildir_storage_sync_init (box=0x16b64f8, flags=0) at maildir-sync.c:924 #9 0x004283d7 in imap_sync_init (client=0x16b27a0, box=0x16b64f8, imap_flags=0, flags=0) at imap-sync.c:146 #10 0x0041bf5e in idle_sync_now (box=0x4a58, ctx=0x16b3db8) at cmd-idle.c:127 #11 0x00458ecd in check_timeout (ibox=0x16b64f8) at index-mailbox-check.c:43 #12 0x004a5f6b in io_loop_handle_timeouts (ioloop=0x16afb60) at ioloop.c:313 #13 0x004a6bd3 in io_loop_handler_run (ioloop=0x16afb60) at ioloop-epoll.c:180 #14 0x004a5d6d in io_loop_run (ioloop=0x16afb60) at ioloop.c:335 #15 0x00428f20 in main (argc=value optimized out, argv=value optimized out, envp=0x7fff86f840b8) at main.c:327 And the full backtrace: http://pastebin.com/f651f649e -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I switched all of our servers to dotlock_use_excl=no last night, but we're still seeing the errors: Dec 23 08:22:04 gehenna17.rutgers.edu dovecot: imap-login: Login: user=user1, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 08:36:21 gehenna11.rutgers.edu dovecot: imap-login: Login: user=user1, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 09:17:09 gehenna18.rutgers.edu dovecot: imap-login: Login: user=user1, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 10:59:37 gehenna11.rutgers.edu dovecot: IMAP(user1): fdatasync(/rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle Dec 23 10:59:37 gehenna11.rutgers.edu dovecot: IMAP(user1): /rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (512 - 511, hdr=511) Dec 23 11:05:43 gehenna11.rutgers.edu dovecot: IMAP(user1): Maildir /rci/u1/user1/Maildir: Expunged message reappeared, giving a new UID (old uid=1, file=1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S) Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): /rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 1 - 514) Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) It must be related to locking though because we can see that the user is logged in on more than one server and the issues occur when the two dovecot instances fight over the uidlist. The second error I'm seeing much more of though. A few users are spawning these errors on almost every connection. This doesn't appear to be locking related, since the user is logged in on only one server as far as I can tell and there are also no NFS or duplicate file errors: Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: imap-login: Login: user=user2, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Raw backtrace: /usr/libexec/dovecot/imap [0x49ed50] - /usr/libexec/dovecot/imap [0x49edb3] - /usr/libexec/dovecot/imap [0x49e416] - /usr/libexec/dovecot/imap [0x41c7c0] - /us r/libexec/dovecot/imap [0x41c9c2] - /usr/libexec/dovecot/imap(cmd_list_full+0x49a) [0x41d60a] - /usr/libexec/dovecot/imap(cmd_list+0xb) [0x41d8ab] - /usr/libexec/dovecot/imap [0x4208cc] - /usr/libexec/dovecot/imap [0x420982] - /usr/ libexec/dovecot/imap(client_handle_input+0x3f) [0x420abf] - /usr/libexec/dovecot/imap(client_input+0x5f) [0x42160f] - /usr/libexec/dovecot/imap(io_loop_handler_run+0xf8) [0x4a6bd8] - /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5c ed] - /usr/libexec/dovecot/imap(main+0x620) [0x428ef0] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x38af41d994] - /usr/libexec/dovecot/imap [0x419ac9] Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: imap-login: Login: user=user2, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: IMAP(user2): Raw backtrace: /usr/libexec/dovecot/imap [0x49ed50] - /usr/libexec/dovecot/imap [0x49edb3] - /usr/libexec/dovecot/imap [0x49e416] - /usr/libexec/dovecot/imap [0x41c7c0] - /us r/libexec/dovecot/imap [0x41c9c2] - /usr/libexec/dovecot/imap(cmd_list_full+0x49a) [0x41d60a] - /usr/libexec/dovecot/imap(cmd_list+0xb) [0x41d8ab] - /usr/libexec/dovecot/imap [0x4208cc] - /usr/libexec/dovecot/imap [0x420982] - /usr/ libexec/dovecot/imap(client_handle_input+0x3f) [0x420abf] - /usr/libexec/dovecot/imap(client_input+0x5f) [0x42160f] - /usr/libexec/dovecot/imap(io_loop_handler_run+0xf8) [0x4a6bd8] - /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5c ed] - /usr/libexec/dovecot/imap(main+0x620) [0x428ef0] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x38af41d994] - /usr/libexec/dovecot/imap [0x419ac9] On 12/22/2009 08:17 PM, Timo Sirainen wrote: On 22.12.2009, at 16.42, David Halik wrote: I just double checked and we're seeing the same thing here, although we were never running with noac to begin with. Our poor NFS server would melt probably. ;) This is immediately before the crash: Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle The interesting this is that this function is called only when dovecot-uidlist.lock has been created, i.e. when uidlist is locked, meaning nothing should have deleted the dovecot-uidlist while it was being written to, so this error just
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On Wed, 2009-12-23 at 11:37 -0500, David Halik wrote: Dec 23 11:05:43 gehenna11.rutgers.edu dovecot: IMAP(user1): Maildir /rci/u1/user1/Maildir: Expunged message reappeared, giving a new UID (old uid=1, file=1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S) The above causes: Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): /rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 1 - 514) Although in my tests it doesn't.. Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) This is hopefully fixed by: http://hg.dovecot.org/dovecot-1.2/rev/d00abdaf9df2 Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) So you get a core dump? Could you print a few values: fr 6 (or whatever starts giving the right values) p *ctx p *ctx.ns p ctx.patterns[0] p ctx.patterns[1] p ctx.patterns[2] p match p ns_prefix p p The ctx.patterns values are the most important ones I think. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Timo, On 12/23/09 8:37 AM, David Halik dha...@jla.rutgers.edu wrote: I switched all of our servers to dotlock_use_excl=no last night, but we're still seeing the errors: We too have set dotlock_use_excl = no. I'm not seeing the Stale NFS file handle message any more, but I am still seeing a crash. The crashes seem to be leaving the indexes in a bad state: Dec 23 09:07:44 oh-popmap3p dovecot: imap: user=cnisser, rip=x.x.x.x, pid=30101: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 23 09:07:44 oh-popmap3p dovecot: imap: user=cnisser, rip= x.x.x.x, pid=30101: Raw backtrace: imap [0x4d8986] - imap [0x4d97b0] - imap(i_fatal+0) [0x4d8c7a] - imap [0x44f2cc] - imap [0x44f814] - imap [0x4500a2] - imap(maildir_uidlist_refresh+0x9d) [0x450686] - imap [0x44bff1] - imap [0x44c0a8] - imap [0x44c178] - imap(maildir_storage_sync_init+0x7c) [0x44c6e6] - imap(mailbox_sync_init+0x44) [0x489922] - imap(imap_sync_init+0xab) [0x42e02b] - imap [0x41ccc4] - imap [0x41cd26] - imap [0x4733be] - imap [0x4e4171] - imap(io_loop_handle_timeouts+0x1d) [0x4e41ce] - imap(io_loop_handler_run+0x86) [0x4e4f29] - imap(io_loop_run+0x3b) [0x4e4214] - imap(main+0xa6) [0x4300af] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x3217e1d994] - imap [0x419aa9] Dec 23 09:07:45 oh-popmap3p dovecot: dovecot: child 30101 (imap) killed with signal 6 (core dumped) Dec 23 09:09:16 cc-popmap3p dovecot: imap: user=cnisser, rip= x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size Dec 23 09:09:17 oh-popmap2p dovecot: imap: user=cnisser, rip=y.y.y.y, pid=3279: read() failed with index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: Input/output error Dec 23 09:09:38 cc-popmap3p dovecot: imap: user=cnisser, rip= x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size Dec 23 09:18:12 cc-popmap3p dovecot: imap: user=cnisser, rip= x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size We're also seeing another odd error that seems to be unrelated to the crashes, but seemed like it bears reporting. Reading of uidlists and cache files seems to intermittently fail with EIO. It doesn't seem to tie in with anything else, and I don't see any corresponding NFS errors in the system log. Dec 23 09:31:06 oh-popmap4p dovecot: imap: user=joet, rip=a.a.a.a, pid=7641: read(/home6/joet/Maildir/dovecot-uidlist) failed: Input/output error Dec 23 09:53:17 cc-popmap2p dovecot: imap: user=catm, rip=b.b.b.b, pid=12840: read(/home3/catm/Maildir/dovecot-uidlist) failed: Input/output error Dec 23 09:59:38 cc-popmap5p dovecot: imap: user=kforrist, rip=c.c.c.c, pid=13539: read() failed with index cache file /home15/kforrist/.imapidx/.INBOX/dovecot.index.cache: Input/output error -Brad
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) This is hopefully fixed by: http://hg.dovecot.org/dovecot-1.2/rev/d00abdaf9df2 Thanks for the help as always. I'm currently patching 1.2.9 with this and pushing the changes out to our dovecot servers. I'll update you later today if we stop seeing the panic. Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) So you get a core dump? Could you print a few values: fr 6 (or whatever starts giving the right values) p *ctx p *ctx.ns p ctx.patterns[0] p ctx.patterns[1] p ctx.patterns[2] p match p ns_prefix p p The ctx.patterns values are the most important ones I think. The values from lines 5-7 all look the the same. Here's the answers from line 6 though: (gdb) fr 6 #6 0x0041c9c2 in cmd_list_continue (cmd=0x1212dcc8) at cmd-list.c:375 375list_namespace_send_prefix(ctx, TRUE); (gdb) p *ctx $17 = {cmd = 0x1212dcc8, ref = 0x12132b50 , patterns = 0x1212de48, list_flags = 16386, status_items = 0, ns = 0x1212bf70, list_iter = 0x121304f8, ns_prefixes_listed = {arr = {buffer = 0x1212de58, element_size = 8}, v = 0x1212de58, v_modifiable = 0x1212de58}, lsub = 0, lsub_no_unsubscribed = 0, inbox_found = 1, seen_inbox_namespace = 1, cur_ns_match_inbox = 1, cur_ns_send_prefix = 0, cur_ns_skip_trailing_sep = 1, used_listext = 0} (gdb) p *ctx.ns $18 = {next = 0x0, type = NAMESPACE_PRIVATE, sep = 46 '.', real_sep = 46 '.', sep_str = .\000, flags = 8213, prefix = 0x1212bfd0 INBOX., prefix_len = 6, alias_for = 0x0, alias_chain_next = 0x0, user = 0x1212b988, owner = 0x1212b988, list = 0x1212c348, 0x1212c038} (gdb) p ctx.patterns[0] $19 = 0x12132b58 Inbox (gdb) p ctx.patterns[1] $20 = 0x0 (gdb) p ctx.patterns[2] $21 = 0x1212de90 (gdb) p match No symbol match in current context. (gdb) p ns_prefix No symbol ns_prefix in current context. (gdb) p p No symbol p in current context. ...and here's the full trace for reference http://pastebin.com/f77189785 -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On Wed, 2009-12-23 at 14:06 -0500, David Halik wrote: Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Fixed: http://hg.dovecot.org/dovecot-1.2/rev/56dd8c276ed6 signature.asc Description: This is a digitally signed message part
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
It looks like this and the previous patch you gave me fixed both panics as intended. I pushed the changes out about five hours ago and haven't seen a single core dump yet! Thanks for such a quick fix Timo, you rock as always. If I see any change in behavior I'll post, but I think that did it. I have one or two squat panics that I'm going to send in after the holidays, but they're much less of a concern since we don't run them on our production systems yet and they're not that big of a deal. Thanks again. On 12/23/2009 4:01 PM, Timo Sirainen wrote: On Wed, 2009-12-23 at 14:06 -0500, David Halik wrote: Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Fixed: http://hg.dovecot.org/dovecot-1.2/rev/56dd8c276ed6
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I'm seeing both of these dumps on multiple users now with 1.2.9, so I went ahead and did backtraces for them both. maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8 ns_get_listed_prefix panic: http://pastebin.com/f1420194c On 12/21/2009 12:43 PM, David Halik wrote: Just wanted to update you that I just upgraded all of our servers to 1.2.9 and I'm still seeing the array_delete panic: Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) I also started receiving a good deal of these, but only from one user so far: Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Let me know if you need full backtraces from the core dump. On 12/17/2009 02:06 PM, David Halik wrote: On 12/17/2009 01:07 PM, Timo Sirainen wrote: On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote: I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. Do you also see the duplicate file entry before the crash? Yes, the duplicate file entry is always reported immediately before the crash, just as Ralf reported too. You can see it in this example pastebin I took from one of our users: http://pastebin.com/f29c55de5 maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot Are the index/control files on NFS? Are there multiple different servers accessing mail data? Correct. All index, control files, amd mail storage are located on NFS and there are multiple load balanced servers accessing the NFS data. We're currently running with: mmap_disable = yes dotlock_use_excl = yes fsync_disable = no mail_nfs_storage = yes mail_nfs_index = yes -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
We've started seeing the maildir_uidlist_records_array_delete assert crash as well. It always seems to be preceded by a 'stale NFS file handle' error from a the same user on a different connection. Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=apbao, rip=a.a.a.a, pid=2439: fdatasync(/home11/apbao/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=apbao, rip=a.a.a.a, pid=2439: /home11/apbao/Maildir/dovecot-uidlist: next_uid was lowered (2642 - 2641, hdr=2641) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=apbao, rip=b.b.b.b, pid=28088: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=apbao, rip=b.b.b.b, pid=28088: Raw backtrace: imap [0x4d8986] - imap [0x4d97b0] - imap(i_fatal+0) [0x4d8c7a] - imap [0x44f2cc] - imap [0x44f814] - imap [0x4500a2] - imap(maildir_uidlist_refresh+0x9d) [0x450686] - imap [0x44bff1] - imap [0x44c0a8] - imap [0x44c178] - imap(maildir_storage_sync_init+0x7c) [0x44c6e6] - imap(mailbox_sync_init+0x44) [0x489922] - imap(imap_sync_init+0xab) [0x42e02b] - imap [0x42f107] - imap(cmd_sync_delayed+0x1c6) [0x42f663] - imap(client_handle_input+0x119) [0x4244d4] - imap(client_input+0xb4) [0x424594] - imap(io_loop_handler_run+0x17d) [0x4e5020] - imap(io_loop_run+0x3b) [0x4e4214] - imap(main+0xa6) [0x4300af] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c4ea1d994] - imap [0x419aa9] Dec 22 11:17:26 cc-popmap2p dovecot: dovecot: child 28088 (imap) killed with signal 6 (core dumped) Dec 22 13:16:49 cc-popmap3p dovecot: imap: user=ndunn, rip=x.x.x.x, pid=3908: fdatasync(/home2/ndunn/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=ndunn, rip=y.y.y.y, pid=3228: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=ndunn, rip=y.y.y.y, pid=3228: Raw backtrace: imap [0x4d8986] - imap [0x4d97b0] - imap(i_fatal+0) [0x4d8c7a] - imap [0x44f2cc] - imap [0x44f814] - imap [0x4500a2] - imap(maildir_uidlist_refresh+0x9d) [0x450686] - imap [0x44bff1] - imap [0x44c0a8] - imap [0x44c178] - imap(maildir_storage_sync_init+0x7c) [0x44c6e6] - imap(mailbox_sync_init+0x44) [0x489922] - imap(imap_sync_init+0xab) [0x42e02b] - imap [0x42f107] - imap(cmd_sync_delayed+0x1c6) [0x42f663] - imap(client_handle_input+0x119) [0x4244d4] - imap(client_input+0xb4) [0x424594] - imap(io_loop_handler_run+0x17d) [0x4e5020] - imap(io_loop_run+0x3b) [0x4e4214] - imap(main+0xa6) [0x4300af] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e5021d994] - imap [0x419aa9] Dec 22 13:25:16 cc-popmap3p dovecot: dovecot: child 3228 (imap) killed with signal 6 (core dumped) I will note that we did not start seeing this crash until we took 'noac' out of our NFS mount options, as discussed on this list late last week. On the other hand, load on our NFS server (as measured in IOPS/sec) has dropped by a factor of 10. -Brad -Original Message- From: dovecot-bounces+brandond=uoregon@dovecot.org [mailto:dovecot- bounces+brandond=uoregon@dovecot.org] On Behalf Of David Halik Sent: Tuesday, December 22, 2009 7:48 AM To: dovecot@dovecot.org Subject: Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace) I'm seeing both of these dumps on multiple users now with 1.2.9, so I went ahead and did backtraces for them both. maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8 ns_get_listed_prefix panic: http://pastebin.com/f1420194c On 12/21/2009 12:43 PM, David Halik wrote: Just wanted to update you that I just upgraded all of our servers to 1.2.9 and I'm still seeing the array_delete panic: Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) I also started receiving a good deal of these, but only from one user so far: Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Let me know if you need full backtraces from the core
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I just double checked and we're seeing the same thing here, although we were never running with noac to begin with. Our poor NFS server would melt probably. ;) This is immediately before the crash: Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (1507 - 1506, hdr=1506) Dec 22 13:12:06 gehenna14.rutgers.edu dovecot: IMAP(user): /rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 3: 1249348137.M369064P26267V04240006I01ED7E09_0.gehenna5.rutgers.edu,S=15857:2,S (uid 1208 - 1508) On 12/22/2009 04:33 PM, Brandon Davidson wrote: We've started seeing the maildir_uidlist_records_array_delete assert crash as well. It always seems to be preceded by a 'stale NFS file handle' error from a the same user on a different connection. Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=apbao, rip=a.a.a.a, pid=2439: fdatasync(/home11/apbao/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=apbao, rip=a.a.a.a, pid=2439: /home11/apbao/Maildir/dovecot-uidlist: next_uid was lowered (2642 - 2641, hdr=2641) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=apbao, rip=b.b.b.b, pid=28088: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=apbao, rip=b.b.b.b, pid=28088: Raw backtrace: imap [0x4d8986] - imap [0x4d97b0] - imap(i_fatal+0) [0x4d8c7a] - imap [0x44f2cc] - imap [0x44f814] - imap [0x4500a2] - imap(maildir_uidlist_refresh+0x9d) [0x450686] - imap [0x44bff1] - imap [0x44c0a8] - imap [0x44c178] - imap(maildir_storage_sync_init+0x7c) [0x44c6e6] - imap(mailbox_sync_init+0x44) [0x489922] - imap(imap_sync_init+0xab) [0x42e02b] - imap [0x42f107] - imap(cmd_sync_delayed+0x1c6) [0x42f663] - imap(client_handle_input+0x119) [0x4244d4] - imap(client_input+0xb4) [0x424594] - imap(io_loop_handler_run+0x17d) [0x4e5020] - imap(io_loop_run+0x3b) [0x4e4214] - imap(main+0xa6) [0x4300af] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c4ea1d994] - imap [0x419aa9] Dec 22 11:17:26 cc-popmap2p dovecot: dovecot: child 28088 (imap) killed with signal 6 (core dumped) Dec 22 13:16:49 cc-popmap3p dovecot: imap: user=ndunn, rip=x.x.x.x, pid=3908: fdatasync(/home2/ndunn/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=ndunn, rip=y.y.y.y, pid=3228: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=ndunn, rip=y.y.y.y, pid=3228: Raw backtrace: imap [0x4d8986] - imap [0x4d97b0] - imap(i_fatal+0) [0x4d8c7a] - imap [0x44f2cc] - imap [0x44f814] - imap [0x4500a2] - imap(maildir_uidlist_refresh+0x9d) [0x450686] - imap [0x44bff1] - imap [0x44c0a8] - imap [0x44c178] - imap(maildir_storage_sync_init+0x7c) [0x44c6e6] - imap(mailbox_sync_init+0x44) [0x489922] - imap(imap_sync_init+0xab) [0x42e02b] - imap [0x42f107] - imap(cmd_sync_delayed+0x1c6) [0x42f663] - imap(client_handle_input+0x119) [0x4244d4] - imap(client_input+0xb4) [0x424594] - imap(io_loop_handler_run+0x17d) [0x4e5020] - imap(io_loop_run+0x3b) [0x4e4214] - imap(main+0xa6) [0x4300af] - /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e5021d994] - imap [0x419aa9] Dec 22 13:25:16 cc-popmap3p dovecot: dovecot: child 3228 (imap) killed with signal 6 (core dumped) I will note that we did not start seeing this crash until we took 'noac' out of our NFS mount options, as discussed on this list late last week. On the other hand, load on our NFS server (as measured in IOPS/sec) has dropped by a factor of 10. -Brad -Original Message- From: dovecot-bounces+brandond=uoregon@dovecot.org [mailto:dovecot- bounces+brandond=uoregon@dovecot.org] On Behalf Of David Halik Sent: Tuesday, December 22, 2009 7:48 AM To: dovecot@dovecot.org Subject: Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace) I'm seeing both of these dumps on multiple users now with 1.2.9, so I went ahead and did backtraces for them both. maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8 ns_get_listed_prefix panic: http://pastebin.com/f1420194c On 12/21/2009 12:43 PM, David Halik wrote: Just wanted to update you that I just upgraded all of our servers to 1.2.9 and I'm still seeing the array_delete panic: Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On 22.12.2009, at 16.42, David Halik wrote: I just double checked and we're seeing the same thing here, although we were never running with noac to begin with. Our poor NFS server would melt probably. ;) This is immediately before the crash: Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle The interesting this is that this function is called only when dovecot-uidlist.lock has been created, i.e. when uidlist is locked, meaning nothing should have deleted the dovecot-uidlist while it was being written to, so this error just shouldn't happen.. So apparently the dotlocking just isn't working properly for your NFS servers. See if dotlock_use_excl=no helps?
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Just wanted to update you that I just upgraded all of our servers to 1.2.9 and I'm still seeing the array_delete panic: Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) I also started receiving a good deal of these, but only from one user so far: Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Let me know if you need full backtraces from the core dump. On 12/17/2009 02:06 PM, David Halik wrote: On 12/17/2009 01:07 PM, Timo Sirainen wrote: On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote: I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. Do you also see the duplicate file entry before the crash? Yes, the duplicate file entry is always reported immediately before the crash, just as Ralf reported too. You can see it in this example pastebin I took from one of our users: http://pastebin.com/f29c55de5 maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot Are the index/control files on NFS? Are there multiple different servers accessing mail data? Correct. All index, control files, amd mail storage are located on NFS and there are multiple load balanced servers accessing the NFS data. We're currently running with: mmap_disable = yes dotlock_use_excl = yes fsync_disable = no mail_nfs_storage = yes mail_nfs_index = yes -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I found this post from a couple of weeks ago: http://www.mail-archive.com/dovecot@dovecot.org/msg24127.html Any update on this? I'm experiencing the same issue. Out of a user population of 4000, I get about 20 of these dumps a day, so it's not major issue, but I thought I'd follow up. Here is an example: http://pastebin.com/f29c55de5 I can provide a more detailed debug log from teh core dump if necessary, but since the above email already had it I figured I'd wait. Note this is *not* the same issue as: - maildir: v1.2.7 and v1.2.8 caused assert-crashes in maildir_uidlist_records_drop_expunges() I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. dovecot -n --- bash-3.2# /usr/sbin/dovecot -n # 1.2.8: /etc/dovecot.conf # OS: Linux 2.6.18-164.6.1.el5 x86_64 CentOS release 5.4 (Final) listen: * ssl_cert_file: /rci/local/certs/imapd-pop3d.pem ssl_key_file: /rci/local/certs/imapd-pop3d.pem disable_plaintext_auth: yes login_dir: /var/run/dovecot/login login_executable(default): /usr/libexec/dovecot/imap-login login_executable(imap): /usr/libexec/dovecot/imap-login login_executable(pop3): /usr/libexec/dovecot/pop3-login login_max_processes_count: 2048 max_mail_processes: 2048 first_valid_uid: 100 mail_location: maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot mmap_disable: yes mail_nfs_storage: yes mail_nfs_index: yes mail_drop_priv_before_exec: yes mail_executable(default): /rci/local/etc/convert-courier-imap.sh mail_executable(imap): /rci/local/etc/convert-courier-imap.sh mail_executable(pop3): /rci/local/etc/convert-courier-pop.sh mail_plugin_dir(default): /usr/lib64/dovecot/imap mail_plugin_dir(imap): /usr/lib64/dovecot/imap mail_plugin_dir(pop3): /usr/lib64/dovecot/pop3 pop3_uidl_format(default): %08Xu%08Xv pop3_uidl_format(imap): %08Xu%08Xv pop3_uidl_format(pop3): UID%u-%v namespace: type: private separator: . prefix: INBOX. inbox: yes list: yes subscriptions: yes auth default: passdb: driver: pam args: dovecot userdb: driver: passwd -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote: I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. Do you also see the duplicate file entry before the crash? maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot Are the index/control files on NFS? Are there multiple different servers accessing mail data? signature.asc Description: This is a digitally signed message part
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
On 12/17/2009 01:07 PM, Timo Sirainen wrote: On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote: I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. Do you also see the duplicate file entry before the crash? Yes, the duplicate file entry is always reported immediately before the crash, just as Ralf reported too. You can see it in this example pastebin I took from one of our users: http://pastebin.com/f29c55de5 maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot Are the index/control files on NFS? Are there multiple different servers accessing mail data? Correct. All index, control files, amd mail storage are located on NFS and there are multiple load balanced servers accessing the NFS data. We're currently running with: mmap_disable = yes dotlock_use_excl = yes fsync_disable = no mail_nfs_storage = yes mail_nfs_index = yes -- David Halik System Administrator OIT-CSS Rutgers University dha...@jla.rutgers.edu
[Dovecot] dovecot-1.2.8 imap crash (with backtrace)
In the log: Dec 1 17:24:00 postamt dovecot: IMAP(scx): /home/s/c/scx/Maildir/dovecot-uidlist: Duplicate file entry at line 1: 1259679049.M329485P1617.postamt.charite.de,S=142073,W=143959:2,Sb (uid 3157 - 3159) Dec 1 17:24:00 postamt dovecot: IMAP(scx): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 1 17:24:00 postamt dovecot: IMAP(scx): Raw backtrace: imap [0x80f13d1] - imap [0x80f1442] - imap [0x80f0db9] - imap [0x8088e76] - imap(maildir_uidlist_refresh+0x9e9) [0x8089e99] - imap [0x80865d4] - imap(maildir_storage_sync_init+0x125) [0x8086ab5] - imap(imap_sync_init+0x54) [0x8070304] - imap [0x8062aa2] - imap [0x80a604c] - imap(io_loop_handle_timeouts+0xe9) [0x80f96f9] - imap(io_loop_handler_run+0x82) [0x80f9fc2] - imap(io_loop_run+0x20) [0x80f9490] - imap(main+0x5d2) [0x8070e22] - /lib/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7f17b35] - imap [0x80602d1] The backtrace: GNU gdb (GDB) 7.0-debian Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as i486-linux-gnu. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/libexec/dovecot/imap...done. Reading symbols from /lib/i686/cmov/libdl.so.2...Reading symbols from /usr/lib/debug/lib/i686/cmov/libdl-2.10.1.so...done. (no debugging symbols found)...done. Loaded symbols for /lib/i686/cmov/libdl.so.2 Reading symbols from /lib/i686/cmov/librt.so.1...Reading symbols from /usr/lib/debug/lib/i686/cmov/librt-2.10.1.so...done. (no debugging symbols found)...done. Loaded symbols for /lib/i686/cmov/librt.so.1 Reading symbols from /lib/i686/cmov/libc.so.6...Reading symbols from /usr/lib/debug/lib/i686/cmov/libc-2.10.1.so...done. (no debugging symbols found)...done. Loaded symbols for /lib/i686/cmov/libc.so.6 Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.10.1.so...done. (no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/i686/cmov/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/i686/cmov/libpthread-2.10.1.so...done. (no debugging symbols found)...done. Loaded symbols for /lib/i686/cmov/libpthread.so.0 Reading symbols from /usr/local/lib/dovecot/imap/lib10_quota_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib10_quota_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib11_imap_quota_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib11_imap_quota_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib11_trash_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib11_trash_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib20_autocreate_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib20_autocreate_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib20_fts_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib20_fts_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib20_mail_log_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib20_mail_log_plugin.so Reading symbols from /usr/local/lib/dovecot/imap/lib20_zlib_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib20_zlib_plugin.so Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done. Loaded symbols for /lib/libbz2.so.1.0 Reading symbols from /usr/local/lib/dovecot/imap/lib21_fts_squat_plugin.so...done. Loaded symbols for /usr/local/lib/dovecot/imap/lib21_fts_squat_plugin.so Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libgcc_s.so.1 Core was generated by `imap [scx 88.73.137.113]'. Program terminated with signal 6, Aborted. #0 0xb8064424 in __kernel_vsyscall () #0 0xb8064424 in __kernel_vsyscall () No symbol table info available. #1 0xb7f2b8c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 resultvar = value optimized out pid = -1207676940 selftid = 7704 #2 0xb7f2edf5 in *__GI_abort () at abort.c:88 act = {__sigaction_handler = {sa_handler = 0x8723738, sa_sigaction = 0x8723738}, sa_mask = {__val = {3216510312, 135196426, 3087288032, 0, 3216510432, 3216510420, 128, 0, 0, 3216510496, 3087541872, 134566866, 549, 0, 0, 0, 1, 135269694, 128, 1, 3216510420, 141702936, 69, 3216510548, 3216510392, 135198022, 141703517, 3216510420, 3216510424, 550, 141702968, 3086750483}}, sa_flags = 550,