I am aware of the problem, but only recently learned that it happens on systems other than Solaris. I now know that it also happens on FreeBSD and Linux.

I am astounded by the fact that the first three bytes of the corruption always seem to be <CTRL/W><CTRL/C><CTRL/A>, 0x17 0x03 0x01, even on different platforms.

I have no idea what is significant about those three bytes, nor why it should suddenly have come up. The only thing that seems related is that it happens in association with the imapd process being killed, and the switch to using setjmp()/longjmp() in the signal handlers.

Here is the underlying problem:

glibc "improved" things so that there are numerous new mutexes to cover possible multi-threading, even for non-threaded applications such as imapd. There were other complications: e.g., putc() is now far slower.

The impact extends to syslog(). imapd, when it receives a signal to terminate, wants to issue a log message announcing this fact. Thanks to the mutex, it can no longer do so in the signal handler...even when it has no intention of returning to the interrupted code!

Matters are further complicated in traditional UNIX mailbox format; imapd would like to update the mailbox before it exits (to avoid the problem of lost flags) but once again runs afoul of the mutex.

To work around this, I tried using setjmp()/longjmp(): the signal handler does a longjmp() that takes imapd back to the main command loop, and from there to the code that saves and exits. Supposedly, longjmp() unwinds whatever context has been built up since the setjmp().
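For readers who have not seen the pattern, here is a minimal sketch of the general idea. It is not the actual imapd code. All names (loop_env, on_sigterm, run_command_loop) are made up for illustration, and the "command loop" is a counter that raises SIGTERM on itself so that the example is self-contained.

```c
#include <setjmp.h>
#include <signal.h>

/* Hypothetical names throughout; this is not the imapd source. */
static jmp_buf loop_env;                        /* context saved in the command loop */
static volatile sig_atomic_t jump_armed = 0;    /* 1 only while loop_env is valid */
static volatile sig_atomic_t caught_signo = 0;  /* which signal took us out */

static void on_sigterm(int signo)
{
    if (jump_armed) {
        jump_armed = 0;
        caught_signo = signo;
        longjmp(loop_env, 1);   /* does not return to the interrupted code */
    }
}

/* Runs a stand-in command loop. Returns the signal number that ended it,
   0 if the loop finished without a signal, or -1 on setup failure. */
int run_command_loop(int commands_before_signal)
{
    volatile int handled = 0;   /* volatile: modified between setjmp and longjmp */

    if (signal(SIGTERM, on_sigterm) == SIG_ERR)
        return -1;

    if (setjmp(loop_env) != 0) {
        /* Arrived here via longjmp() from the handler. This is where
           the "save and exit" code would go. */
        return (int)caught_signo;
    }

    jump_armed = 1;
    while (handled < 100) {
        if (handled == commands_before_signal)
            raise(SIGTERM);     /* stand-in for a signal from outside */
        handled++;
    }
    jump_armed = 0;
    return 0;
}
```

Note that the sketch delivers the signal at a point where nothing else is in progress. The real case is different: the signal can arrive while the interrupted code holds one of those mutexes or is halfway through a buffered write. Also, whether longjmp() restores the signal mask is platform-dependent; sigsetjmp()/siglongjmp() exist to make that explicit.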

The patch that I suggested in January 2008 removed the step of saving the mailbox updates after the longjmp(). What this all means is that it should still do the longjmp(), but not write anything further to the file and just syslog() and exit. The reports indicate that this doesn't seem to have fixed the problem.

I am working with a FreeBSD site that has experienced the problem to try a more aggressive version of the patch that removes the longjmp() entirely.

If it isn't the longjmp(), then I don't know what the hell is going on.

If it is the longjmp(), then I'll develop some other way around the issue in Panda IMAP.
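One commonly used alternative, offered only as an illustration and not as what Panda IMAP will necessarily do, is to have the handler set a flag and nothing else, and let the main loop notice the flag and do the syslog() from normal context. In the sketch below all names (terminate_requested, note_sigterm, serve_until_terminated) are hypothetical, and the loop raises SIGTERM on itself only so that the example is self-contained.

```c
#include <signal.h>
#include <syslog.h>

/* Hypothetical names; a sketch of the flag-and-poll pattern only. */
static volatile sig_atomic_t terminate_requested = 0;

static void note_sigterm(int signo)
{
    (void)signo;
    terminate_requested = 1;    /* no stdio, no syslog(), no mutexes here */
}

/* Runs a stand-in command loop of at most max_commands iterations and
   raises SIGTERM on itself at iteration signal_at. Returns the number of
   commands handled, or -1 on setup failure. */
int serve_until_terminated(int max_commands, int signal_at)
{
    int handled = 0;

    terminate_requested = 0;
    if (signal(SIGTERM, note_sigterm) == SIG_ERR)
        return -1;

    while (handled < max_commands && !terminate_requested) {
        if (handled == signal_at)
            raise(SIGTERM);     /* stand-in for a signal from outside */
        handled++;
    }

    if (terminate_requested) {
        /* Normal context again, so syslog() and mailbox updates are safe. */
        syslog(LOG_INFO, "terminating on signal after %d commands", handled);
    }
    return handled;
}
```

The obvious weakness is that the flag is only seen when the loop gets around to checking it. A server that is blocked waiting for client input needs that wait to be interrupted by the signal rather than restarted.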

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
_______________________________________________
Imap-uw mailing list
[email protected]
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw
