I am aware of the problem, but learned only recently that it happens on
systems other than Solaris. I now know that it also happens on FreeBSD
and Linux.
I am astounded that the first three bytes of the corruption always seem
to be <CTRL/W><CTRL/C><CTRL/A> (0x17 0x03 0x01), even on different
platforms.
I have no idea what is significant about those three bytes, nor why the
problem should suddenly have come up. The only things that seem related
are that it happens when the imapd process is killed, and the switch to
using setjmp()/longjmp() in the signal handlers.
Here is the underlying problem:
glibc "improved" things by adding numerous mutexes to cover possible
multi-threading, even in non-threaded applications such as imapd. This
has other consequences too; for example, putc() is now far slower.
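Under a thread-safe stdio, putc() has to lock the stream on every call.
POSIX lets a caller take that lock once with flockfile() and then use
putc_unlocked() for a whole run of characters. The sketch below only
illustrates that idea; the helper name write_run is hypothetical and
nothing here is taken from imapd.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: write len bytes from buf to fp, taking the stream
   lock once for the whole run rather than once per character, which is
   what a thread-safe putc() does on every call. */
static int write_run(FILE *fp, const char *buf, size_t len)
{
    size_t i;
    int rc = 0;

    flockfile(fp);                      /* acquire the stream lock once */
    for (i = 0; i < len; i++) {
        /* putc_unlocked() does no locking of its own; it is only safe
           here because we already hold the lock via flockfile(). */
        if (putc_unlocked((unsigned char) buf[i], fp) == EOF) {
            rc = EOF;
            break;
        }
    }
    funlockfile(fp);                    /* release the lock */
    return rc;
}
```

A caller emitting a long buffer would call write_run() once instead of
looping over putc(), paying for one lock/unlock pair instead of one per
byte.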
The impact extends to syslog(). When imapd receives a signal to
terminate, it wants to log a message announcing this fact. Because of
the mutex, it can no longer safely do so in the signal handler: if the
signal interrupted code that already holds the lock, syslog() in the
handler can deadlock waiting for it. That is true even when imapd has no
intention of returning to the interrupted code!
Matters are further complicated in traditional UNIX mailbox format;
imapd would like to update the mailbox before it exits (to avoid the
problem of lost flags), but once again it runs afoul of the mutex.
To work around this, I tried using setjmp()/longjmp() in the signal
handler to take imapd back to the main command loop, and from there to
code that saves and exits. Supposedly, longjmp() unwinds whatever context
occurred since the setjmp().
The patch that I suggested in January 2008 removed the step of saving the
mailbox updates after the longjmp(). In other words, imapd still does the
longjmp(), but then writes nothing further to the file; it just calls
syslog() and exits. The reports suggest that this has not fixed the
problem.
I am working with a FreeBSD site that has experienced the problem to test
a more aggressive version of the patch, one that removes the longjmp()
entirely.
If it isn't the longjmp(), then I don't know what the hell is going on.
If it is the longjmp(), then I'll develop some other way around the issue
in Panda IMAP.
-- Mark --
http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
_______________________________________________
Imap-uw mailing list
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw