Please open a report in Bugzilla and mark it as a "blocker". Thanks for finding the issue.

:wes

On 17 Jun 2009, at 09:44, Michael Bacon wrote:
It turns out that this was an issue with mupdate being a multi-threaded daemon. In a critical place in the non-blocking prot code (in prot_flush_internal()), the behavior relies on the value of errno. If it's EAGAIN, the write will try again; otherwise it sets s->error and quits. Naturally, since errno is normally a global variable, it doesn't work terribly well in multi-threaded code unless the necessary thread safety switch is passed to the compiler. Hence, when thread #5 was getting a -1 from the write(2) system call, it was reading errno as 0, rather than EAGAIN as it should have been.
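
For anyone unfamiliar with the pattern, here is a minimal sketch of a non-blocking flush that branches on errno. It is not the actual prot_flush_internal() code; the function name flush_nonblocking and its saved_errno parameter are made up for illustration. The point is that the value read from errno right after write(2) decides between "try again later" and "hard error", so a thread that reads a stale or shared errno takes the wrong branch.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, NOT the Cyrus prot code.
 * Writes up to len bytes from buf to fd.
 * Returns the number of bytes written (possibly fewer than len if the
 * descriptor would block), or -1 on a real error with *saved_errno set. */
ssize_t flush_nonblocking(int fd, const char *buf, size_t len, int *saved_errno)
{
    size_t done = 0;

    *saved_errno = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)
            break;              /* nothing written; let the caller retry */

        /* Capture errno immediately, before any other call can change it. */
        int err = errno;

        if (err == EINTR)
            continue;           /* interrupted by a signal; retry now */
        if (err == EAGAIN || err == EWOULDBLOCK)
            break;              /* would block; caller retries later */

        *saved_errno = err;     /* anything else is a real failure */
        return -1;
    }
    return (ssize_t)done;
}
```

If errno reads as 0 after write() returns -1, as described above, this sketch falls through to the failure branch even though the real condition was EAGAIN.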

The solution, should anyone else run into this, is as simple as recompiling with the thread safety switch. (In the case of Sun's SPro, it's -mt. I think it's -pthread for gcc, but I'm not sure.) Maddening that the fix was that simple, as I spent two solid weeks hunting for the dratted bug.

I have two requests to the CVS maintainers out there. First, the below patch to current CVS isn't terribly comprehensive, and doesn't narrow down which of about a dozen places set s->error, but it would at least have given SOME kind of indication on the server that something had gone wrong, and might have saved me about a week of hunting.

Secondly, I am very weak in the ways of autoconf, but it strikes me that since Cyrus now builds mupdate as multithreaded by default (good decision, IMO), autoconf should make some attempt to figure out what thread safety switch is appropriate and add it to CFLAGS.
