Please open a report in Bugzilla and mark it as a "blocker". Thanks
for finding the issue.
:wes
On 17 Jun 2009, at 09:44, Michael Bacon wrote:
It turns out that this was an issue with mupdate being a
multithreaded daemon. In a critical place in the non-blocking prot
code (in prot_flush_internal()), the behavior relies on the value of
errno: if it's EAGAIN, the write is retried; otherwise it sets
s->error and quits. Since errno is normally a single global
variable, it doesn't work terribly well in multithreaded code unless
the compiler is given the necessary thread-safety switch. Hence,
when thread #5 got -1 back from the write(2) system call, it read
errno as 0 instead of the EAGAIN it should have seen.
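For anyone unfamiliar with the pattern, here is a minimal sketch of the
kind of flush loop described above. It is NOT the actual
prot_flush_internal() code; struct outbuf and flush_nonblocking() are
hypothetical names. It shows why a wrong errno is fatal: if errno reads
as 0 after write(2) returns -1, the EAGAIN branch is skipped and the
stream is marked as failed.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for a protstream: NOT Cyrus's real struct. */
struct outbuf {
    int fd;              /* non-blocking descriptor */
    const char *data;    /* next byte to send */
    size_t len;          /* bytes still pending */
    int error;           /* errno of a hard failure, 0 if none */
};

/*
 * Try to push pending bytes out.  Returns 1 when everything was written,
 * 0 when the kernel said "try again later" (EAGAIN), -1 on a hard error.
 */
int flush_nonblocking(struct outbuf *s)
{
    while (s->len > 0) {
        ssize_t n = write(s->fd, s->data, s->len);
        if (n > 0) {
            s->data += n;
            s->len -= (size_t)n;
            continue;
        }
        if (n == -1) {
            int err = errno;            /* read errno at once */
            if (err == EINTR)
                continue;               /* interrupted: retry now */
            if (err == EAGAIN || err == EWOULDBLOCK)
                return 0;               /* buffer full: retry later */
            /*
             * Anything else is treated as fatal.  If errno were wrongly
             * read as 0 (the bug above), a harmless EAGAIN would land here.
             */
            s->error = err;
            return -1;
        }
        s->error = EIO;                 /* write() made no progress */
        return -1;
    }
    return 1;
}
```

Reading errno into a local right after the failing call is good
practice anyway, but it cannot help if errno itself is not per-thread;
that part is up to the compiler flags.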
The solution, should anyone else run into this, is simply to
recompile with the thread-safety switch. (For Sun's SPro compiler,
it's -mt. For gcc I believe it's -pthread, or -pthreads on Solaris,
but I'm not sure.) Maddening that the fix was that simple, as I
spent two solid weeks hunting for the dratted bug.
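A quick way to check whether a build gives each thread its own errno is
to compare the address of errno as seen from two threads. This is a
sketch assuming POSIX threads; errno_is_per_thread() is a hypothetical
helper, not part of Cyrus. In a thread-safe build, errno.h maps errno
to a per-thread location (glibc, for example, defines it as
(*__errno_location ())), so the two addresses differ.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: record where this thread's errno lives. */
static void *record_errno_addr(void *arg)
{
    int **addr = arg;
    *addr = &errno;   /* errno is an lvalue, so its address is valid */
    return NULL;
}

/* Returns 1 if errno is per-thread, 0 if shared, -1 on thread error. */
int errno_is_per_thread(void)
{
    int *worker_errno = NULL;
    pthread_t tid;

    if (pthread_create(&tid, NULL, record_errno_addr, &worker_errno) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    /* Distinct addresses mean each thread has its own errno. */
    return worker_errno != &errno;
}
```

On glibc errno is always per-thread, so this check is mainly useful on
platforms where it depends on compiler flags, such as the Solaris setup
described above; build it with the same flags as mupdate (e.g. -mt).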
I have two requests for the CVS maintainers out there. First, the
patch below against current CVS isn't terribly comprehensive, and it
doesn't narrow down which of the dozen or so places that set
s->error was responsible, but it would at least have given SOME
indication on the server that something had gone wrong, and might
have saved me about a week of hunting.
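Along the lines of that request, one way to make every s->error
assignment say where it came from is to route them all through a single
helper that logs the call site. This is a hypothetical sketch (struct
stream, stream_set_error() and STREAM_SET_ERROR are illustrative names,
not Cyrus API); it uses __func__ and __LINE__ so the log line identifies
which of the error paths fired.

```c
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical stream state; only the field this sketch needs. */
struct stream {
    char error[160];   /* empty string means "no error" */
};

/*
 * Record a failure on the stream and log it.  'func' and 'line' identify
 * the call site; 'err' is the errno value saved right after the failure.
 */
void stream_set_error(struct stream *s, const char *func, int line, int err)
{
    snprintf(s->error, sizeof s->error, "%s:%d: %s",
             func, line, strerror(err));
    syslog(LOG_ERR, "stream error at %s", s->error);
}

/* Capture the caller's function name and line automatically. */
#define STREAM_SET_ERROR(s, err) \
    stream_set_error((s), __func__, __LINE__, (err))
```

A call such as STREAM_SET_ERROR(s, saved_errno) then leaves a syslog
entry naming the function and line, which is exactly the clue that was
missing here.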
Second, I am very weak in the ways of autoconf, but it strikes me
that since Cyrus now builds mupdate as multithreaded by default (a
good decision, IMO), autoconf should make some attempt to figure out
the appropriate thread-safety switch and add it to CFLAGS.