On Thursday 27 September 2007 21:10, Dan Langille wrote:
> On 27 Sep 2007 at 17:07, Kern Sibbald wrote:
> > Hello,
> >
> > Well, after a zillion hours of non-trivial debugging, I finally figured
> > out why we were getting comm line connection problems on the FreeBSD
> > machine during the 2drive-incremental-2disk. It turns out that on your
> > system, the pthread_cond_timedwait() gets spurious returns (i.e. with a 0
> > status), which essentially simulated a connection being made but no
> > authorization, so the job was cancelled.
> >
> > It is interesting because I have never seen this problem on any other
> > system though I seem to recall some such reports. Anyway, the
> > pthread_cond_timedwait documentation permits spurious returns, so I've
> > modified the code to specifically test the conditions on which it is
> > waiting.
>
> The system you're testing on is FreeBSD 7.x, which is not stable.
> We've never seen the problem under 6.2-STABLE AFAIK.

Hopefully they will fix it in 7.x ...

The pthreads library is doing something that, IMO, it should not, but
according to the Linux documentation, it *is* permitted to have spurious
good status codes returned from a pthread_cond_wait (or timedwait).

Anyway, I was able to run the job, which previously always failed within
20 executions, 100 times without any failures, so at least I seem to
have found a programmatic way to avoid the problem, and I've checked the
other pthread_cond_waits in the SD to ensure that none should have the
same problem. Unfortunately, there is one in the libraries, so I have
more work to do, but at least this particular failure is behind us.

Regards,

Kern

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
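
For readers following the thread, the fix described above (testing the
awaited condition explicitly rather than trusting a 0 return from
pthread_cond_timedwait) can be sketched roughly as below. This is only an
illustration, not the actual Bacula code: struct conn_wait, the
"connected" flag, and the three helper functions are hypothetical names
invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state shared between the waiter and the signaller. */
struct conn_wait {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            connected;   /* the condition actually being waited on */
};

/* Initialise the structure; returns 0 on success, an errno value otherwise. */
static int conn_wait_init(struct conn_wait *cw)
{
    int rc = pthread_mutex_init(&cw->mutex, NULL);
    if (rc != 0) {
        return rc;
    }
    cw->connected = false;
    return pthread_cond_init(&cw->cond, NULL);
}

/* Mark the connection as made and wake one waiter. */
static void signal_connection(struct conn_wait *cw)
{
    pthread_mutex_lock(&cw->mutex);
    cw->connected = true;
    pthread_cond_signal(&cw->cond);
    pthread_mutex_unlock(&cw->mutex);
}

/*
 * Wait up to timeout_sec seconds for cw->connected to become true.
 * Returns true if the condition holds, false on timeout or error.
 */
static bool wait_for_connection(struct conn_wait *cw, int timeout_sec)
{
    struct timespec deadline;
    bool ok;

    assert(timeout_sec >= 0);

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline
     * unless the condition variable was configured otherwise. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&cw->mutex);
    while (!cw->connected) {
        int rc = pthread_cond_timedwait(&cw->cond, &cw->mutex, &deadline);
        if (rc != 0) {
            break;      /* ETIMEDOUT or a real error: stop waiting */
        }
        /* rc == 0 may be a spurious wakeup, so the loop re-tests
         * cw->connected instead of assuming the event happened. */
    }
    ok = cw->connected; /* decide on the condition, not on the return code */
    pthread_mutex_unlock(&cw->mutex);
    return ok;
}
```

The point of the loop is that the caller acts on the flag it is waiting
for, so a wakeup with a 0 status and no real event simply sends the
thread back to waiting until the original deadline passes.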
