On Tuesday 25 March 2008 09:09:19 Ralf Gross wrote:
> Kern Sibbald wrote:
> > On Monday 24 March 2008 23:28:07 Ralf Gross wrote:
> > > > The workaround is either to turn off heartbeat in your FD for your
> > > > Verify jobs (not possible on a job by job basis) or set it longer
> > > > than the time it takes to run the verify.
> >
> > By the way, when using your localhost loopback device, you should never
> > need to enable heartbeat. It is needed only when you have a router
> > between two machines and that router does not correctly implement the
> > Internet keepalive standard.
>
> The heartbeat option is part of my template file for all fd configs.
> But it's obviously not needed here.
>
> [...]
>
> > > > The longterm solution is that Bacula should not use the heartbeat
> > > > code during a Verify.
> > >
> > > I'm really glad that you found the reason, as you can see in bug
> > > report #1061 it was slowly driving me crazy ;)
> >
> > Sorry it was driving you crazy, but often some of the most difficult
> > problems are uncovered and resolved when people are upset with it and
> > determined to find a resolution, which was your case here :-)
>
> The symptom with the bsock error was misleading. Looking at the bacula
> logs, a message about a missing or different file was not always
> present when a bsock error occurred.
>
> e.g.:
>
> 15-Mär 12:32 VUMEM004-sd JobId 1580: Forward spacing Volume "vu0em003-inc-0120" to file:block 0:226.
> 15-Mär 12:40 VUMEM004-dir JobId 1580: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.
>
> 22-Mär 10:31 VUMEM004-sd JobId 1663: Forward spacing Volume "itd-diff-0133" to file:block 0:227.
> 22-Mär 10:35 VUMEM004-dir JobId 1663: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.
>
> in contrast to this message:
>
> 22-Mär 21:11 VUMEM004-sd JobId 1674: Forward spacing Volume "itd-diff-0133" to file:block 0:227.
> 22-Mär 21:14 VUMEM004-dir JobId 1674: New file: /var/www_etas/howto/fit????/pics/.xvpics/email3.jpg
> 22-Mär 21:14 VUMEM004-dir JobId 1674: Fatal error: bsock.c:415 Packet size too big from "Client: VUMEM004-fd:10.60.1.231:9102. Terminating connection.

Yes, it can be very confusing if you are not as familiar with the code as I am. Once the packets start getting corrupted, you can get all kinds of problems: simple job failures as in your Verify, "Packet size too big" errors, and unfortunately, in a few cases, crashes. Bacula tries to protect itself from bad data, but it doesn't always work out.

In your case, I still don't understand why the mutex error occurred, but I am 95% sure that its root cause is the bad data arriving from the FD.

I've got a fix that I am testing for the problem, but it is definitely too large to go into 2.2.9, so I will probably release a patch for it next week.

Regards,

Kern

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel