On Monday 24 March 2008 23:28:07 Ralf Gross wrote:
> Kern Sibbald wrote:
> > I just took my dog out for his late night run.  The nice thing about that
> > is that it gave me a chance to think about your problem given your new
> > information, and I am now about 95% sure I know what is going wrong.
> >
> > The FFFFFFFA you are seeing is a negative integer, as I previously
> > mentioned.  In fact, it is a -6, which is exactly the code that Bacula
> > uses to signal a heartbeat.
> >
> > So, I imagine that my hypothesis #3 is kicking in (a Bacula bug).  You
> > most likely have heartbeat turned on between the DIR and the FD, and you
> > probably have it set to a low interval.  Unfortunately, the heartbeat
> > during a Verify is very likely to create exactly this problem.
>
> fd:  Heartbeat Interval = 300
> dir: Heartbeat Interval = 5min
> sd:  Heartbeat Interval = 5min
>
> The funny thing is that the problem does not always happen in the
> same time frame, or even near the heartbeat interval of 5min.
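
As a side note on the FFFFFFFA value quoted above: reading those four bytes as a signed 32-bit two's-complement integer does give -6. The snippet below is only a minimal sketch of that arithmetic, not Bacula code; the helper name `as_signed32` and the constant name `HEARTBEAT_SIGNAL` are made up for illustration, and the only fact taken from this thread is that -6 is the heartbeat code.

```python
import struct

# Value named in this thread as Bacula's heartbeat code.
# The constant name is hypothetical, chosen for this example only.
HEARTBEAT_SIGNAL = -6


def as_signed32(word: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int."""
    # Pack as unsigned big-endian, then unpack the same 4 bytes as signed.
    return struct.unpack(">i", struct.pack(">I", word))[0]


value = as_signed32(0xFFFFFFFA)
print(value)                      # -6
print(value == HEARTBEAT_SIGNAL)  # True
```

The same result follows by hand: 0xFFFFFFFA - 2**32 = -6.
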

Depending on what is going on, the first heartbeat can occur at any time
during the first 5 minute interval, so that is not too surprising.

>
> > The workaround is either to turn off heartbeat in your FD for your Verify
> > jobs (not possible on a job by job basis) or set it longer than the time
> > it takes to run the verify.

By the way, when using your localhost loopback device, you should never need
to enable heartbeat.  It is needed only when you have a router between two
machines and that router does not correctly implement the Internet keepalive
standard.

>
> There are some very long running verify jobs (TBs of data), so setting
> the heartbeat to an interval of xx hours wouldn't make sense. But I'll
> try to disable the fd's heartbeat completely. The fd I use for verify
> jobs is running on the same system the dir is running on. So it shouldn't
> be a problem.

Yes, I agree, there should be no problems during backup.

>
> > The longterm solution is that Bacula should not use the heartbeat code
> > during a Verify.
>
> I'm really glad that you found the reason, as you can see in bug
> report #1061 it was slowly driving me crazy ;)

Sorry it was driving you crazy, but often some of the most difficult problems
are uncovered and resolved when people are upset with them and determined to
find a resolution, which was your case here :-)

Regards,

Kern

>
> Ralf
>

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel