Hi.

Thank you, I'll try. But why does my job terminate after exactly 6 days?

> Hello,
>
> Your problem is a comm line drop, not a watchdog problem.
>
> Put HeartBeatInterval = 300 in your Dir, SD, and FDs.
>
> Best regards,
> Kern
>
> On 01/15/2014 09:28 AM, Andrey Chebotarev wrote:
>> I asked because in the latest version (5.2.13) modifying the sources
>> doesn't work anymore.
>> I've changed this part:
>> /*
>> * ****FIXME**** reduce this to a few hours once
>> * heartbeats are implemented
>> */
>> bsock->timeout = 60 * 60 * 30 * 24;
>>
>> but the job still terminates after 6 days :(
>>
>> In 5.2.11 I didn't have this problem.
>> What has changed in 5.2.13? In which part of the code can I fix it?
>>
>>> Hi.
>>> I'm using bacula to back up a huge data set, about 100TB. Usually it
>>> takes about 15-16 days.
>>> I've run into a problem. As I understand it, bacula has a mechanism
>>> that monitors jobs (the watchdog timer), and this mechanism is giving
>>> me trouble. My job terminates after 6 days with this error message:
>>>
>>> 2013-12-29 16:42:56 baculasrv-dir JobId 8013: Error: Watchdog sending
>>> kill after 518427 secs to thread stalled reading File daemon.
>>> 2013-12-29 16:42:56 baculasrv-dir JobId 8013: Fatal error: Network error
>>> with FD during Backup: ERR=Interrupted system call
>>> 2013-12-29 16:42:57 baculasrv-sd JobId 8013: Elapsed time=143:47:09,
>>> Transfer rate=58.09 M Bytes/second
>>> 2013-12-29 16:42:57 baculasrv-dir JobId 8013: Error: Director's comm line
>>> to SD dropped.
>>> 2013-12-29 16:42:57 baculasrv-dir JobId 8013: Fatal error: No Job status
>>> returned from FD.
>>> 2013-12-29 16:42:57 baculasrv-dir JobId 8013: Error: Bacula baculasrv-dir
>>> 5.2.13 (19Jan13):
>>>
>>> But my job is still active. Where is the problem? Is the FD not sending
>>> "keep-alive" packets, or is 6 days a hardcoded maximum running time?
>>>
>>> In the sources I see this (src/lib/bnet.c):
>>>
>>> /*
>>> * ****FIXME**** reduce this to a few hours once
>>> * heartbeats are implemented
>>> */
>>> bsock->timeout = 60 * 60 * 6 * 24; /* 6 days timeout */
>>>
>>> Does this mean that heartbeats aren't implemented yet?
>>>
>>> For now I'm changing that interval to 30 days.
>>> Is there a more elegant way?
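
A note on the 6-day figure: the expression quoted from src/lib/bnet.c works out to 518400 seconds, and the watchdog message reports 518427 seconds, so the kill came 27 seconds after that default elapsed. The standalone C sketch below only reproduces this arithmetic. The names timeout_secs, overshoot_secs and print_summary are hypothetical helpers made up for illustration; they are not Bacula functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not Bacula code): seconds in `days` days, written
 * in the same form as the expression quoted from src/lib/bnet.c. */
long timeout_secs(long days)
{
    assert(days >= 0);
    return 60L * 60L * days * 24L;
}

/* Hypothetical helper: how many seconds past the `days`-day timeout
 * the watchdog reported when it fired. */
long overshoot_secs(long reported_secs, long days)
{
    return reported_secs - timeout_secs(days);
}

/* Prints the figures discussed in this thread. */
void print_summary(void)
{
    long reported = 518427L; /* "Watchdog sending kill after 518427 secs" */

    printf("6-day timeout:  %ld s\n", timeout_secs(6));
    printf("30-day timeout: %ld s\n", timeout_secs(30));
    printf("watchdog fired %ld s past the 6-day mark\n",
           overshoot_secs(reported, 6));
}
```

Calling print_summary() from a main() prints the three values. The sketch says nothing about why the patched 30-day value had no effect in 5.2.13; that question is not settled in this thread.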
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
