Hi. Thank you, I'll try.
But why does my job terminate after exactly 6 days?
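
For reference, the 6-day figure lines up with the numbers quoted below: the bnet.c expression 60 * 60 * 6 * 24 is 518400 seconds, and the watchdog message reports 518427 secs. Here is a minimal sketch of that arithmetic in plain C. The function names are made up for illustration and none of this is Bacula code.

```c
#include <stdio.h>

/* Seconds in `days` days, written in the same order as the quoted
 * bnet.c expression: 60 * 60 * days * 24. */
long timeout_secs(long days)
{
    return 60L * 60L * days * 24L;
}

/* Seconds by which an observed stall exceeds a timeout of `days` days. */
long overshoot_secs(long observed, long days)
{
    return observed - timeout_secs(days);
}

/* Prints the comparison for the values quoted in this thread. */
void print_timeout_check(void)
{
    long observed = 518427; /* from the "Watchdog sending kill" log line */

    printf("6-day timeout  = %ld secs\n", timeout_secs(6));
    printf("30-day timeout = %ld secs\n", timeout_secs(30));
    printf("overshoot      = %ld secs\n", overshoot_secs(observed, 6));
}
```

The 6-day value comes out to 518400 seconds, the 30-day value to 2592000 seconds, and the logged 518427 secs is 27 seconds past the 6-day mark. That is consistent with a fixed 6-day limit still being applied somewhere, though the arithmetic alone does not show where.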

> Hello,
>
> Your problem is a comm line drop, not a watchdog problem.
>
> Put HeartBeatInterval = 300 in your Dir, SD, and FDs.
>
> Best regards,
> Kern
>
> On 01/15/2014 09:28 AM, Andrey Chebotarev wrote:
>> I asked because in the latest version (5.2.13) modifying the sources doesn't
>> work anymore.
>> I changed this part:
>>       /*
>>        * ****FIXME**** reduce this to a few hours once
>>        *   heartbeats are implemented
>>        */
>>       bsock->timeout = 60 * 60 * 30 * 24;   /* 30 days timeout */
>>
>> but the job still terminates after 6 days :(
>>
>> In 5.2.11 I didn't have this problem.
>> What changed in 5.2.13? Which part of the code should I fix?
>>
>>> Hi.
>>> I'm using Bacula to back up a huge data set, about 100 TB. It usually takes
>>> about 15-16 days.
>>> I've run into a problem. As I understand it, Bacula has a mechanism
>>> that monitors jobs (the watchdog timer), and that mechanism is giving me
>>> trouble. My job terminates after 6 days with this error message:
>>>
>>> 2013-12-29 16:42:56baculasrv-dir JobId 8013: Error: Watchdog sending
>>> kill after 518427 secs to thread stalled reading File daemon.
>>> 2013-12-29 16:42:56baculasrv-dir JobId 8013: Fatal error: Network error
>>> with FD during Backup: ERR=Interrupted system call
>>> 2013-12-29 16:42:57baculasrv-sd JobId 8013: Elapsed time=143:47:09,
>>> Transfer rate=58.09 M Bytes/second
>>> 2013-12-29 16:42:57baculasrv-dir JobId 8013: Error: Director's comm line
>>> to SD dropped.
>>> 2013-12-29 16:42:57baculasrv-dir JobId 8013: Fatal error: No Job status
>>> returned from FD.
>>> 2013-12-29 16:42:57baculasrv-dir JobId 8013: Error: Bacula baculasrv-dir
>>> 5.2.13 (19Jan13):
>>>
>>> But my job is still active. Where is the problem? Is the FD not sending
>>> "keep-alive" packets, or is 6 days a hardcoded maximum running
>>> time?
>>>
>>> In the sources I see this (src/lib/bnet.c):
>>>
>>>       /*
>>>        * ****FIXME**** reduce this to a few hours once
>>>        *   heartbeats are implemented
>>>        */
>>>       bsock->timeout = 60 * 60 * 6 * 24;   /* 6 days timeout */
>>>
>>> Does this mean that heartbeats aren't implemented yet?
>>>
>>> For now I'm changing that interval to 30 days.
>>> Is there a cleaner way?
>>>
>>> _______________________________________________
>>> Bacula-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/bacula-devel
>>

