Hello Craig,

It seems you have this job scheduled every day at about 18:00. Am I
correct? If so, I think the job scheduled to run on 2015-07-20
(maybe at 18:00) started at 18:01:39 but did not finish until 2015-07-21
04:49:42. So JobId 118594 is the same job that was scheduled for the
next day, 2015-07-21 at 18:00, and it had been waiting for JobId
118277 to finish (since you have not allowed duplicate jobs). Since
job 118277 was canceled, no more retries would be carried out.
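
To make the timeline concrete, here is a minimal Python sketch that uses
only the timestamps quoted further down in this thread. The variable names
are mine, and the 3600-second interval is taken from the "Rescheduled Job"
log line, so treat this as an illustration of the arithmetic and not as
anything Bacula does internally.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the job list and log lines quoted in this thread.
job_118277_start = datetime.strptime("2015-07-20 18:01:39", FMT)
job_118277_error = datetime.strptime("2015-07-21 04:49:42", FMT)
reschedule_logged = datetime.strptime("2015-07-21 04:49:57", FMT)
job_118594_start = datetime.strptime("2015-07-21 04:49:57", FMT)

# Assumption: the reschedule interval is the 3600 seconds shown in the log.
reschedule_interval = timedelta(seconds=3600)

# How long JobId 118277 ran before the network error.
runtime = job_118277_error - job_118277_start
# When the retry of JobId 118277 should have started.
expected_retry = reschedule_logged + reschedule_interval
# How soon after the error JobId 118594 appeared.
gap = job_118594_start - job_118277_error

print("118277 ran for:", runtime)
print("retry expected at:", expected_retry.strftime(FMT))
print("118594 started after the error by:", gap)
```

The point of the sketch is only that JobId 118594 appears seconds after the
error, well before the time the retry was announced for.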

Best regards,
Ana

On Fri, Jul 24, 2015 at 1:41 AM, Craig Shiroma <shiroma.crai...@gmail.com>
wrote:

> Hi Ana,
>
> >Sorry for the delay.
>
> No apologies necessary!  Thank you for the help.  I appreciate it.
>
> >Don't you have log lines for the retry that should occur at 21-Jul-2015
> 05:49?
>
> I see no entry in the log file indicating job 118277 was restarted.
>
> >Have you manually canceled the JobId 118277?
>
> No, no one manually cancelled job 118277.  I'm guessing Bacula did
> somehow.
>
> Would you have any idea what would cause job 118594 to be started?  Up to
> now, I thought it was the restarted job under a new jobId.
>
> The above situation occurs occasionally (not frequently) for other host
> backups.  I have not been able to figure out why.
>
> This is what shows up if I do a 'list files job=<host>':
>
> <snipped>
> | 117,865 | <host> | 2015-07-19 18:01:35 | B    | I     |        70 |      13,864,818 | T         |
> | 118,277 | <host> | 2015-07-20 18:01:39 | B    | F     | 1,631,292 | 311,164,701,221 | f         |
> | 118,594 | <host> | 2015-07-21 04:49:57 | B    | F     |         0 |               0 | A         |
> <snipped>
>
> I did run a test on our test Bacula environment to simulate what
> happened.  I ran a backup and, while it was running, shut down the
> bacula-sd daemon on the storage host.  The exact same thing happened.  I
> got the same (or similar) network error and the same 'job will be
> restarted in 3600 seconds' message, another job started right away with a
> different jobId, and the original job was cancelled.  No job restart with
> the same jobId occurred 1 hour later.
>
> Many thanks,
> -craig
>
> On Thu, Jul 23, 2015 at 5:21 PM, Ana Emília M. Arruda <
> emiliaarr...@gmail.com> wrote:
>
>> Hello Craig,
>>
>> Sorry for the delay.
>>
>> The second email does not seem to be a retry. All the retries for a job
>> have the same JobId. That is, the same JobId will be retried up to the
>> "reschedule times" you configured for the Job.
>>
>> On Wed, Jul 22, 2015 at 4:49 AM, Craig Shiroma
>> <shiroma.crai...@gmail.com> wrote:
>>
>>> Hi Ana,
>>>
>>> Thank you for the reply.
>>>
>>> Sorry, yes that is what I meant.
>>>
>>> I'm wondering if Bacula canceled the job due to the Fatal error and the
>>> below line included in the first "bacula error" email actually shouldn't be
>>> there (i.e. erroneous) because no retry was made at 05:49 (3600 seconds
>>> later).  However, the second "bacula error" email seems to indicate a retry
>>> was actually attempted although at the wrong time (2015-07-21 04:49:57).
>>>
>>>
>>> 2015-07-21 04:49:57 bacula-dir JobId 118277: Rescheduled Job
>>> <host>.2015-07-20_18.00.03_31 at 21-Jul-2015 04:49 to re-run in 3600
>>> seconds (21-Jul-2015 05:49).
>>>
>>> Thanks again!
>>> -craig
>>>
>>>
>>>
>>> On Tue, Jul 21, 2015 at 4:55 PM, Ana Emília M. Arruda <
>>> emiliaarr...@gmail.com> wrote:
>>>
>>>> Hello Craig,
>>>>
>>>> Do you mean reschedule interval = 1 hour and reschedule times = 3?
>>>>
>>>> Best regards,
>>>> Ana
>>>> On Tue, Jul 21, 2015 at 19:29, Craig Shiroma <
>>>> shiroma.crai...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I had a backup that failed this morning due to what looks like a
>>>>> network problem.
>>>>>
>>>>> First error email:
>>>>> 2015-07-21 04:49:42 <host> JobId 118277: Error: lib/bsock.c:693 Write
>>>>> error sending 65813 bytes to Storage daemon:<host>:9103:
>>>>> ERR=Input/output error
>>>>> 2015-07-21 04:49:42 <host> JobId 118277: Fatal error:
>>>>> filed/backup.c:1282 Network send error to SD. ERR=Input/output error
>>>>> ...
>>>>> FD termination status:  Error
>>>>> SD termination status:  Error
>>>>> Termination:            *** Backup Error ***
>>>>>
>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118277: Rescheduled Job
>>>>> <host>.2015-07-20_18.00.03_31 at 21-Jul-2015 04:49 to re-run in 3600
>>>>> seconds (21-Jul-2015 05:49).
>>>>>
>>>>
>> This means that your Job (scheduled to run at 18:00 2015-07-20) was
>> rescheduled to run at 21-Jul-2015 05:49 (1 hour past 21-Jul-2015
>> 04:49, the time it was rescheduled).
>>
>>
>>>
>>>>>
>>>>> I have Retry set to 3 and retry interval set to 1 hour.  However, the
>>>>> retry was canceled as seen below.  I'm wondering how I should interpret
>>>>> the below messages.  Did Bacula notice the job was still running when it
>>>>> tried to reschedule the job and then cancel it right then (practically at
>>>>> the same time the above error occurred)?  Or, did it actually try to
>>>>> re-run the job right after the error instead of one hour later (5:49) and
>>>>> cancel the job?  If so, why did it try to re-run the job right after the
>>>>> error instead of one hour later?
>>>>>
>>>>> Second error email:
>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118594: Fatal error: JobId 118277
>>>>> already running. Duplicate job not allowed.
>>>>>
>>>>
>> It seems that a duplicate job (JobId 118594) started while the first
>> one (JobId 118277) was still running.
>>
>>
>>>
>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118594: Bacula bacula-dir 7.0.5
>>>>> (28Jul14):
>>>>>
>>>>> FD termination status:
>>>>> SD termination status:
>>>>> Termination:            Backup Canceled
>>>>>
>>>>>
>> The duplicate job was canceled. Don't you have log lines for the retry
>> that should occur at 21-Jul-2015 05:49? Have you manually canceled the
>> JobId 118277? If so, no retry would be carried out.
>>
>>
>>>
>>>>>
>>>>> Best regards,
>>>>> Craig
>>>>>
>>>>
>> Best regards,
>> Ana
>>
>>
>>
>>>
>>>>> _______________________________________________
>>>>> Bacula-users mailing list
>>>>> Bacula-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>>>
>>>>
>>>
>>
>
