Hi Ana,

You are absolutely correct! I didn't even notice that. Looking at the log, that appears to be the case.
Thank you so much!
-craig

On Fri, Jul 24, 2015 at 3:23 AM, Ana Emília M. Arruda <emiliaarr...@gmail.com> wrote:

> Hello Craig,
>
> It seems you have this job scheduled every day at about 18:00. Am I
> correct? If so, I think that your job scheduled to run on 2015-07-20
> (maybe at 18:00) started at 18:01:39 but did not finish until
> 2015-07-21 04:49:42. So JobId 118594 is the same job that was scheduled
> for the next day, 2015-07-21 at 18:00, and this job had been waiting for
> JobId 118277 to finish (since you do not allow duplicate jobs).
> Since job 118277 was canceled, no more retries would be carried out.
>
> Best regards,
> Ana
>
> On Fri, Jul 24, 2015 at 1:41 AM, Craig Shiroma <shiroma.crai...@gmail.com>
> wrote:
>
>> Hi Ana,
>>
>> >Sorry for the delay.
>>
>> No apologies necessary! Thank you for the help. I appreciate it.
>>
>> >Don't you have log lines for the retry that should occur at
>> >21-Jul-2015 05:49?
>>
>> I see no entry in the log file indicating job 118277 was restarted.
>>
>> >Have you manually canceled the JobId 118277?
>>
>> No, no one manually cancelled job 118277. I'm guessing Bacula did
>> somehow.
>>
>> Would you have any idea what would cause job 118594 to be started? Up
>> to now, I thought it was the restarted job under a new JobId.
>>
>> The above situation occurs occasionally (not frequently) for other host
>> backups. I have not been able to figure out why.
>>
>> This is what shows up if I do a 'list files job=<host>':
>>
>> <snipped>
>> | 117,865 | <host> | 2015-07-19 18:01:35 | B | I |        70 |      13,864,818 | T |
>> | 118,277 | <host> | 2015-07-20 18:01:39 | B | F | 1,631,292 | 311,164,701,221 | f |
>> | 118,594 | <host> | 2015-07-21 04:49:57 | B | F |         0 |               0 | A |
>> <snipped>
>>
>> I ran a test to simulate what happened in our test Bacula
>> environment. I ran a backup, and while it was running I shut down the
>> bacula-sd daemon on the storage host. The exact same thing happened.
>> I get the same (or similar) network error, the same 'job will be
>> restarted in 3600 seconds' message, another job starts right away with
>> a different JobId, and the original job is cancelled. No job restart
>> with the same JobId occurs 1 hour later.
>>
>> Many thanks,
>> -craig
>>
>> On Thu, Jul 23, 2015 at 5:21 PM, Ana Emília M. Arruda
>> <emiliaarr...@gmail.com> wrote:
>>
>>> Hello Craig,
>>>
>>> Sorry for the delay.
>>>
>>> The second email does not seem to be a retry. All the retries for a job
>>> have the same JobId. That is, the same JobId will be retried the number
>>> of "reschedule times" you configured for the Job.
>>>
>>> On Wed, Jul 22, 2015 at 4:49 AM, Craig Shiroma
>>> <shiroma.crai...@gmail.com> wrote:
>>>
>>>> Hi Ana,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> Sorry, yes that is what I meant.
>>>>
>>>> I'm wondering if Bacula canceled the job due to the Fatal error and the
>>>> line below, included in the first "bacula error" email, actually
>>>> shouldn't be there (i.e. is erroneous) because no retry was made at
>>>> 05:49 (3600 seconds later). However, the second "bacula error" email
>>>> seems to indicate a retry was actually attempted, although at the
>>>> wrong time (2015-07-21 04:49:57).
>>>>
>>>> 2015-07-21 04:49:57 bacula-dir JobId 118277: Rescheduled Job
>>>> <host>.2015-07-20_18.00.03_31 at 21-Jul-2015 04:49 to re-run in 3600
>>>> seconds (21-Jul-2015 05:49).
>>>>
>>>> Thanks again!
>>>> -craig
>>>>
>>>> On Tue, Jul 21, 2015 at 4:55 PM, Ana Emília M. Arruda
>>>> <emiliaarr...@gmail.com> wrote:
>>>>
>>>>> Hello Craig,
>>>>>
>>>>> Do you mean reschedule interval = 1 hour and reschedule times = 3?
>>>>>
>>>>> Best regards,
>>>>> Ana
>>>>>
>>>>> On Tue, Jul 21, 2015 at 7:29 PM, Craig Shiroma
>>>>> <shiroma.crai...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I had a backup that failed this morning due to what looks like a
>>>>>> network problem.
>>>>>>
>>>>>> First error email:
>>>>>> 2015-07-21 04:49:42 <host> JobId 118277: Error: lib/bsock.c:693 Write
>>>>>> error sending 65813 bytes to Storage daemon:<host>:9103:
>>>>>> ERR=Input/output error
>>>>>> 2015-07-21 04:49:42 <host> JobId 118277: Fatal error:
>>>>>> filed/backup.c:1282 Network send error to SD. ERR=Input/output error
>>>>>> ...
>>>>>> FD termination status: Error
>>>>>> SD termination status: Error
>>>>>> Termination: *** Backup Error ***
>>>>>>
>>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118277: Rescheduled Job
>>>>>> <host>.2015-07-20_18.00.03_31 at 21-Jul-2015 04:49 to re-run in 3600
>>>>>> seconds (21-Jul-2015 05:49).
>>>
>>> This means that your Job (scheduled to run at 18:00 2015-07-20) was
>>> rescheduled to run at 21-Jul-2015 05:49 (1 hour past 21-Jul-2015
>>> 04:49, the time it started).
>>>
>>>>>> I have Retry set to 3 and retry interval set to 1 hour. However, the
>>>>>> retry was canceled as seen below. I'm wondering how I should interpret
>>>>>> the messages below. Did Bacula notice the job was still running when
>>>>>> it tried to reschedule the job and then cancel it right then
>>>>>> (practically at the same time the above error occurred)? Or did it
>>>>>> actually try to re-run the job right after the error instead of one
>>>>>> hour later (5:49) and cancel the job? If so, why did it try to re-run
>>>>>> the job right after the error instead of one hour later?
>>>>>>
>>>>>> Second error email:
>>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118594: Fatal error: JobId 118277
>>>>>> already running. Duplicate job not allowed.
>>>
>>> It seems that a duplicate job (JobId 118594) started while the first
>>> one (JobId 118277) was still running.
>>>
>>>>>> 2015-07-21 04:49:57 bacula-dir JobId 118594: Bacula bacula-dir 7.0.5
>>>>>> (28Jul14):
>>>>>>
>>>>>> FD termination status:
>>>>>> SD termination status:
>>>>>> Termination: Backup Canceled
>>>
>>> The duplicate job was canceled. Don't you have log lines for the retry
>>> that should occur at 21-Jul-2015 05:49? Have you manually canceled the
>>> JobId 118277? If so, no retry would be carried out.
>>>
>>>>>> Best regards,
>>>>>> Craig
>>>
>>> Best regards,
>>> Ana
>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Don't Limit Your Business. Reach for the Cloud.
>>>>>> GigeNET's Cloud Solutions provide you with the tools and support that
>>>>>> you need to offload your IT needs and focus on growing your business.
>>>>>> Configured For All Businesses. Start Your Cloud Today.
>>>>>> https://www.gigenetcloud.com/
>>>>>> _______________________________________________
>>>>>> Bacula-users mailing list
>>>>>> Bacula-users@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
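
For anyone following along, the settings discussed in this thread (retry interval of 1 hour, 3 retries, duplicate jobs not allowed) would look roughly like the fragment below in a Director Job resource. This is only a sketch under assumptions: the actual configuration was never posted in the thread, the job name is a placeholder, and the other directives a real Job needs are left out.

```
# Hypothetical Job resource fragment for bacula-dir.conf.
# "host-backup" is a placeholder name. Client, FileSet, Schedule,
# Storage, Pool, and Messages are omitted but needed in a real Job.
Job {
  Name = "host-backup"
  Type = Backup
  Reschedule On Error = yes      # retry a job that terminates in error
  Reschedule Interval = 1 hour   # corresponds to "re-run in 3600 seconds"
  Reschedule Times = 3           # assumed from "Retry set to 3"
  Allow Duplicate Jobs = no      # assumed from "Duplicate job not allowed"
}
```

With settings like these, a second instance of the same job that starts while the first is still active would presumably be rejected as a duplicate, which is consistent with the "Duplicate job not allowed" message quoted above.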