So, I'm now doing disk-to-disk-to-disk backup, with the third tier being a rotating ZFS pool of removable SATA disks. My full backups ran this morning, and all of the backup jobs completed successfully without errors, but the archive copy job failed without reporting any errors.
Here's the job definition: Job { Name = "Archive Copy" Type = Copy Enabled = Yes Pool = Full-Disk Level = Full Client = ALL Fileset = DUMMY Storage = asgard-archive Selection Type = SQL Query Selection Pattern = "SELECT DISTINCT J.JobId, J.StartTime FROM Job J JOIN Pool P ON P.PoolId = J.PoolId WHERE P.Name = 'Full-Disk' AND J.Type = 'B' AND J.JobStatus IN ('T','W') AND J.jobBytes > 0 AND J.StartTime > now() - interval 24 hour" Messages = Daemon Priority = 20 Rerun Failed Levels = yes Allow Duplicate Jobs = no Cancel Queued Duplicates = yes Schedule = "Full Backup Archive" } And here's what that selection pattern query returns if I run it manually, right now: MySQL 127.0.0.1> SELECT DISTINCT J.JobId, J.StartTime FROM Job J JOIN Pool P ON P.PoolId = J.PoolId WHERE P.Name = 'Full-Disk' AND J.Type = 'B' AND J.JobStatus IN ('T','W') AND J.jobBytes > 0 AND J.StartTime > now() - interval 24 hour; +-------+---------------------+ | JobId | StartTime | +-------+---------------------+ | 23476 | 2019-12-02 04:30:03 | | 23477 | 2019-12-02 04:30:03 | | 23478 | 2019-12-02 04:30:03 | | 23479 | 2019-12-02 04:30:03 | | 23480 | 2019-12-02 04:30:03 | | 23481 | 2019-12-02 04:30:04 | | 23482 | 2019-12-02 11:40:57 | +-------+---------------------+ 7 rows in set (0.001 sec) And here's the job output: 02-Dec 05:00 minbar-dir JobId 23483: No JobIds found to copy. 02-Dec 11:42 minbar-dir JobId 23483: Bacula minbar-dir 9.4.4 (28May19): Build OS: x86_64-pc-linux-gnu gentoo Prev Backup JobId: 0 Prev Backup Job: New Backup JobId: 0 Current JobId: 23483 Current Job: Archive_Copy.2019-12-02_05.00.00_46 Backup Level: Full Client: ALL FileSet: "DUMMY" 2019-11-04 12:00:40 Read Pool: "Full-Disk" (From Job resource) Read Storage: "asgard-archive" (From Job resource) Write Pool: "Full-Archive" (From Pool's NextPool resource) Write Storage: "asgard-archive" (From Pool's NextPool resource) Catalog: "Catalog" (From Client resource) Start time: 02-Dec-2019 11:42:38 End time: 02-Dec-2019 11:42:38 Elapsed time: 1 sec Priority: 20 SD Files Written: 0 SD Bytes Written: 0 (0 B) Rate: 0.0 KB/s Volume name(s): Volume Session Id: 0 Volume Session Time: 0 Last Volume Bytes: 0 (0 B) SD Errors: 0 SD termination status: Termination: Copying -- no files to copy When I tried to rerun the Archive Copy job manually from BAT, the Director crashed. When I restarted the Director and manually ran it AGAIN, this time it appears to be starting and running exactly as it should, the console reporting: run job="Archive Copy" fileset="DUMMY" level="Full" client="ALL" pool="Full-Disk" storage="asgard-file" priority="10" when="2019-12-02 13:05:49" yes Job queued. JobId=23484 02-Dec 13:05 minbar-dir JobId 23484: The following 7 JobIds were chosen to be copied: 23476,23477,23478,23479,23480,23481,23482 02-Dec 13:05 minbar-dir JobId 23485: Copying using JobId=23476 Job=Asgard_Backup.2019-12-02_04.30.00_39 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23485 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23485 started. 02-Dec 13:05 minbar-dir JobId 23487: Copying using JobId=23477 Job=Netstore_Backup.2019-12-02_04.30.00_40 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23487 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23487 started. 02-Dec 13:05 minbar-dir JobId 23485: Start Copying JobId 23485, Job=Archive_Copy.2019-12-02_13.05.53_05 02-Dec 13:05 minbar-dir JobId 23489: Copying using JobId=23478 Job=Babylon5_Backup.2019-12-02_04.30.00_41 02-Dec 13:05 minbar-dir JobId 23485: Using Device "FileStorage" to read. 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23489 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23489 started. 02-Dec 13:05 minbar-dir JobId 23491: Copying using JobId=23479 Job=Jumpgate_Backup.2019-12-02_04.30.00_42 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23491 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23491 started. 02-Dec 13:05 minbar-dir JobId 23493: Copying using JobId=23480 Job=Minbar_Backup.2019-12-02_04.30.00_43 02-Dec 13:05 minbar-dir JobId 23486: Max configured use duration=82,800 sec. exceeded. Marking Volume "ARCHIVE-20191104-12:00" as Used. 02-Dec 13:05 minbar-dir JobId 23486: Created new Volume="ARCHIVE-20191202-13:05", Pool="Full-Archive", MediaType="File" in catalog. 02-Dec 13:05 minbar-dir JobId 23486: Using Device "ArchiveCopy" to write. 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23493 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23493 started. 02-Dec 13:05 minbar-dir JobId 23495: Copying using JobId=23481 Job=Narn_Backup.2019-12-02_04.30.00_44 02-Dec 13:05 asgard-sd JobId 23486: Labeled new Volume "ARCHIVE-20191202-13:05" on File device "ArchiveCopy" (/arcpool). 02-Dec 13:05 minbar-dir JobId 23484: Job queued. JobId=23495 02-Dec 13:05 minbar-dir JobId 23484: Copying JobId 23495 started. 02-Dec 13:05 minbar-dir JobId 23484: Copying using JobId=23482 Job=MySQL_Backup_New.2019-12-02_04.55.00_45 02-Dec 13:05 asgard-sd JobId 23486: Wrote label to prelabeled Volume "ARCHIVE-20191202-13:05" on File device "ArchiveCopy" (/arcpool) 02-Dec 13:05 asgard-sd JobId 23485: Ready to read from volume "FULL-20191202-04:30" on File device "FileStorage" (/spool/bacula). Now, what this LOOKS like to me is that the Director is evaluating the selection function at the time that the job is SCHEDULED, NOT at the time that it actually RUNS, at which time none of the jobs it is supposed to archive have actually completed yet, and therefore it doesn't find anything to archive. Which seems to me to mean that I have to guess how long all the other jobs are going to take to run and not SCHEDULE the job until after they have all completed. Which is going to be a problem if the jobs to be archived take longer than expected to run, for any reason. (The Director crash is a separate problem on which I have still gotten precisely nowhere.) It seems to me like this is a bug, and the selection criteria for a COPY job should not be resolved until the COPY job is actually started. Any comments...? -- Phil Stracchino Babylon Communications ph...@caerllewys.net p...@co.ordinate.org Landline: +1.603.293.8485 Mobile: +1.603.998.6958 _______________________________________________ Bacula-devel mailing list Bacula-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel