Hello,

I hope this is the right place to ask.

We have Bacula version 9.6.5. on a RHEL 8.2 physical server. We use copy job that runs every 6 months and copies the last full backup of 22 clients to a separate SCSI disk which is then pulled out from the server for offsite backup. So both the backup disks and disk on which the backups are later copied are local, physical disks in the machine. This was working fine every time up to now. During last run of the copy job, we noticed the following error on one of the copy jobs:

21-Oct 04:00 backup-dir JobId 12612: Copying using JobId=12068 Job=websrv.2021-10-03_23.05.00_12 21-Oct 05:45 backup-dir JobId 12612: Start Copying JobId 12612, Job=Copy-to-offsite-disk.2021-10-21_04.00.01_36 21-Oct 05:45 backup-dir JobId 12612: Using Device "backup-device-data1" to read. 21-Oct 05:45 backup-sd JobId 12612: Ready to read from volume "volume-10" on File device "backup-device-data1" (/data1). 21-Oct 05:45 backup-sd JobId 12612: Forward spacing Volume "volume-10" to addr=641829209227 *21-Oct 05:46 backup-sd JobId 12612: Warning: block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967 bytes on device "backup-device-data1" (/data1) discarded. ** **21-Oct 05:46 backup-sd JobId 12612: Error: read_records.c:160 block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967 bytes on device "backup-device-data1" (/data1) discarded. * 21-Oct 05:46 backup-sd JobId 12612: End of Volume "volume-10" at addr=653197996032 on device "backup-device-data1" (/data1). 21-Oct 05:46 backup-sd JobId 12612: Ready to read from volume "volume-11" on File device "backup-device-data1" (/data1). 21-Oct 05:46 backup-sd JobId 12612: Forward spacing Volume "volume-11" to addr=64742 21-Oct 05:50 backup-sd JobId 12612: End of Volume "volume-11" at addr=49666111453 on device "backup-device-data1" (/data1). 21-Oct 05:50 backup-sd JobId 12612: Elapsed time=00:05:02, Transfer rate=16.65 M Bytes/second 21-Oct 05:50 backup-dir JobId 12612: Error: Bacula backup-dir 9.6.5 (11Jun20):
 Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
 Prev Backup JobId:      12068
 Prev Backup Job:        websrv.2021-10-03_23.05.00_12
 New Backup JobId:       12613
 Current JobId:          12612
 Current Job:            Copy-to-offsite-disk.2021-10-21_04.00.01_36
 Backup Level:           Full
 Client:                 backup-fd
 FileSet:                "backup" 2020-09-17 21:53:08
 Read Pool:              "Default" (From Command input)
 Read Storage:           "backup-storage-data1" (From Pool resource)
 Write Pool:             "Offsite" (From Command input)
 Write Storage:          "backup-storage-data2" (From Command input)
 Catalog:                "MyCatalog" (From Client resource)
 Start time:             21-Oct-2021 05:45:32
 End time:               21-Oct-2021 05:50:42
 Elapsed time:           5 mins 10 secs
 Priority:               10
* SD Files Written:       200,038 **
** SD Bytes Written:       5,029,190,722 (5.029 GB) *
 Rate:                   16223.2 KB/s
 Volume name(s):         vol-01
 Volume Session Id:      5353
 Volume Session Time:    1619097903
 Last Volume Bytes:      573,628,844,941 (573.6 GB)
 SD Errors:              1
 SD termination status:  OK
 Termination:            *** Copying Error ***



Job ends with error, but upon checking the backup size and number of files of the original "websrv" backup job (which was successful), we see that the number of files and bytes written are the same as is in the copy job:

03-Oct 23:06 backup-dir JobId 12068: Start Backup JobId 12068, 
Job=websrv.2021-10-03_23.05.00_12
03-Oct 23:06 backup-dir JobId 12068: Using Device "backup-device-data1" to 
write.
03-Oct 23:06 websrv-fd JobId 12068: shell command: run ClientBeforeJob 
"/etc/bacula/scripts/mariadb-backup.sh"
03-Oct 23:10 websrv-fd JobId 12068:      /srv/cpap is a different filesystem. 
Will not descend from / into it.
03-Oct 23:19 backup-sd JobId 12068: Elapsed time=00:13:26, Transfer rate=6.239 
M Bytes/second
03-Oct 23:19 backup-sd JobId 12068: Sending spooled attrs to the Director. 
Despooling 53,043,158 bytes ...
03-Oct 23:19 backup-dir JobId 12068: Bacula backup-dir 9.6.5 (11Jun20):
  Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
  JobId:                  12068
  Job:                    websrv.2021-10-03_23.05.00_12
  Backup Level:           Full
  Client:                 "websrv-fd" 9.0.6 (20Nov17) 
x86_64-redhat-linux-gnu,redhat,Enterprise 8.1
  FileSet:                "websrv" 2020-09-17 23:05:00
  Pool:                   "Default" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "backup-storage-data1" (From Pool resource)
  Scheduled time:         03-Oct-2021 23:05:00
  Start time:             03-Oct-2021 23:06:04
  End time:               03-Oct-2021 23:19:37
  Elapsed time:           13 mins 33 secs
  Priority:               10
*FD Files Written: 200,038 SD Files Written: 200,038 FD Bytes Written: 4,996,564,202 (4.996 GB) SD Bytes Written: 5,029,190,722 (5.029 GB)*
  Rate:                   6145.8 KB/s
  Software Compression:   40.9% 1.7:1
  Comm Line Compression:  16.7% 1.2:1
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               no
  Volume name(s):         volume-10|volume-11
  Volume Session Id:      4812
  Volume Session Time:    1619097903
  Last Volume Bytes:      49,996,799,959 (49.99 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

03-Oct 23:19 backup-dir JobId 12068: Begin pruning Jobs older than 6 months .
03-Oct 23:19 backup-dir JobId 12068: Pruned 1 Job for client websrv-fd from 
catalog.
03-Oct 23:19 backup-dir JobId 12068: Begin pruning Files.
03-Oct 23:19 backup-dir JobId 12068: Pruned Files from 1 Jobs for client 
websrv-fd from catalog.
03-Oct 23:19 backup-dir JobId 12068: End auto prune.


I also don't see any errors in messages or dmesg output.

From what I gathered from an older thread: https://marc.info/?l=bacula-devel&m=123237957307571&w=2 <https://marc.info/?l=bacula-devel&m=123237957307571&w=2> , this points that copying completed successfully. Also mentioned in the thread is that "Maximium Block Size" could be the problem, but we haven't defined that property.

Does anyone know what could cause this?


BR

--
Marko Bartolić, Nimium d.o.o., email:<marko.barto...@nimium.hr>
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Reply via email to