Hello,
I hope this is the right place to ask.
We have Bacula version 9.6.5. on a RHEL 8.2 physical server. We use copy
job that runs every 6 months and copies the last full backup of 22
clients to a separate SCSI disk which is then pulled out from the server
for offsite backup. So both the backup disks and disk on which the
backups are later copied are local, physical disks in the machine. This
was working fine every time up to now. During last run of the copy job,
we noticed the following error on one of the copy jobs:
21-Oct 04:00 backup-dir JobId 12612: Copying using JobId=12068
Job=websrv.2021-10-03_23.05.00_12
21-Oct 05:45 backup-dir JobId 12612: Start Copying JobId 12612,
Job=Copy-to-offsite-disk.2021-10-21_04.00.01_36
21-Oct 05:45 backup-dir JobId 12612: Using Device "backup-device-data1"
to read.
21-Oct 05:45 backup-sd JobId 12612: Ready to read from volume
"volume-10" on File device "backup-device-data1" (/data1).
21-Oct 05:45 backup-sd JobId 12612: Forward spacing Volume "volume-10"
to addr=641829209227
*21-Oct 05:46 backup-sd JobId 12612: Warning: block.c:682 [SE0208]
Volume data error at 0:0! Short block of 42967 bytes on device
"backup-device-data1" (/data1) discarded. **
**21-Oct 05:46 backup-sd JobId 12612: Error: read_records.c:160
block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967
bytes on device "backup-device-data1" (/data1) discarded. *
21-Oct 05:46 backup-sd JobId 12612: End of Volume "volume-10" at
addr=653197996032 on device "backup-device-data1" (/data1).
21-Oct 05:46 backup-sd JobId 12612: Ready to read from volume
"volume-11" on File device "backup-device-data1" (/data1).
21-Oct 05:46 backup-sd JobId 12612: Forward spacing Volume "volume-11"
to addr=64742
21-Oct 05:50 backup-sd JobId 12612: End of Volume "volume-11" at
addr=49666111453 on device "backup-device-data1" (/data1).
21-Oct 05:50 backup-sd JobId 12612: Elapsed time=00:05:02, Transfer
rate=16.65 M Bytes/second
21-Oct 05:50 backup-dir JobId 12612: Error: Bacula backup-dir 9.6.5
(11Jun20):
Build OS: x86_64-redhat-linux-gnu-bacula redhat (Core)
Prev Backup JobId: 12068
Prev Backup Job: websrv.2021-10-03_23.05.00_12
New Backup JobId: 12613
Current JobId: 12612
Current Job: Copy-to-offsite-disk.2021-10-21_04.00.01_36
Backup Level: Full
Client: backup-fd
FileSet: "backup" 2020-09-17 21:53:08
Read Pool: "Default" (From Command input)
Read Storage: "backup-storage-data1" (From Pool resource)
Write Pool: "Offsite" (From Command input)
Write Storage: "backup-storage-data2" (From Command input)
Catalog: "MyCatalog" (From Client resource)
Start time: 21-Oct-2021 05:45:32
End time: 21-Oct-2021 05:50:42
Elapsed time: 5 mins 10 secs
Priority: 10
* SD Files Written: 200,038 **
** SD Bytes Written: 5,029,190,722 (5.029 GB) *
Rate: 16223.2 KB/s
Volume name(s): vol-01
Volume Session Id: 5353
Volume Session Time: 1619097903
Last Volume Bytes: 573,628,844,941 (573.6 GB)
SD Errors: 1
SD termination status: OK
Termination: *** Copying Error ***
Job ends with error, but upon checking the backup size and number of
files of the original "websrv" backup job (which was successful), we see
that the number of files and bytes written are the same as is in the
copy job:
03-Oct 23:06 backup-dir JobId 12068: Start Backup JobId 12068,
Job=websrv.2021-10-03_23.05.00_12
03-Oct 23:06 backup-dir JobId 12068: Using Device "backup-device-data1" to
write.
03-Oct 23:06 websrv-fd JobId 12068: shell command: run ClientBeforeJob
"/etc/bacula/scripts/mariadb-backup.sh"
03-Oct 23:10 websrv-fd JobId 12068: /srv/cpap is a different filesystem.
Will not descend from / into it.
03-Oct 23:19 backup-sd JobId 12068: Elapsed time=00:13:26, Transfer rate=6.239
M Bytes/second
03-Oct 23:19 backup-sd JobId 12068: Sending spooled attrs to the Director.
Despooling 53,043,158 bytes ...
03-Oct 23:19 backup-dir JobId 12068: Bacula backup-dir 9.6.5 (11Jun20):
Build OS: x86_64-redhat-linux-gnu-bacula redhat (Core)
JobId: 12068
Job: websrv.2021-10-03_23.05.00_12
Backup Level: Full
Client: "websrv-fd" 9.0.6 (20Nov17)
x86_64-redhat-linux-gnu,redhat,Enterprise 8.1
FileSet: "websrv" 2020-09-17 23:05:00
Pool: "Default" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "backup-storage-data1" (From Pool resource)
Scheduled time: 03-Oct-2021 23:05:00
Start time: 03-Oct-2021 23:06:04
End time: 03-Oct-2021 23:19:37
Elapsed time: 13 mins 33 secs
Priority: 10
*FD Files Written: 200,038 SD Files Written: 200,038 FD Bytes Written:
4,996,564,202 (4.996 GB) SD Bytes Written: 5,029,190,722 (5.029 GB)*
Rate: 6145.8 KB/s
Software Compression: 40.9% 1.7:1
Comm Line Compression: 16.7% 1.2:1
Snapshot/VSS: no
Encryption: no
Accurate: no
Volume name(s): volume-10|volume-11
Volume Session Id: 4812
Volume Session Time: 1619097903
Last Volume Bytes: 49,996,799,959 (49.99 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK
03-Oct 23:19 backup-dir JobId 12068: Begin pruning Jobs older than 6 months .
03-Oct 23:19 backup-dir JobId 12068: Pruned 1 Job for client websrv-fd from
catalog.
03-Oct 23:19 backup-dir JobId 12068: Begin pruning Files.
03-Oct 23:19 backup-dir JobId 12068: Pruned Files from 1 Jobs for client
websrv-fd from catalog.
03-Oct 23:19 backup-dir JobId 12068: End auto prune.
I also don't see any errors in messages or dmesg output.
From what I gathered from an older thread:
https://marc.info/?l=bacula-devel&m=123237957307571&w=2
<https://marc.info/?l=bacula-devel&m=123237957307571&w=2> , this points
that copying completed successfully. Also mentioned in the thread is
that "Maximium Block Size" could be the problem, but we haven't defined
that property.
Does anyone know what could cause this?
BR
--
Marko Bartolić, Nimium d.o.o., email:<marko.barto...@nimium.hr>
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel