Re: [Bacula-devel] Copy job shows short block error

Martin Simmons Fri, 22 Oct 2021 08:06:44 -0700

Ah, so the short block is part of jobid 12065 and therefore the copy of jobid
12068 is very likely to be complete.


The "[SI0202] End of Volume" message means that write() wrote less that
expected.  The only reasons I can think of are disk space shortage, I/O error,
some other process-specific limit (such as controlled by ulimit -f) or buggy
filesystem.

Have you set some limit on the volume size?  It is interesting that those 3
volumes are all just slightly smaller that 653,198,000,000 bytes.

__Martin


>>>>> On Fri, 22 Oct 2021 13:53:12 +0200, Marko Bartolić said:
> 
> Hi,
> 
> the backup job on 03-Oct definetly filled the volume 10, but not the 
> whole disk, it still has some free space.
> 
> *list volume
> 
> Automatically selected Catalog: MyCatalog
> 
> Using Catalog "MyCatalog"
> 
> Pool: Default
> 
> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+
> 
> | mediaid | volumename | volstatus | enabled | volbytes        | volfiles | 
> volretention | recycle | slot | inchanger | mediatype | voltype | volparts | 
> lastwritten         | expiresin  |
> 
> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+
> 
> ...
> 
> |       9 | volume-09  | Full      |       1 | 653,197,959,321 |      152 |   
> 15,552,000 |       1 |    0 |         0 | data1     |       1 |        0 | 
> 2021-09-06 23:06:04 | 11,615,551 |
> |      10 | volume-10  | Full      |       1 | 653,197,888,553 |      152 |   
> 15,552,000 |       1 |    0 |         0 | data1     |       1 |        0 | 
> 2021-10-03 23:09:09 | 13,944,936 |
> |      11 | volume-11  | Full      |       1 | 653,197,958,518 |      152 |   
> 15,552,000 |       1 |    0 |         0 | data1     |       1 |        0 | 
> 2021-10-05 13:11:17 | 14,081,864 |
> 
> ...
> 
> 
> 
> /var/log/messages or dmesg do not show any problems on 03-Oct.
> 
> Yes, that is the complete output for jobid 12068 from bacula.log. But 
> there was another job running at approximately same time during which 
> the volume changed from 10 to 11:
> 
> 03-Oct 23:09 backup-sd JobId 12065: [SI0202] End of Volume "volume-10" at 
> 653197953065 on device "backup-device-data1" (/data1). Write of 64512 bytes 
> got 42967.
> 
> 03-Oct 23:09 backup-sd JobId 12065: End of medium on Volume "volume-10" 
> Bytes=653,197,888,553 Blocks=10,125,619 at 03-Oct-2021 23:09.
> 
> 03-Oct 23:09 backup-dir JobId 12065: There are no more Jobs associated with 
> Volume "volume-11". Marking it purged.
> 
> 03-Oct 23:09 backup-dir JobId 12065: All records pruned from Volume 
> "volume-11"; marking it "Purged"
> 
> 03-Oct 23:09 backup-dir JobId 12065: Recycled volume "volume-11"
> 
> 03-Oct 23:09 backup-sd JobId 12065: Recycled volume "volume-11" on File 
> device "backup-device-data1" (/data1), all previous data lost.
> 
> 03-Oct 23:09 backup-sd JobId 12065: New volume "volume-11" mounted on device 
> "backup-device-data1" (/data1) at 03-Oct-2021 23:09.
> 
> 
> 
> Marko Bartolić, Nimium d.o.o., email:<[email protected]>
> 
> On 22. 10. 2021. 13:18, Martin Simmons wrote:
> > Did the backup job on 03-Oct fill the disk?  Did you check for syslog 
> > messages
> > for that date too?
> >
> > Also, did you include the complete output below from the backup jobid 12068?
> > I would expect more information to be printed between 03-Oct 23:06 and 
> > 03-Oct
> > 23:19 about the volumes being used and the reason for changing from 
> > volume-10
> > to volume-11.
> >
> > __Martin
> >
> >
> >>>>>> On Thu, 21 Oct 2021 16:27:13 +0200, Marko Bartolić said:
> >> Hello,
> >>
> >> I hope this is the right place to ask.
> >>
> >> We have Bacula version 9.6.5. on a RHEL 8.2 physical server. We use copy
> >> job that runs every 6 months and copies the last full backup of 22
> >> clients to a separate SCSI disk which is then pulled out from the server
> >> for offsite backup. So both the backup disks and disk on which the
> >> backups are later copied are local, physical disks in the machine. This
> >> was working fine every time up to now. During last run of the copy job,
> >> we noticed the following error on one of the copy jobs:
> >>
> >> 21-Oct 04:00 backup-dir JobId 12612: Copying using JobId=12068 
> >> Job=websrv.2021-10-03_23.05.00_12
> >> 21-Oct 05:45 backup-dir JobId 12612: Start Copying JobId 12612, 
> >> Job=Copy-to-offsite-disk.2021-10-21_04.00.01_36
> >> 21-Oct 05:45 backup-dir JobId 12612: Using Device "backup-device-data1" to 
> >> read.
> >> 21-Oct 05:45 backup-sd JobId 12612: Ready to read from volume "volume-10" 
> >> on File device "backup-device-data1" (/data1).
> >> 21-Oct 05:45 backup-sd JobId 12612: Forward spacing Volume "volume-10" to 
> >> addr=641829209227
> >> *21-Oct 05:46 backup-sd JobId 12612: Warning: block.c:682 [SE0208] Volume 
> >> data error at 0:0! Short block of 42967 bytes on device 
> >> "backup-device-data1" (/data1) discarded. **
> >> **21-Oct 05:46 backup-sd JobId 12612: Error: read_records.c:160 
> >> block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967 bytes 
> >> on device "backup-device-data1" (/data1) discarded. *
> >> 21-Oct 05:46 backup-sd JobId 12612: End of Volume "volume-10" at 
> >> addr=653197996032 on device "backup-device-data1" (/data1).
> >> 21-Oct 05:46 backup-sd JobId 12612: Ready to read from volume "volume-11" 
> >> on File device "backup-device-data1" (/data1).
> >> 21-Oct 05:46 backup-sd JobId 12612: Forward spacing Volume "volume-11" to 
> >> addr=64742
> >> 21-Oct 05:50 backup-sd JobId 12612: End of Volume "volume-11" at 
> >> addr=49666111453 on device "backup-device-data1" (/data1).
> >> 21-Oct 05:50 backup-sd JobId 12612: Elapsed time=00:05:02, Transfer 
> >> rate=16.65 M Bytes/second
> >> 21-Oct 05:50 backup-dir JobId 12612: Error: Bacula backup-dir 9.6.5 
> >> (11Jun20):
> >>    Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
> >>    Prev Backup JobId:      12068
> >>    Prev Backup Job:        websrv.2021-10-03_23.05.00_12
> >>    New Backup JobId:       12613
> >>    Current JobId:          12612
> >>    Current Job:            Copy-to-offsite-disk.2021-10-21_04.00.01_36
> >>    Backup Level:           Full
> >>    Client:                 backup-fd
> >>    FileSet:                "backup" 2020-09-17 21:53:08
> >>    Read Pool:              "Default" (From Command input)
> >>    Read Storage:           "backup-storage-data1" (From Pool resource)
> >>    Write Pool:             "Offsite" (From Command input)
> >>    Write Storage:          "backup-storage-data2" (From Command input)
> >>    Catalog:                "MyCatalog" (From Client resource)
> >>    Start time:             21-Oct-2021 05:45:32
> >>    End time:               21-Oct-2021 05:50:42
> >>    Elapsed time:           5 mins 10 secs
> >>    Priority:               10
> >> * SD Files Written:       200,038 **
> >> ** SD Bytes Written:       5,029,190,722 (5.029 GB) *
> >>    Rate:                   16223.2 KB/s
> >>    Volume name(s):         vol-01
> >>    Volume Session Id:      5353
> >>    Volume Session Time:    1619097903
> >>    Last Volume Bytes:      573,628,844,941 (573.6 GB)
> >>    SD Errors:              1
> >>    SD termination status:  OK
> >>    Termination:            *** Copying Error ***
> >>
> >>
> >>
> >> Job ends with error, but upon checking the backup size and number of
> >> files of the original "websrv" backup job (which was successful), we see
> >> that the number of files and bytes written are the same as is in the
> >> copy job:
> >>
> >> 03-Oct 23:06 backup-dir JobId 12068: Start Backup JobId 12068, 
> >> Job=websrv.2021-10-03_23.05.00_12
> >> 03-Oct 23:06 backup-dir JobId 12068: Using Device "backup-device-data1" to 
> >> write.
> >> 03-Oct 23:06 websrv-fd JobId 12068: shell command: run ClientBeforeJob 
> >> "/etc/bacula/scripts/mariadb-backup.sh"
> >> 03-Oct 23:10 websrv-fd JobId 12068:      /srv/cpap is a different 
> >> filesystem. Will not descend from / into it.
> >> 03-Oct 23:19 backup-sd JobId 12068: Elapsed time=00:13:26, Transfer 
> >> rate=6.239 M Bytes/second
> >> 03-Oct 23:19 backup-sd JobId 12068: Sending spooled attrs to the Director. 
> >> Despooling 53,043,158 bytes ...
> >> 03-Oct 23:19 backup-dir JobId 12068: Bacula backup-dir 9.6.5 (11Jun20):
> >>     Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
> >>     JobId:                  12068
> >>     Job:                    websrv.2021-10-03_23.05.00_12
> >>     Backup Level:           Full
> >>     Client:                 "websrv-fd" 9.0.6 (20Nov17) 
> >> x86_64-redhat-linux-gnu,redhat,Enterprise 8.1
> >>     FileSet:                "websrv" 2020-09-17 23:05:00
> >>     Pool:                   "Default" (From Job resource)
> >>     Catalog:                "MyCatalog" (From Client resource)
> >>     Storage:                "backup-storage-data1" (From Pool resource)
> >>     Scheduled time:         03-Oct-2021 23:05:00
> >>     Start time:             03-Oct-2021 23:06:04
> >>     End time:               03-Oct-2021 23:19:37
> >>     Elapsed time:           13 mins 33 secs
> >>     Priority:               10
> >> *FD Files Written: 200,038 SD Files Written: 200,038 FD Bytes Written:
> >> 4,996,564,202 (4.996 GB) SD Bytes Written: 5,029,190,722 (5.029 GB)*
> >>     Rate:                   6145.8 KB/s
> >>     Software Compression:   40.9% 1.7:1
> >>     Comm Line Compression:  16.7% 1.2:1
> >>     Snapshot/VSS:           no
> >>     Encryption:             no
> >>     Accurate:               no
> >>     Volume name(s):         volume-10|volume-11
> >>     Volume Session Id:      4812
> >>     Volume Session Time:    1619097903
> >>     Last Volume Bytes:      49,996,799,959 (49.99 GB)
> >>     Non-fatal FD errors:    0
> >>     SD Errors:              0
> >>     FD termination status:  OK
> >>     SD termination status:  OK
> >>     Termination:            Backup OK
> >>
> >> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Jobs older than 6 
> >> months .
> >> 03-Oct 23:19 backup-dir JobId 12068: Pruned 1 Job for client websrv-fd 
> >> from catalog.
> >> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Files.
> >> 03-Oct 23:19 backup-dir JobId 12068: Pruned Files from 1 Jobs for client 
> >> websrv-fd from catalog.
> >> 03-Oct 23:19 backup-dir JobId 12068: End auto prune.
> >>
> >>
> >> I also don't see any errors in messages or dmesg output.
> >>
> >>   From what I gathered from an older thread:
> >> https://marc.info/?l=bacula-devel&m=123237957307571&w=2  
> >> <https://marc.info/?l=bacula-devel&m=123237957307571&w=2>  , this points
> >> that copying completed successfully. Also mentioned in the thread is
> >> that "Maximium Block Size" could be the problem, but we haven't defined
> >> that property.
> >>
> >> Does anyone know what could cause this?
> >>
> >>
> >> BR
> >>
> >> -- 
> >> Marko Bartolić, Nimium d.o.o., email:<[email protected]>
> >>
> 


_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Copy job shows short block error

Reply via email to