Re: [Bacula-devel] Copy job shows short block error

Martin Simmons Mon, 25 Oct 2021 10:13:26 -0700

That disk space should be OK as long as you don't have filesystem freespace
limits (such as -m reserved-blocks-percentage in tune2fs) and that TB is
1099511627776 bytes and not the disk manufacturers' 1000000000000 bytes.


OTOH, I don't see how the code can print the SI0202 message unless write()
failed, so I still think filling the disk is the most likely reason.

__Martin


>>>>> On Mon, 25 Oct 2021 12:50:41 +0200, Marko Bartolić said:
> 
> Yes, the following is set on the pool to keep the volumes manageable:
> 
>    Maximum Volume Bytes = 653197960000
>    Maximum Volumes = 18
> 
> There are three 3.7 TB disks in that pool, I reckon this should keep the 
> disks from completely filling up.
> 
> Marko Bartolić, Nimium d.o.o., email: <[email protected]>
> Mob: +385 99 325 7180
> Zagrebačka cesta 145A/I, 10 000 Zagreb, Hrvatska
> 
> On 22. 10. 2021. 17:06, Martin Simmons wrote:
> > Ah, so the short block is part of jobid 12065 and therefore the copy of 
> > jobid
> > 12068 is very likely to be complete.
> >
> > The "[SI0202] End of Volume" message means that write() wrote less that
> > expected.  The only reasons I can think of are disk space shortage, I/O 
> > error,
> > some other process-specific limit (such as controlled by ulimit -f) or buggy
> > filesystem.
> >
> > Have you set some limit on the volume size?  It is interesting that those 3
> > volumes are all just slightly smaller that 653,198,000,000 bytes.
> >
> > __Martin
> >
> >
> >>>>>> On Fri, 22 Oct 2021 13:53:12 +0200, Marko Bartolić said:
> >> Hi,
> >>
> >> the backup job on 03-Oct definetly filled the volume 10, but not the
> >> whole disk, it still has some free space.
> >>
> >> *list volume
> >>
> >> Automatically selected Catalog: MyCatalog
> >>
> >> Using Catalog "MyCatalog"
> >>
> >> Pool: Default
> >>
> >> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+
> >>
> >> | mediaid | volumename | volstatus | enabled | volbytes        | volfiles 
> >> | volretention | recycle | slot | inchanger | mediatype | voltype | 
> >> volparts | lastwritten         | expiresin  |
> >>
> >> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+
> >>
> >> ...
> >>
> >> |       9 | volume-09  | Full      |       1 | 653,197,959,321 |      152 
> >> |   15,552,000 |       1 |    0 |         0 | data1     |       1 |        
> >> 0 | 2021-09-06 23:06:04 | 11,615,551 |
> >> |      10 | volume-10  | Full      |       1 | 653,197,888,553 |      152 
> >> |   15,552,000 |       1 |    0 |         0 | data1     |       1 |        
> >> 0 | 2021-10-03 23:09:09 | 13,944,936 |
> >> |      11 | volume-11  | Full      |       1 | 653,197,958,518 |      152 
> >> |   15,552,000 |       1 |    0 |         0 | data1     |       1 |        
> >> 0 | 2021-10-05 13:11:17 | 14,081,864 |
> >>
> >> ...
> >>
> >>
> >>
> >> /var/log/messages or dmesg do not show any problems on 03-Oct.
> >>
> >> Yes, that is the complete output for jobid 12068 from bacula.log. But
> >> there was another job running at approximately same time during which
> >> the volume changed from 10 to 11:
> >>
> >> 03-Oct 23:09 backup-sd JobId 12065: [SI0202] End of Volume "volume-10" at 
> >> 653197953065 on device "backup-device-data1" (/data1). Write of 64512 
> >> bytes got 42967.
> >>
> >> 03-Oct 23:09 backup-sd JobId 12065: End of medium on Volume "volume-10" 
> >> Bytes=653,197,888,553 Blocks=10,125,619 at 03-Oct-2021 23:09.
> >>
> >> 03-Oct 23:09 backup-dir JobId 12065: There are no more Jobs associated 
> >> with Volume "volume-11". Marking it purged.
> >>
> >> 03-Oct 23:09 backup-dir JobId 12065: All records pruned from Volume 
> >> "volume-11"; marking it "Purged"
> >>
> >> 03-Oct 23:09 backup-dir JobId 12065: Recycled volume "volume-11"
> >>
> >> 03-Oct 23:09 backup-sd JobId 12065: Recycled volume "volume-11" on File 
> >> device "backup-device-data1" (/data1), all previous data lost.
> >>
> >> 03-Oct 23:09 backup-sd JobId 12065: New volume "volume-11" mounted on 
> >> device "backup-device-data1" (/data1) at 03-Oct-2021 23:09.
> >>
> >>
> >>
> >> Marko Bartolić, Nimium d.o.o., email:<[email protected]>
> >>
> >> On 22. 10. 2021. 13:18, Martin Simmons wrote:
> >>> Did the backup job on 03-Oct fill the disk?  Did you check for syslog 
> >>> messages
> >>> for that date too?
> >>>
> >>> Also, did you include the complete output below from the backup jobid 
> >>> 12068?
> >>> I would expect more information to be printed between 03-Oct 23:06 and 
> >>> 03-Oct
> >>> 23:19 about the volumes being used and the reason for changing from 
> >>> volume-10
> >>> to volume-11.
> >>>
> >>> __Martin
> >>>
> >>>
> >>>>>>>> On Thu, 21 Oct 2021 16:27:13 +0200, Marko Bartolić said:
> >>>> Hello,
> >>>>
> >>>> I hope this is the right place to ask.
> >>>>
> >>>> We have Bacula version 9.6.5. on a RHEL 8.2 physical server. We use copy
> >>>> job that runs every 6 months and copies the last full backup of 22
> >>>> clients to a separate SCSI disk which is then pulled out from the server
> >>>> for offsite backup. So both the backup disks and disk on which the
> >>>> backups are later copied are local, physical disks in the machine. This
> >>>> was working fine every time up to now. During last run of the copy job,
> >>>> we noticed the following error on one of the copy jobs:
> >>>>
> >>>> 21-Oct 04:00 backup-dir JobId 12612: Copying using JobId=12068 
> >>>> Job=websrv.2021-10-03_23.05.00_12
> >>>> 21-Oct 05:45 backup-dir JobId 12612: Start Copying JobId 12612, 
> >>>> Job=Copy-to-offsite-disk.2021-10-21_04.00.01_36
> >>>> 21-Oct 05:45 backup-dir JobId 12612: Using Device "backup-device-data1" 
> >>>> to read.
> >>>> 21-Oct 05:45 backup-sd JobId 12612: Ready to read from volume 
> >>>> "volume-10" on File device "backup-device-data1" (/data1).
> >>>> 21-Oct 05:45 backup-sd JobId 12612: Forward spacing Volume "volume-10" 
> >>>> to addr=641829209227
> >>>> *21-Oct 05:46 backup-sd JobId 12612: Warning: block.c:682 [SE0208] 
> >>>> Volume data error at 0:0! Short block of 42967 bytes on device 
> >>>> "backup-device-data1" (/data1) discarded. **
> >>>> **21-Oct 05:46 backup-sd JobId 12612: Error: read_records.c:160 
> >>>> block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967 
> >>>> bytes on device "backup-device-data1" (/data1) discarded. *
> >>>> 21-Oct 05:46 backup-sd JobId 12612: End of Volume "volume-10" at 
> >>>> addr=653197996032 on device "backup-device-data1" (/data1).
> >>>> 21-Oct 05:46 backup-sd JobId 12612: Ready to read from volume 
> >>>> "volume-11" on File device "backup-device-data1" (/data1).
> >>>> 21-Oct 05:46 backup-sd JobId 12612: Forward spacing Volume "volume-11" 
> >>>> to addr=64742
> >>>> 21-Oct 05:50 backup-sd JobId 12612: End of Volume "volume-11" at 
> >>>> addr=49666111453 on device "backup-device-data1" (/data1).
> >>>> 21-Oct 05:50 backup-sd JobId 12612: Elapsed time=00:05:02, Transfer 
> >>>> rate=16.65 M Bytes/second
> >>>> 21-Oct 05:50 backup-dir JobId 12612: Error: Bacula backup-dir 9.6.5 
> >>>> (11Jun20):
> >>>>     Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
> >>>>     Prev Backup JobId:      12068
> >>>>     Prev Backup Job:        websrv.2021-10-03_23.05.00_12
> >>>>     New Backup JobId:       12613
> >>>>     Current JobId:          12612
> >>>>     Current Job:            Copy-to-offsite-disk.2021-10-21_04.00.01_36
> >>>>     Backup Level:           Full
> >>>>     Client:                 backup-fd
> >>>>     FileSet:                "backup" 2020-09-17 21:53:08
> >>>>     Read Pool:              "Default" (From Command input)
> >>>>     Read Storage:           "backup-storage-data1" (From Pool resource)
> >>>>     Write Pool:             "Offsite" (From Command input)
> >>>>     Write Storage:          "backup-storage-data2" (From Command input)
> >>>>     Catalog:                "MyCatalog" (From Client resource)
> >>>>     Start time:             21-Oct-2021 05:45:32
> >>>>     End time:               21-Oct-2021 05:50:42
> >>>>     Elapsed time:           5 mins 10 secs
> >>>>     Priority:               10
> >>>> * SD Files Written:       200,038 **
> >>>> ** SD Bytes Written:       5,029,190,722 (5.029 GB) *
> >>>>     Rate:                   16223.2 KB/s
> >>>>     Volume name(s):         vol-01
> >>>>     Volume Session Id:      5353
> >>>>     Volume Session Time:    1619097903
> >>>>     Last Volume Bytes:      573,628,844,941 (573.6 GB)
> >>>>     SD Errors:              1
> >>>>     SD termination status:  OK
> >>>>     Termination:            *** Copying Error ***
> >>>>
> >>>>
> >>>>
> >>>> Job ends with error, but upon checking the backup size and number of
> >>>> files of the original "websrv" backup job (which was successful), we see
> >>>> that the number of files and bytes written are the same as is in the
> >>>> copy job:
> >>>>
> >>>> 03-Oct 23:06 backup-dir JobId 12068: Start Backup JobId 12068, 
> >>>> Job=websrv.2021-10-03_23.05.00_12
> >>>> 03-Oct 23:06 backup-dir JobId 12068: Using Device "backup-device-data1" 
> >>>> to write.
> >>>> 03-Oct 23:06 websrv-fd JobId 12068: shell command: run ClientBeforeJob 
> >>>> "/etc/bacula/scripts/mariadb-backup.sh"
> >>>> 03-Oct 23:10 websrv-fd JobId 12068:      /srv/cpap is a different 
> >>>> filesystem. Will not descend from / into it.
> >>>> 03-Oct 23:19 backup-sd JobId 12068: Elapsed time=00:13:26, Transfer 
> >>>> rate=6.239 M Bytes/second
> >>>> 03-Oct 23:19 backup-sd JobId 12068: Sending spooled attrs to the 
> >>>> Director. Despooling 53,043,158 bytes ...
> >>>> 03-Oct 23:19 backup-dir JobId 12068: Bacula backup-dir 9.6.5 (11Jun20):
> >>>>      Build OS:               x86_64-redhat-linux-gnu-bacula redhat (Core)
> >>>>      JobId:                  12068
> >>>>      Job:                    websrv.2021-10-03_23.05.00_12
> >>>>      Backup Level:           Full
> >>>>      Client:                 "websrv-fd" 9.0.6 (20Nov17) 
> >>>> x86_64-redhat-linux-gnu,redhat,Enterprise 8.1
> >>>>      FileSet:                "websrv" 2020-09-17 23:05:00
> >>>>      Pool:                   "Default" (From Job resource)
> >>>>      Catalog:                "MyCatalog" (From Client resource)
> >>>>      Storage:                "backup-storage-data1" (From Pool resource)
> >>>>      Scheduled time:         03-Oct-2021 23:05:00
> >>>>      Start time:             03-Oct-2021 23:06:04
> >>>>      End time:               03-Oct-2021 23:19:37
> >>>>      Elapsed time:           13 mins 33 secs
> >>>>      Priority:               10
> >>>> *FD Files Written: 200,038 SD Files Written: 200,038 FD Bytes Written:
> >>>> 4,996,564,202 (4.996 GB) SD Bytes Written: 5,029,190,722 (5.029 GB)*
> >>>>      Rate:                   6145.8 KB/s
> >>>>      Software Compression:   40.9% 1.7:1
> >>>>      Comm Line Compression:  16.7% 1.2:1
> >>>>      Snapshot/VSS:           no
> >>>>      Encryption:             no
> >>>>      Accurate:               no
> >>>>      Volume name(s):         volume-10|volume-11
> >>>>      Volume Session Id:      4812
> >>>>      Volume Session Time:    1619097903
> >>>>      Last Volume Bytes:      49,996,799,959 (49.99 GB)
> >>>>      Non-fatal FD errors:    0
> >>>>      SD Errors:              0
> >>>>      FD termination status:  OK
> >>>>      SD termination status:  OK
> >>>>      Termination:            Backup OK
> >>>>
> >>>> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Jobs older than 6 
> >>>> months .
> >>>> 03-Oct 23:19 backup-dir JobId 12068: Pruned 1 Job for client websrv-fd 
> >>>> from catalog.
> >>>> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Files.
> >>>> 03-Oct 23:19 backup-dir JobId 12068: Pruned Files from 1 Jobs for client 
> >>>> websrv-fd from catalog.
> >>>> 03-Oct 23:19 backup-dir JobId 12068: End auto prune.
> >>>>
> >>>>
> >>>> I also don't see any errors in messages or dmesg output.
> >>>>
> >>>>    From what I gathered from an older thread:
> >>>> https://marc.info/?l=bacula-devel&m=123237957307571&w=2
> >>>> <https://marc.info/?l=bacula-devel&m=123237957307571&w=2>  , this points
> >>>> that copying completed successfully. Also mentioned in the thread is
> >>>> that "Maximium Block Size" could be the problem, but we haven't defined
> >>>> that property.
> >>>>
> >>>> Does anyone know what could cause this?
> >>>>
> >>>>
> >>>> BR
> >>>>
> >>>> -- 
> >>>> Marko Bartolić, Nimium d.o.o., email:<[email protected]>
> >>>>
> 
> 
> 
> 
> _______________________________________________
> Bacula-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bacula-devel
> 


_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Copy job shows short block error

Reply via email to