That disk space should be OK as long as you don't have filesystem freespace limits (such as -m reserved-blocks-percentage in tune2fs) and that TB is 1099511627776 bytes and not the disk manufacturers' 1000000000000 bytes.
OTOH, I don't see how the code can print the SI0202 message unless write() failed, so I still think filling the disk is the most likely reason. __Martin >>>>> On Mon, 25 Oct 2021 12:50:41 +0200, Marko Bartolić said: > > Yes, the following is set on the pool to keep the volumes manageable: > > Maximum Volume Bytes = 653197960000 > Maximum Volumes = 18 > > There are three 3.7 TB disks in that pool, I reckon this should keep the > disks from completely filling up. > > Marko Bartolić, Nimium d.o.o., email: <marko.barto...@nimium.hr> > Mob: +385 99 325 7180 > Zagrebačka cesta 145A/I, 10 000 Zagreb, Hrvatska > > On 22. 10. 2021. 17:06, Martin Simmons wrote: > > Ah, so the short block is part of jobid 12065 and therefore the copy of > > jobid > > 12068 is very likely to be complete. > > > > The "[SI0202] End of Volume" message means that write() wrote less that > > expected. The only reasons I can think of are disk space shortage, I/O > > error, > > some other process-specific limit (such as controlled by ulimit -f) or buggy > > filesystem. > > > > Have you set some limit on the volume size? It is interesting that those 3 > > volumes are all just slightly smaller that 653,198,000,000 bytes. > > > > __Martin > > > > > >>>>>> On Fri, 22 Oct 2021 13:53:12 +0200, Marko Bartolić said: > >> Hi, > >> > >> the backup job on 03-Oct definetly filled the volume 10, but not the > >> whole disk, it still has some free space. > >> > >> *list volume > >> > >> Automatically selected Catalog: MyCatalog > >> > >> Using Catalog "MyCatalog" > >> > >> Pool: Default > >> > >> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+ > >> > >> | mediaid | volumename | volstatus | enabled | volbytes | volfiles > >> | volretention | recycle | slot | inchanger | mediatype | voltype | > >> volparts | lastwritten | expiresin | > >> > >> +---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------+----------+---------------------+------------+ > >> > >> ... > >> > >> | 9 | volume-09 | Full | 1 | 653,197,959,321 | 152 > >> | 15,552,000 | 1 | 0 | 0 | data1 | 1 | > >> 0 | 2021-09-06 23:06:04 | 11,615,551 | > >> | 10 | volume-10 | Full | 1 | 653,197,888,553 | 152 > >> | 15,552,000 | 1 | 0 | 0 | data1 | 1 | > >> 0 | 2021-10-03 23:09:09 | 13,944,936 | > >> | 11 | volume-11 | Full | 1 | 653,197,958,518 | 152 > >> | 15,552,000 | 1 | 0 | 0 | data1 | 1 | > >> 0 | 2021-10-05 13:11:17 | 14,081,864 | > >> > >> ... > >> > >> > >> > >> /var/log/messages or dmesg do not show any problems on 03-Oct. > >> > >> Yes, that is the complete output for jobid 12068 from bacula.log. But > >> there was another job running at approximately same time during which > >> the volume changed from 10 to 11: > >> > >> 03-Oct 23:09 backup-sd JobId 12065: [SI0202] End of Volume "volume-10" at > >> 653197953065 on device "backup-device-data1" (/data1). Write of 64512 > >> bytes got 42967. > >> > >> 03-Oct 23:09 backup-sd JobId 12065: End of medium on Volume "volume-10" > >> Bytes=653,197,888,553 Blocks=10,125,619 at 03-Oct-2021 23:09. > >> > >> 03-Oct 23:09 backup-dir JobId 12065: There are no more Jobs associated > >> with Volume "volume-11". Marking it purged. > >> > >> 03-Oct 23:09 backup-dir JobId 12065: All records pruned from Volume > >> "volume-11"; marking it "Purged" > >> > >> 03-Oct 23:09 backup-dir JobId 12065: Recycled volume "volume-11" > >> > >> 03-Oct 23:09 backup-sd JobId 12065: Recycled volume "volume-11" on File > >> device "backup-device-data1" (/data1), all previous data lost. > >> > >> 03-Oct 23:09 backup-sd JobId 12065: New volume "volume-11" mounted on > >> device "backup-device-data1" (/data1) at 03-Oct-2021 23:09. > >> > >> > >> > >> Marko Bartolić, Nimium d.o.o., email:<marko.barto...@nimium.hr> > >> > >> On 22. 10. 2021. 13:18, Martin Simmons wrote: > >>> Did the backup job on 03-Oct fill the disk? Did you check for syslog > >>> messages > >>> for that date too? > >>> > >>> Also, did you include the complete output below from the backup jobid > >>> 12068? > >>> I would expect more information to be printed between 03-Oct 23:06 and > >>> 03-Oct > >>> 23:19 about the volumes being used and the reason for changing from > >>> volume-10 > >>> to volume-11. > >>> > >>> __Martin > >>> > >>> > >>>>>>>> On Thu, 21 Oct 2021 16:27:13 +0200, Marko Bartolić said: > >>>> Hello, > >>>> > >>>> I hope this is the right place to ask. > >>>> > >>>> We have Bacula version 9.6.5. on a RHEL 8.2 physical server. We use copy > >>>> job that runs every 6 months and copies the last full backup of 22 > >>>> clients to a separate SCSI disk which is then pulled out from the server > >>>> for offsite backup. So both the backup disks and disk on which the > >>>> backups are later copied are local, physical disks in the machine. This > >>>> was working fine every time up to now. During last run of the copy job, > >>>> we noticed the following error on one of the copy jobs: > >>>> > >>>> 21-Oct 04:00 backup-dir JobId 12612: Copying using JobId=12068 > >>>> Job=websrv.2021-10-03_23.05.00_12 > >>>> 21-Oct 05:45 backup-dir JobId 12612: Start Copying JobId 12612, > >>>> Job=Copy-to-offsite-disk.2021-10-21_04.00.01_36 > >>>> 21-Oct 05:45 backup-dir JobId 12612: Using Device "backup-device-data1" > >>>> to read. > >>>> 21-Oct 05:45 backup-sd JobId 12612: Ready to read from volume > >>>> "volume-10" on File device "backup-device-data1" (/data1). > >>>> 21-Oct 05:45 backup-sd JobId 12612: Forward spacing Volume "volume-10" > >>>> to addr=641829209227 > >>>> *21-Oct 05:46 backup-sd JobId 12612: Warning: block.c:682 [SE0208] > >>>> Volume data error at 0:0! Short block of 42967 bytes on device > >>>> "backup-device-data1" (/data1) discarded. ** > >>>> **21-Oct 05:46 backup-sd JobId 12612: Error: read_records.c:160 > >>>> block.c:682 [SE0208] Volume data error at 0:0! Short block of 42967 > >>>> bytes on device "backup-device-data1" (/data1) discarded. * > >>>> 21-Oct 05:46 backup-sd JobId 12612: End of Volume "volume-10" at > >>>> addr=653197996032 on device "backup-device-data1" (/data1). > >>>> 21-Oct 05:46 backup-sd JobId 12612: Ready to read from volume > >>>> "volume-11" on File device "backup-device-data1" (/data1). > >>>> 21-Oct 05:46 backup-sd JobId 12612: Forward spacing Volume "volume-11" > >>>> to addr=64742 > >>>> 21-Oct 05:50 backup-sd JobId 12612: End of Volume "volume-11" at > >>>> addr=49666111453 on device "backup-device-data1" (/data1). > >>>> 21-Oct 05:50 backup-sd JobId 12612: Elapsed time=00:05:02, Transfer > >>>> rate=16.65 M Bytes/second > >>>> 21-Oct 05:50 backup-dir JobId 12612: Error: Bacula backup-dir 9.6.5 > >>>> (11Jun20): > >>>> Build OS: x86_64-redhat-linux-gnu-bacula redhat (Core) > >>>> Prev Backup JobId: 12068 > >>>> Prev Backup Job: websrv.2021-10-03_23.05.00_12 > >>>> New Backup JobId: 12613 > >>>> Current JobId: 12612 > >>>> Current Job: Copy-to-offsite-disk.2021-10-21_04.00.01_36 > >>>> Backup Level: Full > >>>> Client: backup-fd > >>>> FileSet: "backup" 2020-09-17 21:53:08 > >>>> Read Pool: "Default" (From Command input) > >>>> Read Storage: "backup-storage-data1" (From Pool resource) > >>>> Write Pool: "Offsite" (From Command input) > >>>> Write Storage: "backup-storage-data2" (From Command input) > >>>> Catalog: "MyCatalog" (From Client resource) > >>>> Start time: 21-Oct-2021 05:45:32 > >>>> End time: 21-Oct-2021 05:50:42 > >>>> Elapsed time: 5 mins 10 secs > >>>> Priority: 10 > >>>> * SD Files Written: 200,038 ** > >>>> ** SD Bytes Written: 5,029,190,722 (5.029 GB) * > >>>> Rate: 16223.2 KB/s > >>>> Volume name(s): vol-01 > >>>> Volume Session Id: 5353 > >>>> Volume Session Time: 1619097903 > >>>> Last Volume Bytes: 573,628,844,941 (573.6 GB) > >>>> SD Errors: 1 > >>>> SD termination status: OK > >>>> Termination: *** Copying Error *** > >>>> > >>>> > >>>> > >>>> Job ends with error, but upon checking the backup size and number of > >>>> files of the original "websrv" backup job (which was successful), we see > >>>> that the number of files and bytes written are the same as is in the > >>>> copy job: > >>>> > >>>> 03-Oct 23:06 backup-dir JobId 12068: Start Backup JobId 12068, > >>>> Job=websrv.2021-10-03_23.05.00_12 > >>>> 03-Oct 23:06 backup-dir JobId 12068: Using Device "backup-device-data1" > >>>> to write. > >>>> 03-Oct 23:06 websrv-fd JobId 12068: shell command: run ClientBeforeJob > >>>> "/etc/bacula/scripts/mariadb-backup.sh" > >>>> 03-Oct 23:10 websrv-fd JobId 12068: /srv/cpap is a different > >>>> filesystem. Will not descend from / into it. > >>>> 03-Oct 23:19 backup-sd JobId 12068: Elapsed time=00:13:26, Transfer > >>>> rate=6.239 M Bytes/second > >>>> 03-Oct 23:19 backup-sd JobId 12068: Sending spooled attrs to the > >>>> Director. Despooling 53,043,158 bytes ... > >>>> 03-Oct 23:19 backup-dir JobId 12068: Bacula backup-dir 9.6.5 (11Jun20): > >>>> Build OS: x86_64-redhat-linux-gnu-bacula redhat (Core) > >>>> JobId: 12068 > >>>> Job: websrv.2021-10-03_23.05.00_12 > >>>> Backup Level: Full > >>>> Client: "websrv-fd" 9.0.6 (20Nov17) > >>>> x86_64-redhat-linux-gnu,redhat,Enterprise 8.1 > >>>> FileSet: "websrv" 2020-09-17 23:05:00 > >>>> Pool: "Default" (From Job resource) > >>>> Catalog: "MyCatalog" (From Client resource) > >>>> Storage: "backup-storage-data1" (From Pool resource) > >>>> Scheduled time: 03-Oct-2021 23:05:00 > >>>> Start time: 03-Oct-2021 23:06:04 > >>>> End time: 03-Oct-2021 23:19:37 > >>>> Elapsed time: 13 mins 33 secs > >>>> Priority: 10 > >>>> *FD Files Written: 200,038 SD Files Written: 200,038 FD Bytes Written: > >>>> 4,996,564,202 (4.996 GB) SD Bytes Written: 5,029,190,722 (5.029 GB)* > >>>> Rate: 6145.8 KB/s > >>>> Software Compression: 40.9% 1.7:1 > >>>> Comm Line Compression: 16.7% 1.2:1 > >>>> Snapshot/VSS: no > >>>> Encryption: no > >>>> Accurate: no > >>>> Volume name(s): volume-10|volume-11 > >>>> Volume Session Id: 4812 > >>>> Volume Session Time: 1619097903 > >>>> Last Volume Bytes: 49,996,799,959 (49.99 GB) > >>>> Non-fatal FD errors: 0 > >>>> SD Errors: 0 > >>>> FD termination status: OK > >>>> SD termination status: OK > >>>> Termination: Backup OK > >>>> > >>>> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Jobs older than 6 > >>>> months . > >>>> 03-Oct 23:19 backup-dir JobId 12068: Pruned 1 Job for client websrv-fd > >>>> from catalog. > >>>> 03-Oct 23:19 backup-dir JobId 12068: Begin pruning Files. > >>>> 03-Oct 23:19 backup-dir JobId 12068: Pruned Files from 1 Jobs for client > >>>> websrv-fd from catalog. > >>>> 03-Oct 23:19 backup-dir JobId 12068: End auto prune. > >>>> > >>>> > >>>> I also don't see any errors in messages or dmesg output. > >>>> > >>>> From what I gathered from an older thread: > >>>> https://marc.info/?l=bacula-devel&m=123237957307571&w=2 > >>>> <https://marc.info/?l=bacula-devel&m=123237957307571&w=2> , this points > >>>> that copying completed successfully. Also mentioned in the thread is > >>>> that "Maximium Block Size" could be the problem, but we haven't defined > >>>> that property. > >>>> > >>>> Does anyone know what could cause this? > >>>> > >>>> > >>>> BR > >>>> > >>>> -- > >>>> Marko Bartolić, Nimium d.o.o., email:<marko.barto...@nimium.hr> > >>>> > > > > > _______________________________________________ > Bacula-devel mailing list > Bacula-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-devel > _______________________________________________ Bacula-devel mailing list Bacula-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel