>>>>> On Tue, 31 Mar 2020 12:40:04 +0200, Pierre Bernhardt said:
> 
> On 30.03.20 at 16:12, Martin Simmons wrote:
> Hello,
> 
> > You could try temporarily hacking bacula-sd to report the short block as an
> > info message.  In src/stored/block.c, change M_ERROR to M_INFO in these 
> > lines:
> > 
> >       Mmsg4(dev->errmsg, _("[SE0208] Volume data error at %u:%u! Short block of %d bytes on device %s discarded.\n"),
> >          dev->file, dev->block_num, block->read_len, dev->print_name());
> >       Jmsg(jcr, M_ERROR, 0, "%s", dev->errmsg);
> It looks like it is not enough to change M_ERROR to M_INFO, because the job
> will still fail and so the metadata still won't migrate to the new job:
> 
> 31-Mar 01:48 backup-dir JobId 47771: The following 1 JobId was chosen to be migrated: 47704
> 31-Mar 01:48 backup-dir JobId 47771: Migration using JobId=47704 Job=nihilnihil_home.2020-03-21_20.23.31_49
> 31-Mar 01:48 backup-dir JobId 47771: Start Migration JobId 47771, Job=MigrateFile2Drive.2020-03-31_01.48.24_05
> 31-Mar 01:49 backup-dir JobId 47771: Using Device "DiskStorage2" to read.
> 31-Mar 01:53 backup-sd JobId 47771: Ready to read from volume "DISK016" on File device "DiskStorage2" (/media/baculadisk2).
> 31-Mar 01:53 backup-sd JobId 47771: Forward spacing Volume "DISK016" to addr=217
> 31-Mar 07:00 backup-sd JobId 47771: Error: block.c:682 [SE0208] Volume data error at 0:0! Short block of 57010 bytes on device "DiskStorage2" (/media/baculadisk2) discarded.
> 31-Mar 07:00 backup-sd JobId 47771: Error: read_records.c:160 block.c:682 [SE0208] Volume data error at 0:0! Short block of 57010 bytes on device "DiskStorage2" (/media/baculadisk2) discarded.
> 31-Mar 07:00 backup-sd JobId 47771: End of Volume "DISK016" at addr=972406571008 on device "DiskStorage2" (/media/baculadisk2).
> 31-Mar 07:01 backup-sd JobId 47771: Ready to read from volume "DISK017" on File device "DiskStorage2" (/media/baculadisk2).
> 31-Mar 07:01 backup-sd JobId 47771: Forward spacing Volume "DISK017" to addr=213
> 31-Mar 08:00 backup-sd JobId 47771: End of Volume "DISK017" at addr=110838477984 on device "DiskStorage2" (/media/baculadisk2).
> 31-Mar 08:00 backup-sd JobId 47771: Elapsed time=06:07:58, Transfer rate=49.02 M Bytes/second
> 31-Mar 10:01 backup-dir JobId 47771: Warning: Found errors during the migration process. The original job 47704 will be kept in the catalog and the Migration job will be marked in Error
> 31-Mar 10:01 backup-dir JobId 47771: Error: bsock.c:388 Wrote 4 bytes to Storage daemon:backup.localnet.cosmicstars.de:9103, but only 0 accepted.
> 31-Mar 10:01 backup-dir JobId 47771: Error: Bacula backup-dir 9.4.2 (04Feb19):
>   Build OS:               x86_64-pc-linux-gnu debian buster/sid
>   Prev Backup JobId:      47704
>   Prev Backup Job:        nihilnihil_home.2020-03-21_20.23.31_49
>   New Backup JobId:       47772
>   Current JobId:          47771
>   Current Job:            MigrateFile2Drive.2020-03-31_01.48.24_05
>   Backup Level:           Full
>   Client:                 backup-fd
>   FileSet:                "Full Set" 2017-10-09 08:53:50
>   Read Pool:              "Migrate" (From Job resource)
>   Read Storage:           "Disk2" (From Pool resource)
>   Write Pool:             "Monthly" (From Job Pool's NextPool resource)
>   Write Storage:          "FibreCAT TX48 S2" (From Job Pool's NextPool resource)
>   Catalog:                "MyCatalog" (From Client resource)
>   Start time:             31-Mar-2020 01:49:20
>   End time:               31-Mar-2020 10:01:43
>   Elapsed time:           8 hours 12 mins 23 secs
>   Priority:               21
>   SD Files Written:       1,030,385
>   SD Bytes Written:       1,082,331,572,757 (1.082 TB)
>   Rate:                   36635.8 KB/s
>   Volume name(s):         LTO40025|LTO40026
>   Volume Session Id:      1
>   Volume Session Time:    1585612030
>   Last Volume Bytes:      270,297,861,120 (270.2 GB)
>   SD Errors:              2
>   SD termination status:  OK
>   Termination:            *** Migration Error ***
> 
> 
> 
> I think the return line should also be corrected by changing the false to
> true? I'm new to C++ coding ;-)
> 
>    if (block->block_len > block->read_len) {
>       dev->dev_errno = EIO;
>       Mmsg4(dev->errmsg, _("[SE0208] Volume has data error at %u:%u! Short block of %d bytes on device %s discarded.\n"),
>          dev->file, dev->block_num, block->read_len, dev->print_name());
>       Jmsg(jcr, M_INFO, 0, "%s", dev->errmsg);
>       dev->set_short_block();
>       block->read_len = block->binbuf = 0;
>       return true;             /* return error */
>    }

No, returning true will not work correctly -- the calling function must get
false for a short block.
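
To illustrate why the return value matters, here is a minimal sketch of that contract.  It is not the Bacula code: Block, read_block and process_volume are hypothetical names, and the caller's behaviour (skip the short block and carry on) is an assumption made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block read from a volume; not Bacula's own type.
struct Block {
    std::size_t block_len;  // length recorded in the block header
    std::size_t read_len;   // bytes actually read from the device
};

// Mirrors the contract discussed above: false means "no usable block".
bool read_block(Block &b) {
    if (b.block_len > b.read_len) {
        b.read_len = 0;     // discard the partial data
        return false;       // caller must not process this block
    }
    return true;
}

// Hypothetical caller: sums the bytes of blocks that were read successfully.
std::size_t process_volume(std::vector<Block> blocks) {
    std::size_t good_bytes = 0;
    for (Block &b : blocks) {
        if (!read_block(b)) {
            continue;       // short block: skip it rather than parse garbage
        }
        good_bytes += b.read_len;
    }
    return good_bytes;
}
```

If read_block returned true for the short block, a caller like this would go on to treat an emptied, truncated block as valid data.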

Your change to use M_INFO looks correct, but the block.c:682 message in the
log still says "Error:" so you are still running the original code.  Did you
run "make" and "make install" after changing the code?  Did they complete
without errors (you might need to run "make install" as root)?  Did you
restart the bacula-sd after that?

I also notice that there is another use of M_ERROR at line 160 of
read_records.c that causes a second error message:

               Jmsg1(jcr, M_ERROR, 0, "%s", dev->errmsg);

This also needs to be changed to M_INFO.
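
As a rough model of why both places matter: your job summary shows "SD Errors: 2", which matches the two "Error:" lines in the log.  The sketch below assumes that each error-level message is counted against the job and that a non-zero count makes the migration fail.  The enum, Job, report and job_ok are hypothetical stand-ins, not the real Jmsg implementation.

```cpp
#include <string>

// Hypothetical message levels, named after the constants in this thread.
enum MsgLevel { M_INFO, M_ERROR };

// Hypothetical job record; only the error counter matters here.
struct Job {
    int errors = 0;
};

// Simplified stand-in for Jmsg(): an error-level message is counted
// against the job, an info-level message is only logged.
void report(Job &job, MsgLevel level, const std::string &text) {
    (void)text;  // a real daemon would send the text to the job log
    if (level == M_ERROR) {
        ++job.errors;
    }
}

// Assumption: the job is treated as failed if any error was counted.
bool job_ok(const Job &job) { return job.errors == 0; }
```

In this model, changing only one of the two calls to M_INFO still leaves one counted error, so the job would still fail.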

__Martin

