Thanks all for your input & confirming it pretty much had to be a hardware problem.
In the interest of completeness / helping the next person who's googling for answers, reseating the SCSI card fixed it - it just completed a 900GB backup w/out any problems, onto one of the same tapes that it had rejected before after only a few GB.

Now that the major problem has been solved, I'm still curious about why Bacula ran into the (real!) hardware issue where tar did not. The tar tape was software compressed & then software encrypted, so the restore had to successfully decrypt & then decompress the data, so there couldn't have been any bit errors on that tar tape. This was true four months ago, with the sketchy cable, and this time, with the SCSI card that needed re-seated. Are fixed-size (tar) blocks just a little bit more robust than variable-sized (Bacula) blocks?

And thanks, Kern, for an outstanding product.

Dan Stieneke
IT Specialist
USDA - ARS - NWISRL
3793 N 3600 E
Kimberly, ID 83341
208/423-6519

-----Original Message-----
From: Kern Sibbald [mailto:k...@sibbald.com]
Sent: Saturday, June 9, 2018 4:16 AM
To: Stieneke, Dan <dan.stien...@ars.usda.gov>
Cc: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out error?

Hello,

Well, Bacula does not check what was written from time to time, but when it reaches the end of the tape, Bacula will re-read the last block written to make sure it corresponds to what it wrote, then it writes a double end of file.

In your case, something is going wrong -- either there is a hardware error, or there is really an end of tape marker that is telling Bacula that the tape is full. From what you write, it looks more like a hardware error, and the kernel logs that you show below indicate that something serious is wrong with your tape drive. While Bacula is writing you should never see such messages, and when they occur, Bacula will receive a write error. Everything is consistent with a hardware problem.
You may get a better idea of what is going on by running the "btape test" command. Please see the manual for instructions on how to run it. I recommend both the test, and the fill commands. Note: both of these commands will write on the tape. Prior to using a tape with btape, if it has been labeled by Bacula, you should rewind the tape and write one or two eof marks at the beginning so that btape will take it as a blank tape.

If both btape "test" and "fill" work, you should not have problems with failing Bacula backups. If either one of those tests fail, you must fix it prior to trying to backup on tape with Bacula.

Best regards,
Kern

On 06/08/2018 07:48 PM, Stieneke, Dan wrote:
> @ Dan Langille - yes, I think it is an issue with the tape drive, but only Bacula runs into it; tar does not.
>
> @Martin Simmons - of course I should have checked/reported the log, sorry.
>
> =======BEGIN SYSLOG==============================================================================
> Jun 4 08:06:11 SRVName kernel: [410468.465702] st0: Sense Key : Unit Attention [current]
> Jun 4 08:06:11 SRVName kernel: [410468.465714] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun 4 08:10:47 SRVName kernel: [410744.629015] st0: Sense Key : Unit Attention [current]
> Jun 4 08:10:47 SRVName kernel: [410744.629026] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun 4 08:14:02 SRVName kernel: [410939.819168] st0: Sense Key : Unit Attention [current]
> Jun 4 08:14:02 SRVName kernel: [410939.819180] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun 4 08:16:57 SRVName kernel: [411114.538975] st0: Sense Key : Unit Attention [current]
> Jun 4 08:16:57 SRVName kernel: [411114.538988] st0: Add. Sense: Power on, reset, or bus device reset occurred
> =======END SYSLOG==============================================================================
>
> Googling for those entries I found
> http://bacula.10910.n7.nabble.com/Bacula-tapes-marked-FULL-too-early-VolBytes-too-low-td58881i20.html.
> Similar issue (but no report of tar), the thread ended with "similar problem went away with replaced drive" & "get your drive tested"
>
> From the Bacula log ("Error: Re-read of last block OK, but block numbers differ. Read block=990557 Want block=990558.") it looks like Bacula checks up on what has been written every so often. I don't think tar does that; it just streams to tape. If my card/cable/tape is only slightly flaky, is it reasonable to think that this extra work pushes it over the edge? Or am I barking up the wrong tree?
>
> Thanks,
> Dan Stieneke
>
> ----- from Dan Langille -----
> If it is all tapes, is the issue with the tape drive?
>
> ----- from Martin Simmons -----
> Check the syslog and system console for error messages about the tape device (since Bacula saw Input/output error, that usually means some error on the device).
>
>>>>>> On Thu, 7 Jun 2018 15:38:13 +0000, Stieneke, Dan said:
>> The job ate through 4 tapes, with only 2 - 60GB on each tape. Then it hit recycle limits and was asking for more media.
>>
>> These are used tapes, but I can't see 4 consecutive tapes going bad at the same time.
>>
>> Incidentally, this is the same behavior I saw 4 months ago, and at that time I did test bacula to a brand-new tape, which also failed quickly.
>>
>> Thanks,
>> Dan
>>
>> From: Josh Fisher [mailto:jfis...@pvct.com]
>> Sent: Wednesday, June 6, 2018 5:18 AM
>> To: Stieneke, Dan <dan.stien...@ars.usda.gov>; 'bacula-users@lists.sourceforge.net' <bacula-users@lists.sourceforge.net>
>> Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out error?
>>
>> On 6/5/2018 3:45 PM, Stieneke, Dan wrote:
>> Ubuntu 16.04, Bacula 5.2.6, single-drive autoloader, all running Bacula trouble-free for years.
>>
>> Four months ago I got some errors in Bacula that looked like h/w errors, although jobs using tar on the same drive ran without error. I had suspicions about a cable, and when I replaced it everything returned to normal, until now, when I'm getting the same kinds of errors.
>>
>> Tar works on the same drive, but what about on the same tape? How do you know you are not seeing bad tapes?
>>
>> The relevant part of "messages" is:
>> = = = = = = = = = = = = = = = = = =
>> 05-Jun 09:17 xxx-sd JobId 794: Error: block.c:577 Write error at 12:60511 on device "Ultrium-TD4" (/dev/tape/by-id/scsi-1IBM_ULTRIUM-TD4_1310010391-nst). ERR=Input/output error.
>> 05-Jun 09:18 xxx-sd JobId 794: Error: Re-read of last block OK, but block numbers differ. Read block=990557 Want block=990558.
>> 05-Jun 09:18 xxx-sd JobId 794: End of medium on Volume "A00030L4" Bytes=63,902,942,208 Blocks=990,558 at 05-Jun-2018 09:18.
>> 05-Jun 09:18 xxx-sd JobId 794: 3307 Issuing autochanger "unload slot 16, drive 0" command.
>> = = = = = = = = = = = = = = = = = =
>>
>> As you can see, it had an error after about 64GB (of an 800GB native / 1600GB compressed tape).
>>
>> I've cleaned the drive. And again, backups made with tar record without error and restore without error.
>> Any ideas?
>>
>> Thanks,
>> Dan Stieneke
>> IT Specialist
>> USDA - ARS - NWISRL
>> 3793 N 3600 E
>> Kimberly, ID 83341
>>
>> This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
>> If you believe you have received this message in error, please notify the sender and delete the email immediately.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users