Thanks all for your input & confirming it pretty much had to be a hardware 
problem.

In the interest of completeness / helping the next person who's googling for 
answers, reseating the SCSI card fixed it - it just completed a 900GB backup 
w/out any problems, onto one of the same tapes that it had rejected before 
after only a few GB.
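
For the next person comparing logs: below is a minimal Python sketch (not 
part of Bacula) of how one might flag volumes that were closed far too early, 
as happened here. It assumes the "End of medium" message format quoted later 
in this thread and the 800GB native capacity mentioned there. The function 
name and the 50% threshold are made up for illustration.

```python
import re

# Assumption: native capacity taken from the "800GB native" figure in this thread.
NATIVE_CAPACITY_BYTES = 800 * 10**9

# Matches the storage daemon's "End of medium" message as quoted in this thread.
END_OF_MEDIUM = re.compile(
    r'End of medium on Volume "(?P<volume>[^"]+)"\s+Bytes=(?P<bytes>[\d,]+)'
)


def early_full_volumes(log_text, capacity=NATIVE_CAPACITY_BYTES, threshold=0.5):
    """Return (volume, fraction_used) for volumes closed below `threshold`."""
    results = []
    for match in END_OF_MEDIUM.finditer(log_text):
        written = int(match.group("bytes").replace(",", ""))
        fraction = written / capacity
        if fraction < threshold:
            results.append((match.group("volume"), fraction))
    return results


# The log line from this thread, joined onto one line.
sample = (
    '05-Jun 09:18 xxx-sd JobId 794: End of medium on Volume "A00030L4" '
    "Bytes=63,902,942,208 Blocks=990,558 at 05-Jun-2018 09:18."
)

for volume, fraction in early_full_volumes(sample):
    print(f"{volume}: closed at {fraction:.1%} of native capacity")
```

On the log line from this thread it reports the volume as closed at roughly 
8% of native capacity, which is a strong hint that the tape was not really 
full.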

Now that the major problem has been solved, I'm still curious about why Bacula 
ran into the (real!) hardware issue where tar did not. The tar tape was 
software compressed & then software encrypted, so a restore has to decrypt & 
then decompress the data successfully. Since the restore worked, there 
couldn't have been any bit errors on that tar tape. This was true four months 
ago, with the sketchy cable, and again this time, with the SCSI card that 
needed to be reseated. Are fixed-size (tar) blocks just a little bit more 
robust than variable-sized (Bacula) blocks?
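
Whatever the answer turns out to be, the hardware issue did leave a signature 
in the kernel log. Here is a minimal Python sketch for counting those events 
in a syslog file. It assumes the st driver message format shown in the syslog 
excerpt quoted further down; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the st driver's "Add. Sense" lines as they appear in this thread.
ADD_SENSE = re.compile(
    r"kernel: \[\s*[\d.]+\] (?P<dev>st\d+): Add\. Sense: (?P<text>.+)$"
)


def count_sense_events(lines):
    """Count additional-sense messages per (device, message text)."""
    counts = Counter()
    for line in lines:
        match = ADD_SENSE.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("text"))] += 1
    return counts


# Two of the four events from the syslog excerpt in this thread.
sample_lines = [
    "Jun  4 08:06:11 SRVName kernel: [410468.465702] st0: "
    "Sense Key : Unit Attention [current]",
    "Jun  4 08:06:11 SRVName kernel: [410468.465714] st0: "
    "Add. Sense: Power on, reset, or bus device reset occurred",
    "Jun  4 08:10:47 SRVName kernel: [410744.629015] st0: "
    "Sense Key : Unit Attention [current]",
    "Jun  4 08:10:47 SRVName kernel: [410744.629026] st0: "
    "Add. Sense: Power on, reset, or bus device reset occurred",
]

for (dev, text), n in count_sense_events(sample_lines).items():
    print(f"{dev}: {n} x {text}")
```

Repeated "Power on, reset, or bus device reset occurred" entries during a 
backup point at the drive, card, or cabling rather than at the tapes.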

And thanks, Kern, for an outstanding product.

Dan Stieneke
IT Specialist
USDA - ARS - NWISRL
3793 N 3600 E
Kimberly, ID 83341
208/423-6519





 
-----Original Message-----
From: Kern Sibbald [mailto:k...@sibbald.com] 
Sent: Saturday, June 9, 2018 4:16 AM
To: Stieneke, Dan <dan.stien...@ars.usda.gov>
Cc: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out error?

Hello,

Well, Bacula does not check what was written from time to time, but when it 
reaches the end of the tape, Bacula will re-read the last block written to make 
sure it corresponds to what it wrote, then it writes a double end of file.  In 
your case, something is going wrong -- either there is a hardware error, or 
there is really an end of tape marker that is telling Bacula that the tape is 
full.  From what you write, it looks more like a hardware error, and the kernel 
logs that you show below indicate that something serious is wrong with your 
tape drive.  While Bacula is writing you should never see such messages, and 
when they occur, Bacula will receive a write error.  Everything is consistent 
with a hardware problem.  You may get a better idea of what is going on by 
running the "btape test" command.  Please see the manual for instructions on 
how to run it.  I recommend both the test, and the fill commands.  Note: both 
of these commands will write on the tape.  Prior to using a tape with btape, if 
it has been labeled by Bacula, you should rewind the tape and write one or two 
eof marks at the beginning so that btape will take it as a blank tape.

If both btape "test" and "fill" work, you should not have problems with failing 
Bacula backups.  If either one of those tests fails, you must fix it prior to 
trying to back up to tape with Bacula.
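
The tape preparation described above can be sketched in Python as follows. 
This is only an illustration under stated assumptions: it builds and prints 
the commands rather than running them, the device path is the one from the 
log in this thread, and the config path is an assumed default that may differ 
on your system. The function names are hypothetical. The storage daemon 
normally has to be stopped first so that btape can open the drive.

```python
import shlex


def btape_prep_commands(device, eof_marks=2):
    """Build mt commands that rewind a tape and write EOF marks at its start.

    WARNING: running these makes any existing data on the tape unreachable.
    """
    if eof_marks < 1:
        raise ValueError("need at least one EOF mark")
    return [
        ["mt", "-f", device, "rewind"],
        ["mt", "-f", device, "weof", str(eof_marks)],
    ]


def btape_command(device, config):
    """Build the btape invocation; 'test' and 'fill' are typed at its prompt."""
    return ["btape", "-c", config, device]


# Device path as it appears in the Bacula log in this thread.
device = "/dev/tape/by-id/scsi-1IBM_ULTRIUM-TD4_1310010391-nst"
# Assumption: location of the storage daemon config; adjust for your install.
config = "/etc/bacula/bacula-sd.conf"

for cmd in btape_prep_commands(device) + [btape_command(device, config)]:
    print(shlex.join(cmd))
```

Printing instead of executing is deliberate, since both the EOF marks and the 
btape tests overwrite the tape.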

Best regards,
Kern

On 06/08/2018 07:48 PM, Stieneke, Dan wrote:
> @ Dan Langille - yes, I think it is an issue with the tape drive, but only 
> Bacula runs into it; tar does not.
>
> @Martin Simmons - of course I should have checked/reported the log, sorry.
> =======BEGIN SYSLOG=======
> Jun  4 08:06:11 SRVName kernel: [410468.465702] st0: Sense Key : Unit Attention [current]
> Jun  4 08:06:11 SRVName kernel: [410468.465714] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun  4 08:10:47 SRVName kernel: [410744.629015] st0: Sense Key : Unit Attention [current]
> Jun  4 08:10:47 SRVName kernel: [410744.629026] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun  4 08:14:02 SRVName kernel: [410939.819168] st0: Sense Key : Unit Attention [current]
> Jun  4 08:14:02 SRVName kernel: [410939.819180] st0: Add. Sense: Power on, reset, or bus device reset occurred
> Jun  4 08:16:57 SRVName kernel: [411114.538975] st0: Sense Key : Unit Attention [current]
> Jun  4 08:16:57 SRVName kernel: [411114.538988] st0: Add. Sense: Power on, reset, or bus device reset occurred
> =======END SYSLOG=======
>
> Googling for those entries I found 
> http://bacula.10910.n7.nabble.com/Bacula-tapes-marked-FULL-too-early-VolBytes-too-low-td58881i20.html.
>  Similar issue (but no report of tar), the thread ended with "similar problem 
> went away with replaced drive" & "get your drive tested"
>
>  From the Bacula log ("Error: Re-read of last block OK, but block numbers 
> differ. Read block=990557 Want block=990558.") it looks like Bacula checks up 
> on what has been written every so often. I don't think tar does that; it just 
> streams to tape. If my card/cable/tape is only slightly flaky, is it 
> reasonable to think that this extra work pushes it over the edge? Or am I 
> barking up the wrong tree?
>
> Thanks,
> Dan Stieneke
>
>
> ----- from Dan Langille -----
> If it is all tapes, is the issue with the tape drive?
>
> ----- from Martin Simmons -----
> Check the syslog and system console for error messages about the tape device 
> (since Bacula saw Input/output error, that usually means some error on the 
> device).
>
>
>
>
>
>>>>>> On Thu, 7 Jun 2018 15:38:13 +0000, Stieneke, Dan said:
>> The job ate through 4 tapes, with only 2 - 60GB on each tape. Then it hit 
>> recycle limits and was asking for more media.
>>
>> These are used tapes, but I can't see 4 consecutive tapes going bad at the 
>> same time.
>>
>> Incidentally, this is the same behavior I saw 4 months ago, and at that time 
>> I did test bacula to a brand-new tape, which also failed quickly.
>>
>> Thanks,
>> Dan
>>
>>
>> From: Josh Fisher [mailto:jfis...@pvct.com]
>> Sent: Wednesday, June 6, 2018 5:18 AM
>> To: Stieneke, Dan <dan.stien...@ars.usda.gov>; 
>> 'bacula-users@lists.sourceforge.net'
>> <bacula-users@lists.sourceforge.net>
>> Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out 
>> error?
>>
>>
>> On 6/5/2018 3:45 PM, Stieneke, Dan wrote:
>> Ubuntu 16.04, Bacula 5.2.6, single-drive autoloader, all running Bacula 
>> trouble-free for years.
>>
>> Four months ago I got some errors in Bacula that looked like h/w errors, 
>> although jobs using tar on the same drive ran without error. I had 
>> suspicions about a cable, and when I replaced it everything returned to 
>> normal, until now, when I'm getting the same kinds of errors.
>>
>> Tar works on the same drive, but what about on the same tape? How do you 
>> know you are not seeing bad tapes?
>>
>>
>>
>> The relevant part of "messages" is:
>> = = = = = = = = = = = = = = = = = =
>> 05-Jun 09:17 xxx-sd JobId 794: Error: block.c:577 Write error at 12:60511 on device "Ultrium-TD4" (/dev/tape/by-id/scsi-1IBM_ULTRIUM-TD4_1310010391-nst). ERR=Input/output error.
>> 05-Jun 09:18 xxx-sd JobId 794: Error: Re-read of last block OK, but block numbers differ. Read block=990557 Want block=990558.
>> 05-Jun 09:18 xxx-sd JobId 794: End of medium on Volume "A00030L4" Bytes=63,902,942,208 Blocks=990,558 at 05-Jun-2018 09:18.
>> 05-Jun 09:18 xxx-sd JobId 794: 3307 Issuing autochanger "unload slot 16, drive 0" command.
>> = = = = = = = = = = = = = = = = = =
>>
>> As you can see, it had an error after about 64GB (of an 800GB native / 
>> 1600GB compressed tape).
>>
>> I've cleaned the drive. And again, backups made with tar record without 
>> error and restore without error.
>> Any ideas?
>>
>> Thanks,
>> Dan Stieneke
>> IT Specialist
>> USDA - ARS - NWISRL
>> 3793 N 3600 E
>> Kimberly, ID 83341
>>
>>
>>
>>
>> This electronic message contains information generated by the USDA solely 
>> for the intended recipients. Any unauthorized interception of this message 
>> or the use or disclosure of the information it contains may violate the law 
>> and subject the violator to civil or criminal penalties. If you believe you 
>> have received this message in error, please notify the sender and delete the 
>> email immediately.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>
>>
>
