Hi!

One of the most illustrative and useful mail threads for those of us who use
tapes, tape drives, and autochangers in our backup schemes.

Sticky mail.

Thanks to all.

On 19 Jan 2017 at 4:27 a.m., "Gi Dot" <gadi...@gmail.com> wrote:

> Kern, Alan,
>
> Thanks for the advice. It's a bit much for me to digest, but I'll work
> on it.
>
> On Mon, Jan 9, 2017 at 11:04 PM, Alan Brown <a...@mssl.ucl.ac.uk> wrote:
>
>> On 09/01/17 13:45, Kern Sibbald wrote:
>>
>> Hello,
>>
>> The status Bacula received was -1, which means that the tape drive
>> reported a hardware end of tape (i.e. an end of tape marker was seen).  This
>> can happen for the following reasons:
>>
>> 1. You reached the hardware end of tape marker at 150GB, but the marker
>> was placed in the wrong place on the tape when it was manufactured.  I.e.
>> the tape cassette is defective.
>>
>>
>> Kern, that's not a good interpretation of the problem.
>>
>>
>> LTO tapes don't have a "hardware end of tape marker" as you might expect
>> with DAT or other older unidirectional tapes.
>>
>>
>> Because of the serpentine layout of the tape, the beginning of the tape
>> is also the end of the tape and the servo track (factory written and
>> unchangeable) contains "offset distance from end of the reel" information.
>>
>> Serpentine means:
>>
>> 1: The tape winds to the end of the reel, heads move slightly (onto the
>> next track) and then the tape winds back into the cartridge.
>> 2: The heads move to the next track again.
>> 3: This process is then repeated until the last track pair is completed.
>> 4: Data is written to the tape in both directional passes.
>>
>> When the end of the last track is reached, the tape has been wound back
>> into the cartridge.
>>
>> What this means is that the maximum seek time is approximately half of
>> one track length (~900 metres) and that's around 35GB, even if you're
>> seeking several hundred GB into the tape. In other words, whilst the seek
>> command is a linear offset, actual seeking on an LTO is 2-dimensional:
>> "N track and X offset". The tape's internal chip records the 2D location
>> of files and data blocks, so that there's never any need to linearly seek
>> along all tracks from the start of the tape.
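
The "N track and X offset" idea above can be sketched as a toy model. This is only an illustration under stated assumptions: every pass is taken to hold the same number of bytes, even-numbered passes run away from the cartridge, odd-numbered passes run back, and moving the heads between passes is treated as free. The names `TapePosition`, `locate` and `seek_distance` are hypothetical, and the capacity of 100 bytes per pass in the usage note is a made-up value, not a drive specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapePosition:
    wrap: int        # which pass over the tape, 0-based
    forward: bool    # True when this pass runs away from the cartridge
    distance: float  # position along the tape, 0.0 (cartridge end) to 1.0


def locate(offset_bytes: int, wrap_capacity_bytes: int) -> TapePosition:
    """Map a linear byte offset onto a (wrap, distance) pair."""
    if offset_bytes < 0 or wrap_capacity_bytes <= 0:
        raise ValueError("offset must be >= 0 and wrap capacity must be > 0")
    wrap, within = divmod(offset_bytes, wrap_capacity_bytes)
    forward = wrap % 2 == 0
    progress = within / wrap_capacity_bytes
    # On a return pass the data runs from the far end back to the cartridge.
    distance = progress if forward else 1.0 - progress
    return TapePosition(wrap, forward, distance)


def seek_distance(a: TapePosition, b: TapePosition) -> float:
    """Fraction of the tape length to travel between two positions.

    Assumption: switching to another pass costs nothing, so only the
    difference in physical position along the tape matters.
    """
    return abs(a.distance - b.distance)
```

With 100 bytes per pass, `locate(50, 100)` and `locate(250, 100)` sit two passes apart but at the same physical point on the tape, so `seek_distance` between them is 0.0 even though the linear offsets differ by 200 bytes.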
>>
>> LTO heads are constructed so that drives do read-after-write verification
>> on the fly in both directions. A Bacula verification pass is normally
>> unnecessary because detected errors result in the data being rewritten to
>> the tape immediately.
>>
>> If there are errors, the drive will attempt to rewrite the data several
>> times.(*) If all rewrites fail then it will flag an uncorrectable error -
>> "The tape is bad and should be discarded"(**). Bacula interprets this as an
>> end-of-tape error.
>>
>>
>> (*) This means that a lot of errors on a tape have 2 effects:
>>
>>    1: There's a massive slowdown in reported despooling speed for jobs
>> and tape "full" capacity is reduced somewhat from the theoretical values
>> (somewhere between 90%-250% of _uncompressed_ capacity would be a normal
>> tape).
>>
>>    2: When reading the tape's RFID chip, it will say that the tape is
>> somewhere between 97% and 99% full, but the total amount of data it says
>> has been written since it was last labelled is significantly less than the
>> _uncompressed_ capacity of the tape.
>>
>> (**) The same effect will occur if the heads are dirty or damaged - and
>> it DOES happen(***). Once a contaminated tape finds its way into a drive
>> and fouls the heads you can pretty much guarantee that all subsequent tapes
>> will have reported problems, but until the heads are cleaned or repaired
>> you won't know if the tapes are wrecked or OK.
>>
>> (***) We had a bad batch of HP LTO5s contaminate multiple drives before
>> we realised what was happening. We're still cleaning up the mess 3 years
>> later.
>>
>>
>> Drive error codes actually indicate "drive problem", "tape problem" or
>> "unable to work out which is the problem", but the effect is the same as
>> far as Bacula's concerned. There are a slew of other error codes.
>>
>>
>>
>> LTO tapes wear out rapidly with repeated use. The lifespan of an LTO tape
>> is claimed to be "up to" 162 complete writes but in reality it's more like
>> 10-20% of this number before degradation is significant. We're seeing tapes
>> with 20-30 write cycles down to 60% of original capacity and thanks to
>> rewrites the despool speeds are _very slow_.
>>
>>
>> Apart from interrogating the tape drive and tape cartridge chip (Kern and
>> I have been discussing how to handle this on the fly), despooling speed is
>> a critical indicator of tape health. If it suddenly drops off, this is
>> cause for alarm.
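
One way to watch for a sudden drop in despooling speed is to compare each new job against the median of recent jobs. The following is a minimal sketch, not something Bacula provides: the function name `speed_dropped` is hypothetical, the speeds are assumed to have been collected by hand from job reports, and the 0.6 threshold is an arbitrary starting point to tune for your own drives.

```python
from statistics import median
from typing import Sequence


def speed_dropped(history_mb_s: Sequence[float],
                  latest_mb_s: float,
                  threshold: float = 0.6) -> bool:
    """Return True when the latest despool speed falls below
    threshold * median of the earlier speeds.

    With no history there is nothing to compare against, so the
    function reports no drop.
    """
    if not history_mb_s:
        return False
    return latest_mb_s < threshold * median(history_mb_s)


if __name__ == "__main__":
    # Made-up speeds in MB/s, for illustration only.
    recent = [120.0, 118.0, 125.0]
    print(speed_dropped(recent, 60.0))
    print(speed_dropped(recent, 110.0))
```

The median is used rather than the mean so that one earlier slow job does not drag the baseline down and hide a later drop.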
>>
>>
>>
>>
>> 2. You are using some tape driver (e.g. the ibm tape driver) rather than
>> the Linux st tape driver.  The ibm tape driver does not work correctly with
>> Bacula.
>>
>>
>> Having encountered this problem, the described issue is not consistent
>> with the IBM driver error (which comes from "ERROR 0: Success" messages).
>>
>> In the case of an IBM driver, the tape can be labelled and written quite
>> happily. Problems occur when attempts are made to seek to EOD on a tape
>> with _existing_ data - the error 0 message fools Bacula into thinking the
>> operation has failed.
>>
>>
>> My opinion:
>>
>> The error reported and the fact that it took 31 minutes to write 150GB
>> before erroring out points to fouled heads.
>> Load a cleaning tape(****) and try writing a new tape.
>> If that writes ok, then discard the errored tape (and possibly the one
>> before that). If not then the drive will need return-to-base repairs and
>> the test tape/last tape and one before that should be discarded.
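
The figures behind that opinion can be worked out from the job log further down: the storage daemon reports Bytes=150,990,400,512 for a job that started at 01:00 and hit end of medium at 01:31. The sketch below does the arithmetic. It rests on assumptions: the log timestamps only have minute resolution, the first minute or so was spent seeking to end of data, and the 1.5TB uncompressed capacity quoted by the original poster is taken as decimal terabytes. The function names are hypothetical.

```python
from datetime import datetime

# Values taken from the job log quoted in this thread.
BYTES_WRITTEN = 150_990_400_512
START = datetime(2017, 1, 8, 1, 0)
END = datetime(2017, 1, 8, 1, 31)

# Assumption: "1.5T uncompressed" means 1.5 decimal terabytes.
NATIVE_CAPACITY_BYTES = 1_500_000_000_000


def throughput_mb_s(n_bytes: int, start: datetime, end: datetime) -> float:
    """Average write rate in decimal megabytes per second."""
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("end must be later than start")
    return n_bytes / seconds / 1_000_000


def fraction_of_capacity(n_bytes: int, capacity_bytes: int) -> float:
    """Share of the uncompressed capacity that was written."""
    return n_bytes / capacity_bytes


if __name__ == "__main__":
    rate = throughput_mb_s(BYTES_WRITTEN, START, END)
    used = fraction_of_capacity(BYTES_WRITTEN, NATIVE_CAPACITY_BYTES)
    print(f"average rate: {rate:.1f} MB/s")
    print(f"capacity used: {used:.1%}")
```

On these numbers the drive averaged roughly 81 MB/s and gave up after about 10% of the tape's uncompressed capacity, which is the gap the original question is about.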
>>
>> (****) NEVER share a cleaning tape between drives. Yes I know that's what
>> libraries do with dedicated cleaning tape slots, but it's a really fast way
>> of cross-contaminating hardware. Don't do it.
>>
>>
>> If you don't have an LTO tape cartridge reader (www.mptapes.com), then
>> the next best thing is to ensure you have the latest version of sg3_utils
>> installed, and use sg_read_attr to interrogate the chip.
>>
>>
>> You should also install the IBM or HP drive management tools (even if
>> this means installing windows) and interrogate drive health.
>>
>>
>> tapeinfo and loaderinfo utilities are useful but incomplete for this kind
>> of diagnosis.
>>
>>
>> I've been working through the various sg attribute pages trying to see
>> which ones are useful. Drives actually log a _large_ amount of data
>> internally about the last few hundred tapes used, but unless you ask the
>> right questions you won't get any answers out of them (HP and IBM drive
>> tools ask those questions, of course - and know how to interpret the
>> answers)
>>
>>
>>
>>
>>
>>
>> Best regards,
>> Kern
>>
>> On 01/09/2017 04:29 AM, Gi Dot wrote:
>>
>> Hi all,
>>
>>
>> At the data centre we are using IBM-LTO tape - 3.0TB compressed, 1.5TB
>> uncompressed. Last 2 nights a backup was running and it stopped at about
>> 150GB size and Bacula marked the tape as full.
>>
>> Since the total amount of backed up data sometimes could be huge, I have
>> purged the volume straight away before the tape was inserted. There is a
>> total of 10 jobs, and the first job holds the biggest data, somewhere
>> around 500GB to 2TB at a time. Backup failed at the first job, at 150GB
>> size.
>>
>> | 3,053 | db01Job | 2017-01-08 01:00:03 | B | F | 43,942 | 150,874,925,633 | f
>>
>>
>> Excerpt from the logs:
>>
>> 07-Jan 05:00 phisbackupdns1-dir JobId 3052: shell command: run AfterJob "/usr/lib64/bacula/delete_catalog_backup"
>> 08-Jan 01:00 phisbackupdns1-dir JobId 3053: Start Backup JobId 3053, Job=phisdb01Job.2017-01-08_01.00.00_52
>> 08-Jan 01:00 phisbackupdns1-dir JobId 3053: Using Device "Drive0"
>> 08-Jan 01:00 phisbackupdns1-sd JobId 3053: Volume "A00053L5" previously written, moving to end of data.
>> 08-Jan 01:01 phisbackupdns1-sd JobId 3053: Warning: For Volume "A00053L5":
>> The number of files mismatch! Volume=1955 Catalog=0
>> Correcting Catalog
>> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of Volume "A00053L5" at 2106:1 on device "Drive0" (/dev/nst1). Write of 64512 bytes got -1.
>> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: Re-read of last block succeeded.
>> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of medium on Volume "A00053L5" Bytes=150,990,400,512 Blocks=2,340,501 at 08-Jan-2017 01:31.
>> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: 3307 Issuing autochanger "unload slot 2, drive 0" command.
>> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: No slot defined in catalog (slot=0) for Volume "A00032L5" on "Drive0" (/dev/nst1).
>> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: Cartridge change or "update slots" may be required.
>> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: Warning: mount.c:217 Open device "Drive0" (/dev/nst1) Volume "A00032L5" failed: ERR=dev.c:513 Unable to open device "Drive0" (/dev/nst1): ERR=No medium found
>>
>>
>>
>> Hardware compression is enabled:
>> # tapeinfo -f /dev/nst1
>> Product Type: Tape Drive
>> Vendor ID: 'IBM     '
>> Product ID: 'ULT3580-TD5     '
>> Revision: 'G360'
>> Attached Changer API: No
>> SerialNumber: '10WT008032'
>> MinBlock: 1
>> MaxBlock: 8388608
>> SCSI ID: 1
>> SCSI LUN: 0
>> Ready: yes
>> BufferedMode: yes
>> Medium Type: 0x58
>> Density Code: 0x58
>> BlockSize: 0
>> DataCompEnabled: yes
>> DataCompCapable: yes
>> DataDeCompEnabled: yes
>> CompType: 0x1
>> DeCompType: 0x1
>> BOP: yes
>> Block Position: 0
>> Partition 0 Remaining Kbytes: -1
>> Partition 0 Size in Kbytes: -1
>> ActivePartition: 0
>> EarlyWarningSize: 0
>> NumPartitions: 0
>> MaxPartitions: 1
>>
>>
>> Pool configuration for the volume:
>> Pool {
>>   Name = ADHOC
>>   Label Format = "ADHOC_Vol"
>>   Pool Type = Backup
>>   Recycle = yes
>>   AutoPrune = yes
>>   Storage = ibmts3310
>>   Volume Retention = 12h
>>   Recycle Current Volume = Yes
>> }
>>
>>
>> Side note: I just realized that I missed the "Volume Use Duration = 10h"
>> directive in the pool. The reason for it is that the same tape stays in the
>> drive for 2 nights (Saturday and Sunday), since there is no operator around
>> to change a tape. The tape is supposed to be recycled on Sunday night.
>>
>>
>>
>> I'd appreciate it if anyone can enlighten me as to why the tape is marked
>> full long before it reaches the capacity it is able to hold.
>>
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>
>>
>>
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>
>>
>
>
>