Re: OpenSolaris & amtapetype

Dustin J. Mitchell Sun, 14 Mar 2010 12:19:26 -0700

On Sun, Mar 14, 2010 at 6:42 AM, Filip Van Raemdonck
<[email protected]> wrote:
> Brief summary: from what I can tell, amtapetype loses the "block_size"
> property on the device during/after its first complete medium fill.
> And probably more info about, or the whole device "pointer", causing
> it to fail.
>
> Long story below:
>
> I built 2.6.1p2 -- using SUNWgcc -- without problems.
> Before pointing to SUNWgcc as the culprit, please note that I also
> installed a 2.6.1p1 package from OpenSolaris Source Juicer
> (http://jucr.opensolaris.org/home/) which was built using Sun Studio
> Express (cfr. http://jucr.opensolaris.org/build/viewlog/4129/), but
> exhibits the same problem.
>
> When I try running amtapetype on a DDS3 medium, it does a compression
> test, then writes 1 file to fill the whole tape, and when it's
> finished doing that, it bails out with a message:
>        Wrote less than 100MB to the device: Error writing block: Error 0
>
> I can tell it wrote out the whole tape because I ran amtapetype inside
> script(1) and it has written about 375000 blocks, which at 32k default
> amtapetype block is roughly 12GB or the size of a DDS3.
> I tried a few other DDS3 media, with the same results; DDS2 media too
> behave the same.
> I'm running amtapetype on an actual DDS4 medium as I write this, but
> aside from a longer run time I don't expect different results.
>
> I also tried different blocksizes but without any better results, either.
>
> The error message above that I received looks really similar to the
> problem in
> http://www.mail-archive.com/[email protected]/msg222744.html
> but there never was any reply to that email.


That's because it was on the wrong list! I've copied Trond Endrestøl
here.  We need more FreeBSD people using and hacking Amanda.  As a
side-note to John Hein: I didn't realize that 2.6.1p1 was in ports --
how does that work??

> I turned the fatal error inside the make_tapetype function into a
> warning and added the following warning inside write_one_file after it's
> done writing to find out what it has done:
>        warn "(pattern, blocks_written, block_size) are ($pattern, $blocks_written, $block_size)";
> This is how I found out that amtapetype doesn't know the device's
> blocksize anymore after the first complete tape fill; here's the
> output of that invocation:
>
>        mecha...@kari:~$ pfexec /opt/amanda/sbin/amtapetype -b 2048k -t sony-dgd125p /dev/rmt/0l
>        Applying heuristic check for compression.
>        (pattern, blocks_written, block_size) are (RANDOM, 92, 2097152) at /opt/amanda/sbin/amtapetype line 184.
>        (pattern, blocks_written, block_size) are (FIXED, 92, 2097152) at /opt/amanda/sbin/amtapetype line 184.
>        Wrote random (uncompressible) data at 3062507.68253968 bytes/sec
>        Wrote fixed (compressible) data at 3062507.68253968 bytes/sec
>        Compression: disabled
>        Writing one file to fill the volume.
>        (pattern, blocks_written, block_size) are (RANDOM, 5869, ) at /opt/amanda/sbin/amtapetype line 184.
>        Wrote 0 which is less than 100MB to the device: Error writing block: Error 0
>        Wrote 0 bytes at 0 kb/sec
>        Writing smaller files (0 bytes) to determine filemark.
>        Error writing label 'amtapetype-1990590034': Error writing block: Error 0 at /opt/amanda/sbin/amtapetype line 84.
>        mecha...@kari:~$

Since it's showing a blank value, I think that the property_get call
is returning undef, which will happen if the device is in an error
state, which (at that point) it is.  So I don't think there's any
forgetting of block size going on.
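
As a sanity check on the numbers, here is a minimal sketch (not part of
amtapetype; the names are made up for illustration, the figures come
from the output quoted above).  It assumes each of the 5869 counted
writes was a full 2048k block, in which case the full-volume pass put
roughly a DDS3's worth of data on the tape:

```python
# Figures taken from the amtapetype output quoted above.
BLOCK_SIZE = 2048 * 1024      # "-b 2048k" -> 2097152 bytes
BLOCKS_WRITTEN = 5869         # blocks_written reported for the full-volume pass


def bytes_written(blocks: int, block_size: int) -> int:
    """Total bytes written, assuming every counted block was a full block."""
    return blocks * block_size


total = bytes_written(BLOCKS_WRITTEN, BLOCK_SIZE)
print(f"{total} bytes = {total / 10**9:.2f} GB = {total / 2**30:.2f} GiB")
```

So the block size was clearly in effect while the tape was being
filled; only the value reported after the error is blank.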

Now, *why* the device is in an error state is the question to which I
don't know the answer, but I can give you some pointers.  There was an
error-handling bug in 2.6.1, since fixed, that caused the "Error 0"
message.  The actual error should be "Mysterious short write on tape
device: Tried ??, got ??".  If you look in the amtapetype debug log
file, you should see that message, including actual sizes.  This error
is "not supposed to happen", in the sense that tape drives are
supposed to either write the entire block, or not write anything at
all.

Another funny thing about tape drives is that if you write several
whole blocks at once, they automatically split the write up into
individual blocks.  So I suspect that this tape drive is not actually
configured for 2M blocks, and the kernel is automatically remapping
Amanda's 2M writes into a bunch of smaller block-sized writes.  At the
end of the tape, the kernel manages to write some, but not all, of
these blocks, and returns a write size that is a multiple of the
actual block size, but is smaller than the 2M that Amanda tried to
write.
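
To illustrate the short write described above, here is a toy Python
model.  FakeTape, its write method, and the 32k "real" block size are
all assumptions for illustration; this is not how Amanda or the tape
driver is implemented, just a sketch of the arithmetic:

```python
class FakeTape:
    """Toy model of a drive in fixed-block mode (not a real device API)."""

    def __init__(self, real_block_size: int, capacity_blocks: int):
        self.real_block_size = real_block_size
        self.free_blocks = capacity_blocks

    def write(self, data: bytes) -> int:
        """Split one large write into device blocks; return bytes accepted."""
        wanted = len(data) // self.real_block_size
        done = min(wanted, self.free_blocks)
        self.free_blocks -= done
        return done * self.real_block_size


AMANDA_BLOCK = 2 * 1024 * 1024   # what amtapetype was told to use (-b 2048k)
REAL_BLOCK = 32 * 1024           # hypothetical size the drive actually uses

# Room for two full 2M writes plus 10 more device blocks.
tape = FakeTape(REAL_BLOCK, capacity_blocks=2 * 64 + 10)
results = [tape.write(bytes(AMANDA_BLOCK)) for _ in range(3)]
for i, got in enumerate(results, 1):
    status = "ok" if got == AMANDA_BLOCK else "short write"
    print(f"write {i}: tried {AMANDA_BLOCK}, got {got} ({status})")
```

The third write comes back short, with a size that is a multiple of
32k but less than 2M, which is the pattern to look for in the debug
log.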

Hopefully that helps you to chase this one down.

It would be interesting to add a component to amtapetype that can
"detect" such misconfigurations. I think you could do so by writing a
short file with the given blocksize, then trying to re-read that file
with half the blocksize.  Keep going, until the read fails.  The last
functioning blocksize is the size at which the tape drive is
operating.  Since you've already been tinkering with amtapetype, do
you want to give that a shot?
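
The halving loop could be sketched like this, in Python against a
simulated drive rather than Amanda's Perl device API.  FakeTape,
detect_block_size, and the rule that a read smaller than one device
block fails are all assumptions for illustration; a real version would
need to rewind and use the actual device calls:

```python
class FakeTape:
    """Toy fixed-block drive; real_block_size is hidden from the caller."""

    def __init__(self, real_block_size: int):
        self.real_block_size = real_block_size
        self.data = b""

    def write(self, data: bytes) -> int:
        # Assumes len(data) is a multiple of the real block size.
        self.data = data
        return len(data)

    def read(self, size: int) -> bytes:
        # Model: a read smaller than one device block fails.
        if size < self.real_block_size:
            raise OSError("read buffer smaller than tape block")
        return self.data[:size]


def detect_block_size(tape, configured: int) -> int:
    """Halve the read size until a read fails; return the last size that worked."""
    tape.write(bytes(configured))       # short test file at the configured size
    last_ok = None
    size = configured
    while size >= 1:
        try:
            tape.read(size)             # a real test would rewind first
        except OSError:
            break
        last_ok = size
        size //= 2
    if last_ok is None:
        raise RuntimeError("could not read back even at the configured size")
    return last_ok


print(detect_block_size(FakeTape(32 * 1024), 2 * 1024 * 1024))
```

Note that halving only finds block sizes that are the configured size
divided by a power of two, so this is a rough check, not an exact
measurement.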

Dustin

P.S. I hope the fact that this script is in Perl was helpful to you,
since you were able to make changes while debugging it.  That is one
of the aims of the Perl rewrite!

-- 
Open Source Storage Engineer
http://www.zmanda.com
