On Sun, Mar 14, 2010 at 6:42 AM, Filip Van Raemdonck <[email protected]> wrote:
> Brief summary: from what I can tell, amtapetype loses the "block_size"
> property on the device during/after its first complete medium fill.
> And probably more info about, or the whole device "pointer", causing
> it to fail.
>
> Long story below:
>
> I built 2.6.1p2 -- using SUNWgcc -- without problems.
> Before pointing to SUNWgcc as the culprit, please note that I also
> installed a 2.6.1p1 package from OpenSolaris Source Juicer
> (http://jucr.opensolaris.org/home/) which was built using Sun Studio
> Express (cfr. http://jucr.opensolaris.org/build/viewlog/4129/), but
> exhibits the same problem.
>
> When I try running amtapetype on a DDS3 medium, it does a compression
> test, then writes 1 file to fill the whole tape, and when it's
> finished doing that, it bails out with a message:
> Wrote less than 100MB to the device: Error writing block: Error 0
>
> I can tell it wrote out the whole tape because I ran amtapetype inside
> script(1) and it has written about 375000 blocks, which at 32k default
> amtapetype block is roughly 12GB or the size of a DDS3.
> I tried a few other DDS3 media, with the same results; DDS2 media too
> behave the same.
> I'm running amtapetype on an actual DDS4 medium as I write this, but
> aside from a longer run time I don't expect different results.
>
> I also tried different blocksizes but without any better results, either.
>
> The error message above that I received looks really similar to the
> problem in
> http://www.mail-archive.com/[email protected]/msg222744.html
> but there never was any reply to that email.

That's because it was on the wrong list!  I've copied Trond Endrestøl
here.  We need more FreeBSD people using and hacking Amanda.

As a side-note to John Hein: I didn't realize that 2.6.1p1 was in
ports -- how does that work??

> I turned the fatal error inside the make_tapetype function into a
> warning and added the following warning inside write_one_file after it's
> done writing to find out what it has done:
> warn "(pattern, blocks_written, block_size) are ($pattern,
> $blocks_written, $block_size)";
> This is how I found out that amtapetype doesn't know the device's
> blocksize anymore after the first complete tape fill; here's the
> output of that invocation:
>
> mecha...@kari:~$ pfexec /opt/amanda/sbin/amtapetype -b 2048k -t
> sony-dgd125p /dev/rmt/0l
> Applying heuristic check for compression.
> (pattern, blocks_written, block_size) are (RANDOM, 92, 2097152) at
> /opt/amanda/sbin/amtapetype line 184.
> (pattern, blocks_written, block_size) are (FIXED, 92, 2097152) at
> /opt/amanda/sbin/amtapetype line 184.
> Wrote random (uncompressible) data at 3062507.68253968 bytes/sec
> Wrote fixed (compressible) data at 3062507.68253968 bytes/sec
> Compression: disabled
> Writing one file to fill the volume.
> (pattern, blocks_written, block_size) are (RANDOM, 5869, ) at
> /opt/amanda/sbin/amtapetype line 184.
> Wrote 0 which is less than 100MB to the device: Error writing block:
> Error 0
> Wrote 0 bytes at 0 kb/sec
> Writing smaller files (0 bytes) to determine filemark.
> Error writing label 'amtapetype-1990590034': Error writing block:
> Error 0 at /opt/amanda/sbin/amtapetype line 84.
> mecha...@kari:~$

Since it's showing a blank value, I think that the property_get call
is returning undef, which will happen if the device is in an error
state, which (at that point) it is.  So I don't think the block size
is being forgotten.

Now, *why* the device is in an error state is the question to which I
don't know the answer, but I can give you some pointers.

There was an error-handling bug in 2.6.1, since fixed, that caused the
"Error 0".  The actual error should be "Mysterious short write on tape
device: Tried ??, got ??".  If you look in the amtapetype debug log
file, you should see that message, including actual sizes.

This error is "not supposed to happen", in the sense that tape drives
are supposed to either write the entire block, or not write anything
at all.  Another funny thing about tape drives is that if you write
several whole blocks at once, they automatically split the write up
into individual blocks.

So I suspect that this tape drive is not actually configured for 2M
blocks, and the kernel is automatically remapping Amanda's 2M writes
into a bunch of smaller block-sized writes.  At the end of the tape,
the kernel manages to write some, but not all, of these blocks, and
returns a write size that is a multiple of the actual block size, but
is smaller than the 2M that Amanda tried to write.

Hopefully that helps you to chase this one down.

It would be interesting to add a component to amtapetype that can
"detect" such misconfigurations.  I think you could do so by writing a
short file with the given blocksize, then trying to re-read that file
with half the blocksize.  Keep halving until the read fails.  The last
functioning blocksize is the size at which the tape drive is
operating.  Since you've already been tinkering with amtapetype, do
you want to give that a shot?

Dustin

P.S. I hope the fact that this script is in Perl was helpful to you,
since you were able to make changes while debugging it.  That is one
of the aims of the Perl rewrite!

-- 
Open Source Storage Engineer
http://www.zmanda.com
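
For illustration only, here is a toy Python simulation of the two ideas
in the message above: a short write at end of tape when large writes
are split into smaller drive-sized blocks, and the proposed detection
loop that halves the read size until a read fails.  Nothing here is
Amanda code or its Device API.  SimulatedTape,
detect_drive_block_size, the 32k drive block size and the 100-block
capacity are all hypothetical, and the assumption that a read with a
buffer smaller than the on-tape block fails is a simplification of how
real drivers behave.

```python
class SimulatedTape:
    """Toy model of a fixed-block tape drive (hypothetical, not Amanda's API)."""

    def __init__(self, drive_block_size, capacity_blocks):
        self.drive_block_size = drive_block_size
        self.capacity_blocks = capacity_blocks
        self.blocks_on_tape = 0

    def write(self, data_len):
        """Split one write into drive-sized blocks; return bytes actually written."""
        if data_len % self.drive_block_size:
            raise ValueError("write is not a multiple of the drive block size")
        wanted = data_len // self.drive_block_size
        free = self.capacity_blocks - self.blocks_on_tape
        written = min(wanted, free)  # end of tape: only some blocks fit
        self.blocks_on_tape += written
        return written * self.drive_block_size

    def read(self, buffer_size):
        """Read one block; assumed to fail if the buffer is smaller than the block."""
        if self.blocks_on_tape == 0:
            raise OSError("tape is empty")
        if buffer_size < self.drive_block_size:
            raise OSError("buffer smaller than the block on tape")
        return self.drive_block_size


def detect_drive_block_size(tape, configured_block_size):
    """Halve the read size until a read fails; return the last size that worked."""
    size = configured_block_size
    last_good = None
    while size >= 1:
        try:
            tape.read(size)
        except OSError:
            break
        last_good = size
        size //= 2
    return last_good


CONFIGURED = 2048 * 1024   # what amtapetype was told (-b 2048k)
DRIVE_BLOCK = 32 * 1024    # assumed real drive block size

tape = SimulatedTape(DRIVE_BLOCK, capacity_blocks=100)
first = tape.write(CONFIGURED)    # fits entirely: 64 drive blocks
second = tape.write(CONFIGURED)   # only 36 drive blocks left: short write
detected = detect_drive_block_size(tape, CONFIGURED)
print(first, second, detected)
```

In this model the second write returns fewer bytes than requested, but
still a multiple of the drive's block size, which is the shape of the
"short write" described above.  The halving loop lands exactly on the
drive's block size only when that size is the configured size divided
by a power of two; otherwise it gives an estimate that is too large by
less than a factor of two.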
