I have been thinking about why tape drives shoeshine, and I
have come to the following conclusions. Please tell me what
you think. Am I right, or do I have this all wrong? This seems
to work on my computer.
It seems to me that shoeshining (the tape keeps going back and
forth, back and forth, back and forth) occurs when the computer
cannot transfer data to or from the tape drive fast enough.
In other words, the tape drive is faster than the computer.
Usually the computer is processing the data while it is
transferring it, and the processing is the slow part,
especially if the computer is compressing the data. Therefore,
I think the way to solve the shoeshining problem is to separate
the processing from the transferring. Usually the archive
program will process one block of data, transfer one block,
process one block, transfer one block, etc. Instead, create a
spooling buffer. Let the archive program transfer data into the
spooling buffer as usual. Wait until the spooling buffer fills,
and then transfer the whole spooling buffer at once.
For example (the following is supposed to be one line):
[write archive to STDOUT] | perl -e 'while(read(STDIN,$A,
[spool buffer size])){print($A)}' > [tape device]
To read the tape:
perl -e 'while(read(STDIN,$A,[spool buffer size]))
{print($A)}' < [tape device] | [read archive from STDIN]
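That would work with variable block size, but with fixed block
size you would have to pipe the archive through dd. In the next
example the block size is 10240 and the spooling buffer size is
1000000. Reads from a pipe can return less than a full block,
and conv=sync pads every short read with zero bytes, so the
write side uses iflag=fullblock (a GNU dd option) to make dd
collect full blocks; only the last block is then padded:
tar -c . | gzip -9 | perl -e 'while(read(STDIN,$A,1000000))
{print($A)}' | dd bs=10240 iflag=fullblock conv=sync > /dev/tape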
and
dd bs=10240 < /dev/tape | perl -e 'while(read(STDIN,$A,
1000000)){print($A)}' | gzip -d | tar -x
This little perl program is very simple because it lets the
kernel do the hard part. When this perl program stops reading
from the pipe, the kernel stops the program which is writing
data to the pipe, so there are plenty of cpu cycles available
for feeding data to or from the tape.
If the size of the archive is smaller than the size of the
spooling buffer, then the actual size of the spooling buffer
will be the size of the archive; but perl will allocate enough
memory for the maximum size of the spooling buffer; so it is an
inefficient use of memory to have the size of the spooling
buffer much larger than the size of the archive.
Using a spooling buffer will slow you down if you try to read a
few bytes from a large archive.
If you are using swap memory, then part of the spooling buffer
might be swapped; you might be able to speed
up access to the spooling buffer by turning swap off.
And here is a perl script named 'SpoolBlock' which handles both
the spooling buffer and the block size:
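#!/usr/bin/perl
use strict;
use warnings;
use integer;
# $ARGV[0] is the spooling buffer size, $ARGV[1] is the block size.
if ( @ARGV != 2 or $ARGV[0] !~ /^\d+$/ or $ARGV[1] !~ /^\d+$/
     or $ARGV[0] == 0 or $ARGV[1] == 0 ) {
    print(STDERR "
SpoolBlock: incorrect command line parameters, should be:
SpoolBlock [spooling buffer size] [block size]
spooling buffer size must be a multiple of block size
");
    exit(1);
}
if ( ($ARGV[0] % $ARGV[1]) != 0 ) {
    print(STDERR "
SpoolBlock: spooling buffer size must be a multiple of block
size. spooling buffer size is $ARGV[0]; block size is $ARGV[1]
");
    exit(1);
}
binmode(STDIN);
binmode(STDOUT);
my $NB = 1;
until ( $NB < 1 ) {
    my $ReadCount = 0;
    my $WriteCount = 0;
    my $DataBuffer = '';
    # Fill the spooling buffer. sysread() on a pipe can return
    # fewer bytes than requested, so count the bytes actually
    # read and ask only for the rest of the current block.
    until ( $NB < 1 or $ReadCount >= $ARGV[0] ) {
        my $Want = $ARGV[1] - ($ReadCount % $ARGV[1]);
        $NB = sysread(STDIN,$DataBuffer,$Want,$ReadCount);
        die("SpoolBlock: read error: $!\n") unless defined($NB);
        $ReadCount = $ReadCount + $NB;
    }
    # At end of file, pad the last partial block with NUL bytes
    # so that only whole blocks are written.
    if ( $NB < 1 ) {
        my $NA = ($ARGV[1] - ($ReadCount % $ARGV[1])) % $ARGV[1];
        if ( $NA > 0 ) {
            $DataBuffer = $DataBuffer . ("\0" x $NA);
            $ReadCount = $ReadCount + $NA;
        }
    }
    # Write the whole spooling buffer, one block at a time.
    until ( $WriteCount >= $ReadCount ) {
        defined(syswrite(STDOUT,$DataBuffer,$ARGV[1],$WriteCount))
            or die("SpoolBlock: write error: $!\n");
        $WriteCount = $WriteCount + $ARGV[1];
    }
}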
__END__
eof() does not work right with sysread() (eof() belongs to
perl's buffered I/O, which should not be mixed with sysread()),
so this program detects end of file by checking whether $NB
is 0.
'$NB < 1' could be '$NB == 0', '$WriteCount >= $ReadCount'
could be '$WriteCount == $ReadCount', and
'$ReadCount >= $ARGV[0]' could be '$ReadCount == $ARGV[0]'.
But I am a little paranoid about bugs; by doing it this way I
have reduced the chance that a bug will cause an endless loop.
And I think it goes just as fast this way.
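To make the end-of-file padding concrete: SpoolBlock only
writes whole blocks, so the last block is filled out with NUL
bytes and the amount written is the archive size rounded up to
a multiple of the block size. A rough sketch of that arithmetic
follows; padded_size is a hypothetical helper name used only
for illustration and is not part of SpoolBlock.

```shell
# padded_size ARCHIVE_BYTES BLOCK_BYTES
# Print the number of bytes written once the last block has been
# padded out to a full block.
padded_size() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# A 25000 byte archive with a block size of 10240 needs 3 blocks.
padded_size 25000 10240    # prints 30720
```

Assuming SpoolBlock is executable and on the PATH, it would
take the place of the perl and dd stages, for example (1024000
is 100 blocks of 10240 bytes):
tar -c . | gzip -9 | SpoolBlock 1024000 10240 > /dev/tape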
In some cases shoeshining is caused by a bad tape;
the tape drive writes or reads the same segment repeatedly
because it is having trouble writing or reading the data. A
spooling buffer would not help in this case.
Computers are a lot faster than they used to be, while tape
drives are only a little faster than they used to be. As a
result, shoeshining is less of a problem than it used to be.