Hi, Egor!
Trying to kill the keyboard, Egor ([EMAIL PROTECTED])
produced 0,9K in 26 lines:
> I have a rather busy file server with 40G of disk space. Something
> under 1G are document files that are being actively used (and
> changed), something like 1-2G is mail and the rest is data that is
> changed very rarely and doesn't need to be backed up. The backup
> device is a 20G (compressed) Travan SCSI tape drive. The tape
> cartridge can be changed only on Tuesday, Wednesday and Thursday (when
> I come here).
> What backup strategy should I choose? The streamer supports tape
> partitioning, and the number of tapes isn't a big issue.
You will want to back up the rest of the data as well, at least
every now and then, so you can restore it in case of a failure.
Now I do not know how much money/trouble the data is worth (the
more it's worth, the more I'd be sure to have a good backup).
Nor do I know how much HD space you have left and if you
use LVM.[1]
Personally I would think about differential backups. I.e.
you make a complete backup of all the data on your HD every
week/2 weeks/month[2]. This is a level 0 backup[3]. *Before*
the backup begins you set a timestamp.0.
After that you can make, say, a level 5 backup, that is, you
back up all files whose mtime *and* all files whose ctime is
newer[4] than timestamp.0 (since there are no timestamp.4,
..3, ..2 nor ..1).
After that you could e.g. make a level 9 backup, capturing
all the changes since the last backup of the highest level
below 9 -- in this case 5.
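The timestamp/level idea can be sketched with find(1). This is a
minimal demo under assumed paths (tar files under /tmp stand in for
the tape device), and it assumes GNU find for the -cnewer test:

```shell
#!/bin/sh
# Demo paths -- point DATA/STAMPS at your real trees in practice.
DATA=/tmp/demo-data
STAMPS=/tmp/demo-stamps
rm -rf "$DATA" "$STAMPS"; mkdir -p "$DATA" "$STAMPS"
echo one > "$DATA/report.txt"
echo two > "$DATA/static.txt"

# Level 0: set timestamp.0 *before* the backup starts, then dump everything.
touch "$STAMPS/timestamp.0"
tar -cf /tmp/level0.tar -C "$DATA" .

sleep 1
echo changed > "$DATA/report.txt"      # simulate daytime activity

# Level 5: every file whose mtime OR ctime is newer than timestamp.0.
find "$DATA" -type f \( -newer "$STAMPS/timestamp.0" \
    -o -cnewer "$STAMPS/timestamp.0" \) > /tmp/level5.list
tar -cf /tmp/level5.tar -T /tmp/level5.list
touch "$STAMPS/timestamp.5"            # a later level 9 would compare against this
```

Only the changed file lands on the level 5 "tape"; the untouched one
is skipped.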
I would try to begin each tape with a level 1 backup, so that
for a restore you need only the level 0 tape (of which you
should have more than one; the second and third copies must
not change timestamp.0, being mere copies!) and the current
tape.
This being explained, let's talk about the schedule. You say
that the changing data is in the order of 3 GB. The longest
timespan is 5 days (Thursday to Tuesday), which could amount
to 15 GB ... more than you have space on the tape without
compression.
I would advise against compression, unless you compress each
file instead of the whole backup.[5] This means you cannot
make a daily backup of the complete 3 GB (especially if you
begin each tape with a level 1 backup), but not every file
will be changed daily.
So something like this is what I'd try (assuming you want
the backups run in the early morning of the following day,
when there is the least activity):
Tu: Remove old tape, insert new one, schedule a level 1
We: This is the day for the level 0 backup (when one is due),
    even if it runs over the daytime, since you will need at
    least 2, if not 4, tapes for it.
    Otherwise schedule a level 5
Th: Change tape again. level 1
Fr: level 3 (capture just the changes on Friday)
Sa: level 5 (capture just the changes on Saturday)
Su: level 7 (capture just the changes on Sunday)
Mo: level 9 (capture just the changes on Monday)
Tu: Change tape (see above).
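As a crontab sketch: each job fires at 04:30 on the morning *after*
the day whose changes it captures (cron's dow field: 0 = Sunday).
/usr/local/sbin/dobackup is a hypothetical wrapper you would have to
write yourself around dump or tar, taking the level as its argument:

```
# m  h dom mon dow  command                     (dow: 0 = Sunday)
30   4  *   *   3   /usr/local/sbin/dobackup 1  # Tuesday's fresh tape -> level 1
30   4  *   *   4   /usr/local/sbin/dobackup 5  # Wednesday (run level 0 by hand when due)
30   4  *   *   5   /usr/local/sbin/dobackup 1  # Thursday's fresh tape -> level 1
30   4  *   *   6   /usr/local/sbin/dobackup 3  # Friday's changes
30   4  *   *   0   /usr/local/sbin/dobackup 5  # Saturday's changes
30   4  *   *   1   /usr/local/sbin/dobackup 7  # Sunday's changes
30   4  *   *   2   /usr/local/sbin/dobackup 9  # Monday's changes
```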
This is to give you the minimum of data to store, but it means
that a restore of the state of Sunday needs the level 0 tape
and the Thursday tape (levels 1, 3, 5, 7)!
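The restore order matters: extract the level 0 archive first, then
each higher level in sequence, so newer copies overwrite older ones.
A tiny demo with tar files standing in for the tapes (all paths are
illustrative):

```shell
#!/bin/sh
SRC=/tmp/demo-src; REST=/tmp/demo-restore
rm -rf "$SRC" "$REST"; mkdir -p "$SRC" "$REST"

echo v0 > "$SRC/doc.txt"
tar -cf /tmp/restore-l0.tar -C "$SRC" .          # the "level 0 tape"
echo v1 > "$SRC/doc.txt"                         # file changes later on
tar -cf /tmp/restore-l5.tar -C "$SRC" doc.txt    # the "level 5 tape"

# Restore: level 0 first, then the incrementals in level order.
tar -xf /tmp/restore-l0.tar -C "$REST"
tar -xf /tmp/restore-l5.tar -C "$REST"
cat "$REST/doc.txt"                              # prints v1, the newest version
```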
If you still run out of tape on Sunday/Monday you will have
to make the Thursday level 1 on a different tape (and do a
level 2 in the night), but that means your backup depends on
the level 0 tapes, the level 1 tape AND the Thursday tape!
If you want 2 backups per day, you could insert a level 9
in between the regular backups.
Should you have the disk space I'd advise you to back up to
disk first (this is usually faster than the tape, meaning
fewer changes between the backup to disk and the compare from
disk) and then migrate the backup file to tape (and again,
read it back from tape and compare it to the image on disk).
Be sure to read the output of the backup script to see which
files did not match the backup (because they were changed or
the data is corrupt).
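The disk-first flow might look like this. A plain file stands in for
the tape device here (substitute e.g. /dev/st0); all paths are demo
assumptions:

```shell
#!/bin/sh
DATA=/tmp/demo-verify; rm -rf "$DATA"; mkdir -p "$DATA"
echo hello > "$DATA/a.txt"
IMAGE=/tmp/backup.tar        # backup image staged on disk
TAPE=/tmp/fake-tape          # stand-in for the real tape device

tar -cf "$IMAGE" -C "$DATA" .                       # 1. fast backup to disk
dd if="$IMAGE" of="$TAPE" bs=64k 2>/dev/null        # 2. stream the image to "tape"
dd if="$TAPE" of=/tmp/readback bs=64k 2>/dev/null   # 3. read it back again
cmp "$IMAGE" /tmp/readback && echo "tape matches disk image"
```

With a real tape the read-back may be padded to the device block
size, so compare only the first $(wc -c < "$IMAGE") bytes in that
case.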
-Wolfgang
[1] Especially LVM's ability to produce snapshots while the
'original' is in use is a good thing for backups ...
[2] Depends on what your data is worth ... and how fast it
changes. And how you want to deal with deleted files.
My 'method' does not record that a file was deleted; you'd
have to save and evaluate the output of find or similar to
handle that.
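One way to handle deletions is to save a sorted file list at each
backup run and diff the lists with comm(1). A self-contained demo
(the two find runs would really happen on successive backup days):

```shell
#!/bin/sh
DATA=/tmp/demo-del; rm -rf "$DATA"; mkdir -p "$DATA"
touch "$DATA/keep.txt" "$DATA/doomed.txt"

find "$DATA" -print | sort > /tmp/files.monday     # list taken at Monday's backup

rm "$DATA/doomed.txt"                              # file deleted during the day

find "$DATA" -print | sort > /tmp/files.tuesday    # list taken at Tuesday's backup

# Lines only in the older list = files deleted between the two backups:
comm -23 /tmp/files.monday /tmp/files.tuesday > /tmp/files.deleted
cat /tmp/files.deleted
```

A restore script could then replay those deletions after extracting
the archives.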
[3] this is essentially what dump does, so man dump. However I
am not sure whether dump, when spanning several tapes/files/
parts, still corrupts the data of the file broken over the
parts ... try that out first!
[4] echo "a" > a; echo "b" > b
    ls -l --full-time --time=ctime a b   # ctime of both
    ls -l --full-time a b                # mtime of both
    sleep 10
    mv a c; mv b a; mv c b               # swap the two files
    ls -l --full-time --time=ctime a b   # ctime has changed ...
    ls -l --full-time a b                # ... but mtime has not!
    cat a; cat b                         # yet the contents are swapped
    That's why ctime AND mtime.
[5] Afio does that, for example. bzip2 compresses blockwise,
but the block size of 100 - 900 K (-1 to -9 respectively)
is huge ... you can lose many mails in 900k.