[ This is way too long.  Sorry.  --JJ ]

"Carey Jung" <[EMAIL PROTECTED]> writes:

>From the gtar man page:
>
>OTHER OPTIONS
>       --atime-preserve
>              don't change access times on dumped files

Which, in turn, causes the ctime to be changed, which makes all the
files look like they need to be dumped again next time.
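To make that concrete: putting the access time back is itself an inode change, so the ctime moves forward.  A small Python sketch of what --atime-preserve effectively does after reading a file (the temp file is just a stand-in for something tar dumped):

```python
import os
import tempfile
import time

# A stand-in for a file the backup program has just read.
fd, path = tempfile.mkstemp()
os.write(fd, b"some data")
os.close(fd)

st_before = os.stat(path)
time.sleep(0.01)  # make any ctime change visible

# What --atime-preserve effectively does: put the access
# time back with utime(2) after reading the file.
os.utime(path, ns=(st_before.st_atime_ns, st_before.st_mtime_ns))
st_after = os.stat(path)

# atime and mtime are back where they were ...
assert st_after.st_atime_ns == st_before.st_atime_ns
# ... but restoring them modified the inode, so ctime moved on,
# which is what makes the file look dump-worthy again next run.
assert st_after.st_ctime_ns > st_before.st_ctime_ns
os.unlink(path)
```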

>Carey

Jens Bech Madsen <[EMAIL PROTECTED]> writes:

>Well, that can be avoided too. I remount my data partitions prior to
>backup to noatime (this also speeds up the estimating process a bit).

Before this goes too far (I should have clarified my first comment),
I actually more or less agree with Linus and the other posts that said
the same thing here.  Using dump on a mounted file system is problematic
because of the way it works.  But just saying "use tar" has its own
issues.  If you can live with them, fine.  If you can work around some
of them, fine.  But to just say "dump bad, tar good" is not right.

In your specific example, you've still traded off having valid access
time information for using tar.  If tar cannot update the access time,
neither can anything else, so while you're doing the backup, any access
time updates are lost.

Even if some particular OS provided an option to reset the access
time without resetting anything else, you still have a race condition.
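The race is easy to see if you spell out the read-then-reset sequence.  A Python sketch (the "other process" access is simulated by setting a newer atime by hand):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"payload")
os.close(fd)

# Backup program: remember the times it saw, then read the file.
st = os.stat(path)
with open(path, "rb") as f:
    f.read()

# Meanwhile some other process legitimately accesses the file
# (simulated here by bumping the atime one second forward).
os.utime(path, ns=(st.st_atime_ns + 10**9, st.st_mtime_ns))

# Backup program: "restore" the atime it recorded earlier,
# silently clobbering the access that happened in the window.
os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

final = os.stat(path)
assert final.st_atime_ns == st.st_atime_ns  # the real access is lost
os.unlink(path)
```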

If you want access time to be accurate, tar (or any other program that
uses the standard system I/O interface) is going to be a problem, one
way or another.

One possible solution would be an "I'm a backup program" flag for the
process that tells the file system layers not to update metadata (such
as access time) when this process is doing I/O.  But now you're talking
about very specific OS support.
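For what it's worth, Linux did grow something close to this, per file descriptor rather than per process: the O_NOATIME flag to open(2) (the caller must own the file or have the right capability).  A sketch, assuming a Linux system:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"data to back up")
os.close(fd)

# Force a clearly old atime so even a "relatime" mount would
# update it on a normal read.
old = 1_000_000_000  # sometime in 2001, in seconds
os.utime(path, (old, old))
before = os.stat(path).st_atime_ns

# Read via O_NOATIME: the kernel skips the atime update for
# this descriptor (requires owning the file, or CAP_FOWNER).
fd = os.open(path, os.O_RDONLY | os.O_NOATIME)
os.read(fd, 1024)
os.close(fd)

after = os.stat(path).st_atime_ns
assert after == before  # atime untouched by the read
os.unlink(path)
```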

>Jens Bech Madsen

Christoph Scheeder <[EMAIL PROTECTED]> writes:

I agree with most of what you said, except:

>if you need to backup an active filesystem use a program like tar 
>which is designed to do that.

Using tar does not guarantee files are consistent on the resulting image.
Consider a file being rewritten at the time tar is running.  Some of the
data is still in the application, some is "on disk" where tar can get it.
And the break falls at an OS-level boundary (e.g. a stdio buffer), which
probably has nothing to do with where the file would be internally
consistent.

A specific example: You're editing a fairly large source file and write
out the result.  The stdio library (or whatever) writes one data block
to the existing file, leaving the remainder with the previous data, then
is put to sleep due to kernel scheduling beyond its control.  Tar kicks
in and gets all the blocks.  You do the restore later and end up with
a mixed result.
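That scenario can be simulated directly.  Here's a Python sketch where the "editor" rewrites a two-block file in place, the "tar" (just a plain copy, for the demo) runs between the two writes, and the archived copy comes out half new, half old:

```python
import os
import shutil
import tempfile

BLK = 8192  # pretend this is the I/O block size

fd, path = tempfile.mkstemp()
os.write(fd, b"A" * (2 * BLK))  # the previous file contents
os.close(fd)

# Editor rewrites the file in place, but is descheduled after
# the first block reaches the filesystem.
w = open(path, "r+b")
w.write(b"B" * BLK)
w.flush()
# --- editor is asleep here ---

# tar (simulated with a plain copy) runs now and reads every block.
afd, archived = tempfile.mkstemp()
os.close(afd)
shutil.copy(path, archived)

# Editor wakes up and finishes its write.
w.write(b"B" * BLK)
w.close()

with open(archived, "rb") as f:
    data = f.read()
# The archived copy is half new, half old: internally inconsistent.
assert data == b"B" * BLK + b"A" * BLK
os.unlink(path)
os.unlink(archived)
```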

I don't know that "the editor" works this way (it may truncate the file at
the beginning, for instance).  But the basic idea/problem is always there.

Turned around (tar starts first and then pauses), tar probably isn't
too happy with the file being truncated while it was being read, either.
You again end up with something other than a useful file.  Yes, the next backup
run should pick it up, but the expectation is that a restore of a given
set of images gives you back a good system.
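The truncation case can be simulated the same way.  In this Python sketch, the "tar" side reads the first block, the file is truncated and rewritten underneath it, and the remaining reads come up empty, so what gets archived is neither the old file nor the new one (real GNU tar pads and warns "file changed as we read it", but the archived data is no more usable):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 16384)  # the original 16 KB file
os.close(fd)

pieces = []
with open(path, "rb") as f:
    pieces.append(f.read(8192))  # tar reads the first block ...

    # ... pauses, and meanwhile the file is truncated and rewritten.
    with open(path, "wb") as w:
        w.write(b"y" * 100)

    pieces.append(f.read(8192))  # tar resumes; EOF, nothing left

data = b"".join(pieces)
# Neither the old 16384-byte file nor the new 100-byte one:
assert data == b"x" * 8192
os.unlink(path)
```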

>Christoph

"Anthony A. D. Talltree" <[EMAIL PROTECTED]> writes:

>One way is to mirror the volume, then break off a mirror and back up
>that.  Another is to use a volume snapshot, which on an active
>filesystem may take rather a long time.

Same problem as above.  The file system (mirror or snapshot) can be
inconsistent in an individual file at the block level.  Mirrors are
worse because they don't necessarily even have the data that was in the
system buffers.  You're back to the dump problem.

My point through all this is that backups involve tradeoffs, regardless
of what program is used.  You need to know what those issues are, what
you can tolerate, and how to get a backup program to live within those
boundaries.

If you can take your system down and boot from CD so all the disks are
unmounted, dump will do a perfectly good job.  If you can't do this
(and very few people can any more), then you have to know what you're
giving up and how to deal with the consequences.

As others have said, it's all a matter of knowing what the programs do,
what the limits are and how to use them properly.

John R. Jackson, Technical Software Specialist, [EMAIL PROTECTED]
