[ On Sunday, June 16, 2002 at 14:42:50 (+0200), Ragnar Kjørstad wrote: ]
> Subject: Re: Backing up PostgreSQL?
>
> The term "consistency" has no meaning by itself unless there are well
> defined rules about the allowed data-content. If the rule is that all
> the postgresql-datafiles together hold the up-to-date status of the
> database, then it is consistent at all times. If the rule is that each
> file should hold an up-to-date and complete representation of one
> particular database-table then it's only consistent after a clean
> shutdown.
The rule should be very clear in this case: that the restoration of the
pgsql database from backup media will be completely and entirely
self-consistent and represent an exact snapshot of all the committed
transactions at the time the backup was made. It must be immediately
usable by simply starting the database as if starting it from any other
clean shutdown. If any recovery procedures are first necessary then it
is clearly not a self-consistent backup copy.
> > > The only requirement on the filesystem is that it is journaling as well,
> > > so it's always kept in a consistent state like the postgresql-database
> > > is..
> >
> > Now you're getting a little out of hand. A journaling filesystem is a
> > piling of one set of warts on top of another. Now you've got a situation
> > where even though the filesystem might be 100% consistent even after a
> > catastrophic crash, the database won't be. There's no need to use a
> > journaling filesystem with PostgreSQL (particularly if you use a proper
> > hardware RAID subsystem with either full mirroring or full level 5
> > protection).
>
> Let's keep things clear here.
> RAID is totally irrelevant to this question. RAID protects you from
> hardware-failure, but it doesn't help to keep your filesystem or data
> consistent.
RAID systems commonly have caches. If those caches are not flushed
before a backup begins then a failure _during_ the backup may lose
the data in them. However a properly designed independent RAID
subsystem, in conjunction with a decently designed filesystem, will
provide all the protection necessary because metadata writes by the
operating system will be very quickly flushed to cache and the RAID
subsystem will continue to write them to disk even if the host crashes.
A journaling filesystem is not really necessary, nor of any real
benefit, because (as you admit yourself below) with pgsql there's little
metadata to futz around with on a regular basis (unless maybe you have
zillions of small tables that come and go in your schema, or something
else that causes pgsql to create and remove many files on a regular
basis).
> I have no idea why you say journaling filesystems "is a piling of one
> set of warts on top of another", but the fact is that it is required to
> always keep the filesystem consistent.
You can have a completely consistent filesystem (all metadata is
up-to-date and "fsck" or whatever will find nothing to fix) while at the
same time the database files contained within the filesystem are in an
internally inconsistent state w.r.t. the db schema and/or application
and its ongoing transactions. You're confusing your levels of
consistency.
> In theory non-journaling filesystems make no guarantees that the
> filesystem will be usable at all after a crash/powerfailure. In practice
> it's not that bad, and in most cases you will be able to recover
> everything but the last updates with fsck.
Normal unix filesystems provide more than enough metadata consistency
for pgsql, and since you don't likely have billions of files you don't
need to worry that fsck will be too slow during recovery.
> PostgreSQL needs a filesystem that guarantees the metadata to be
> up-to-date. E.g. when it appends to the WAL the new file-size must be
> stored to disk or the latest data will not be available after a
> crash/powerfailure.
Now you're confusing your levels of metadata too.
> BUT; it's possible that the fsync() of the file
> itself will cause the metadata to be updated - at least on some
> filesystems.
I should hope that's true on all filesystems that are useful for pgsql.
> Directory-updates on the other hand are _not_ flushed to
> disk because of fsync, so e.g. when postgreSQL creates new files (when a
> table/index becomes bigger than 2 GB) you risk losing that new file if
> the machine crashes. (unless postgreSQL does fsync() on the directory
> after creating the file - I assume it doesn't, just to be safe)
It's trivial to force the OS to write directory metadata. Just close
the file and then reopen it. I don't know if pgsql does that yet, but
if not then they have something more to learn.
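For what it's worth, the portable way to make a newly created file's
directory entry durable on modern Unix systems is to fsync() the
directory itself after creating the file. A minimal sketch (the helper
name is hypothetical, for illustration only -- this is not PostgreSQL's
actual code):

```python
import os

def create_durably(path, data):
    """Create `path`, flush its contents, and flush the directory entry.

    Hypothetical helper for illustration -- not PostgreSQL source code.
    """
    # Write and fsync the file itself so its contents reach the disk.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

    # fsync() the containing directory so the new directory entry
    # (the file's name and inode reference) is also on disk.
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Without the second fsync() the file's data may be safely on disk while
the name referring to it is still only in memory.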
> In short: postgreSQL requires a journaling filesystem to guarantee
> consistent data after a crash / powerfailure.
If pgsql deletes the old file before it has ensured the new ones are on
disk then it has a very serious bug.
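The safe ordering is the classic one: make the replacement durable
first, then displace the old copy, so a crash at any point leaves at
least one complete version on disk. A hedged sketch of that pattern
(illustrative names, not pgsql's implementation):

```python
import os

def replace_durably(path, data):
    """Replace `path` with `data` without ever losing both copies.

    Illustrative write-new / fsync / rename ordering; the helper name
    is hypothetical, not from PostgreSQL.
    """
    tmp = path + ".new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)          # new contents hit the disk first ...
    finally:
        os.close(fd)
    os.replace(tmp, path)     # ... then atomically displace the old file
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)         # make the rename itself durable
    finally:
        os.close(dfd)
```

Deleting the old file before the fsync() of the new one inverts this
ordering, and that is exactly the "very serious bug" described above.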
Journaling filesystems are not even remotely necessary to guarantee safe
operation of a properly designed database engine.
> > Indeed there are potentially performance related reasons
> > to avoid journaling filesystems! I've heard people claim they are just
> > as fast with PostgreSQL, and some even have claimed performance
> > improvements, but none of the claimants could give a scientific
> > description of their benchmark and I'm pretty sure they were seeing
> > artifacts, not a representation of real-world throughput.
>
> If you haven't seen the scientific description of this you haven't
> looked hard enough. (check out the reiserfs mailinglist archive, for
> instance)
I understand journaling filesystems quite well, thank you. I do not
understand at all how any of those making claims for pgsql's performance
on such filesystems have done their benchmarks.
> As for benchmarks that demonstrate this effect, try running postmark on
> both a journaling and non-journaling filesystem (preferably the same
> with journaling enabled/disabled). I doubt the effect is visible in
> database-benchmarks, as databases update data very frequently compared
> to metadata.
Indeed postmark is definitely not a terrific benchmark for pgsql.
However a pgsql application could be a good benchmark for pgsql.
> There are other advantages of snapshot-backups in addition to the
> recovery-time:
> * space-saving (no need for temporary space for the pg_dump)
> (unless there are lots of writes going on, of course)
You're assuming you'll always be doing recovery to the same system, no?
> * generality
> This approach works for _all_ applications, not just databases
> * consistency between multiple databases
>   (AFAIK the pg_dumpall doesn't take an atomic dump of all databases,
>   so if you have a weird frontend that uses multiple databases and
>   expect them to be consistent....)
This is _still_ a bogus claim, and always will be. Filesystem
consistency does not equal database consistency (at least not when the
db is stored in multiple files in the filesystem and there are any
consistency constraints between two or more of those files). You must
force a synchronisation point between your database(s) and your backup
mechanism, whatever that may be. A filesystem snapshot alone does not
do that.
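The point can be made abstractly. Below is a toy "database" (purely
illustrative -- nothing like PostgreSQL's storage layout) whose
consistency rule spans two files; the lock is the synchronisation point
that the backup must acquire. Copy the files without taking the lock
and you can capture one file from before an update and the other from
after it:

```python
import os, shutil, threading

class ToyDatabase:
    """Two-file 'database' whose invariant is that both files always
    hold the same counter value. Purely illustrative.
    """
    def __init__(self, directory):
        self.dir = directory
        self.lock = threading.Lock()  # the synchronisation point
        self.write(0)

    def write(self, value):
        # Writers and backups exclude each other via the lock.
        with self.lock:
            for name in ("table_a", "table_b"):
                with open(os.path.join(self.dir, name), "w") as f:
                    f.write(str(value))

    def snapshot(self, dest):
        # Quiesce writers, then copy: the copy is guaranteed to
        # satisfy the cross-file invariant.
        with self.lock:
            shutil.copytree(self.dir, dest)
```

A filesystem snapshot taken without any such quiescing is the moral
equivalent of calling copytree() without holding the lock.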
> * No additional CPU-load on the database-server
Where do the resources necessary for taking the snapshot and backing it
up come from then? Thin air?
--
Greg A. Woods
+1 416 218-0098; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Planix, Inc. <[EMAIL PROTECTED]>; VE3TCP; Secrets of the Weird <[EMAIL PROTECTED]>