[ On Saturday, June 15, 2002 at 00:45:12 (+0200), Ragnar Kj�rstad wrote: ]
> Subject: Re: Backing up PostgreSQL?
>
> snapshots (LVM snapshots) are not "supposedly nearly instantaneous", but
> "instantaneous". All write-access to the device is _locked_ while the
> snapshot is in progress (the process takes a few milliseconds, maybe a
> second on a big system), and there are _no_ races. That's the whole
> point of the snapshot!

That's irrelevant from PostgreSQL's point of view.  There's no sure way
to tell the postgresql process(es) to make the on-disk database image
consistent before you create the snapshot.  The race condition is
between the user-level process and the filesystem.  The only sure way to
guarantee a self-consistent backup is to shut down the process so as to
ensure all the files it had open are now closed.  PostgreSQL makes no
claims that all data necessary to present a continually consistent view
of the DB will be written in a single system call.  In fact this is
impossible since the average database consistes of many files and you
can only write to one file at a time through the UNIX system interface.
If your snapshot occurs between two writes where both are necessary to
maintain DB consistency then the snapshot is in an inconsistent state.
Yes there are other ways to recover consistency using other protection
mechanisms maintained by PostgreSQL, but you don't want to be relying on
those when you're doing backups -- you barely want to rely on those when
you're doing crash recovery!

If doing a snapshot really is that fast then there's almost no excuse
_not_ to stop the DB -- just do it!  Your DB downtime will not be
noticable.

> To postgresql (or any other application) the rollback of an snapshot
> (or the backup of a snapshot) will be exactly like recovering from a
> crash. Database-servers need to write the data to disk (and fsync)
> before the transaction is completed. In practise they don't actually
> write it to the database-files but to a log, but that doesn't change the
> point. 
> 
> So, the only advantage of shutting down the database is that it doesn't
> have to recover like from a crash.

Newer releases of PostgreSQL don't always use fsync().  I wouldn't trust
recovery to be consistent without any loss implicitly.  Because
PostgreSQL uses the filesystem (and not raw disk) the only way to be
100% certain that what's written to disk is a consistent view of the DB
is to close all the open DB files.

You don't want the state of your backups to appear as if the system had
crashed -- you want them to be fully self-consistent.  At least that's
what you want if you care about your data _and_ your application, and
you care about getting a restored system back online ASAP.  

Of course if you really care most about your data in and of itself then
you'll be doing pg_dump to back it up -- quick restoration of a given
physical system isn't always what's most important!

Maybe both are important -- do a pg_dump, back up the dump file, then
stop the DB, do a snapshot, restart the DB, and then back up the
snapshot too.

-- 
                                                                Greg A. Woods

+1 416 218-0098;  <[EMAIL PROTECTED]>;  <[EMAIL PROTECTED]>;  <[EMAIL PROTECTED]>
Planix, Inc. <[EMAIL PROTECTED]>; VE3TCP; Secrets of the Weird <[EMAIL PROTECTED]>

Reply via email to