On Thu, Jun 13, 2002 at 05:33:31PM -0500, Brandon D. Valentine wrote:
> >The easiest way to snapshot the filesystem is to use a logical volume
> >manager (LVM or EVMS on linux) and then do:
> >1. take database offline
> >2. take snapshot
> >3. take database online
> >4. backup from snapshot
> >5. remove snapshot
> 
> I would like to comment that while this is a possible way to backup your
> database, it's not the way I would recommend going about it.  There are
> a couple of caveats:
> 
> 1) In order to take a filesystem snapshot you must have enough diskspace
> elsewhere to contain the filesystem snapshot.  If your database resides
> on a large filesystem with other data[0] then you're unlikely to want to
> deal with caching an entire snapshot until amanda backs it up.

You need extra space with both approaches (pg_dump and snapshot). Which
solution requires more space will depend on many factors, e.g. how much
you write to your database. If it's mostly read-only, the snapshot will
not require much space.
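For concreteness, the five snapshot steps quoted above could be scripted
roughly like this. This is only a sketch: the volume group, snapshot size,
data directory and mount point are all made up, and I'm assuming LVM-style
tooling (lvcreate/lvremove). DRY_RUN defaults to 1 so the commands are only
printed; set DRY_RUN=0 to actually run them (as root, with LVM set up).

```shell
#!/bin/sh
# Sketch of the snapshot backup procedure; all device and path names
# (/dev/vg0, /var/lib/pgsql/data, /mnt/pgsnap, ...) are hypothetical.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run pg_ctl stop -D /var/lib/pgsql/data               # 1. take database offline
run lvcreate -s -L 1G -n pgsnap /dev/vg0/pgdata      # 2. take snapshot
run pg_ctl start -D /var/lib/pgsql/data              # 3. take database online
run mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
run tar cf /backup/pgdata.tar -C /mnt/pgsnap .       # 4. backup from snapshot
run umount /mnt/pgsnap
run lvremove -f /dev/vg0/pgsnap                      # 5. remove snapshot
```

Note that the database only has to be down for steps 1-3; the slow part
(copying the data off the snapshot) happens while it is back up.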

> 2) Backing up a database by grabbing the actual files off of the disk
> can introduce problems if trying to restore onto, for instance, an
> upgraded version of Postgres, which might have changed the on-disk
> representation slightly.  There are also problems if you migrate to a
> different architecture since things like byte ordering can change.  By
> generating a database dump with pg_dump you ensure that you have a
> portable, plain text file full of valid query commands which can be read
> into any future version of Postgres and possibly even into other RDBMS
> products provided you choose a pg_dump format which is standards
> compliant enough.

Yes, this is a backup solution, not a migration solution.

pg_dump output cannot always be imported into newer PostgreSQL versions
without modifications.
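To illustrate the pg_dump approach discussed above, the dump and restore
might look like this (the database name "mydb" is hypothetical, and exact
flags vary between PostgreSQL versions). Again DRY_RUN defaults to 1 so the
commands are only printed, since they need a running PostgreSQL server.

```shell
#!/bin/sh
# Illustrative pg_dump usage; set DRY_RUN=0 to execute for real.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run pg_dump -f mydb.sql mydb     # plain-text SQL dump of one database
run pg_dumpall                   # dumps the whole cluster, users included
run psql -d mydb -f mydb.sql     # restore; this is the step that may need
                                 # hand edits on a newer server version
```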

> 3) If your snapshot code is not atomic it means you must take your
> database server down every time you make the snapshot, which on a large
> filesystem could be a non-trivial amount of time.  With pg_dump you're
> just dumping the tables via the standard Postgres interface so you've
> got no issues with doing it on a running database.

Hmm, is the pg_dump consistent?
IOW, is it done in a single transaction? (Even for multiple databases?)
If yes, wouldn't a very long-running pg_dump cause problems for the
running server? I know PostgreSQL doesn't lock whole tables, but it
means that if data changes, PostgreSQL needs to keep two versions of the
rows, which will require extra disk space and, I suppose, also introduce
overhead for other processes?

Of course, if the database is mostly read-only this is just a minor
problem, but that's true for snapshot backups as well.



-- 
Ragnar Kjørstad
Big Storage
