On Thu, Jun 13, 2002 at 05:33:31PM -0500, Brandon D. Valentine wrote:
> >The easiest way to snapshot the filesystem is to use a logical volume
> >manager (LVM or EVMS on linux) and then do:
> >1. take database offline
> >2. take snapshot
> >3. take database online
> >4. backup from snapshot
> >5. remove snapshot
>
> I would like to comment that while this is a possible way to back up your
> database, it's not the way I would recommend going about it. There are
> a couple of caveats:
>
> 1) In order to take a filesystem snapshot you must have enough disk space
> elsewhere to contain the filesystem snapshot. If your database resides
> on a large filesystem with other data[0] then you're unlikely to want to
> deal with caching an entire snapshot until amanda backs it up.
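For reference, the quoted five-step procedure might look roughly like this with LVM. The volume group, logical volume, mount points and sizes below are made up for illustration, and the whole thing must run as root:

```shell
# Sketch of the snapshot procedure quoted above, assuming the cluster
# lives on the (hypothetical) logical volume /dev/vg0/pgdata, mounted
# at /var/lib/pgsql. All names and sizes are illustrative.

# 1. take database offline
pg_ctl -D /var/lib/pgsql/data stop

# 2. take snapshot -- the --size only has to hold the copy-on-write
#    blocks written while the snapshot exists, not the whole volume
lvcreate --snapshot --size 1G --name pgsnap /dev/vg0/pgdata

# 3. take database online
pg_ctl -D /var/lib/pgsql/data start

# 4. backup from snapshot (mount it read-only and archive it)
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
tar czf /backup/pgdata.tar.gz -C /mnt/pgsnap .
umount /mnt/pgsnap

# 5. remove snapshot
lvremove -f /dev/vg0/pgsnap
```

Note that the database is only down between steps 1 and 3; the slow part (step 4) runs against the snapshot while the server is back up.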
You need extra space with both approaches (pg_dump and snapshot). Which
solution requires the most space will depend on many factors, e.g. how
much you write to your database. If it's mostly read-only, the snapshot
will not require much space.

> 2) Backing up a database by grabbing the actual files off of the disk
> can introduce problems if trying to restore onto, for instance, an
> upgraded version of Postgres, which might have changed the on-disk
> representation slightly. There are also problems if you migrate to a
> different architecture since things like byte ordering can change. By
> generating a database dump with pg_dump you ensure that you have a
> portable, plain text file full of valid query commands which can be read
> into any future version of Postgres and possibly even into other RDBMS
> products provided you choose a pg_dump format which is standards
> compliant enough.

Yes, this is a backup solution, not a migration solution. pg_dumps
cannot always be imported into newer postgresql versions without
modifications.

> 3) If your snapshot code is not atomic it means you must take your
> database server down every time you make the snapshot, which on a large
> filesystem could be a non-trivial amount of time. With pg_dump you're
> just dumping the tables via the standard Postgres interface so you've
> got no issues with doing it on a running database.

Hmm, is the pg_dump consistent? IOW, is it done in a single transaction?
(Even for multiple databases?)

If yes, would a very long-running pg_dump not cause problems for the
running server? I know postgresql doesn't lock whole tables, but it
means that if data changes postgresql needs to keep two versions of the
rows, and it will require extra disk space and I suppose also introduce
overhead to other processes?

Of course, if the database is mostly read-only this is just a minor
problem, but that's true for snapshot backups as well.

-- 
Ragnar Kjørstad
Big Storage
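To partially answer the consistency question above: pg_dump dumps a single database inside one transaction, so the resulting file is internally consistent even on a live server; pg_dumpall handles each database in its own transaction, so there is no consistency guarantee *across* databases. A minimal sketch (database and file names are made up):

```shell
# Dump one database to a plain SQL file. pg_dump runs in a single
# transaction, so "mydb" is dumped as one consistent snapshot even
# while clients keep writing to it.
pg_dump mydb > /backup/mydb.sql

# Dump every database in the cluster. Each database is dumped in its
# own transaction, one after the other, so the databases are not
# mutually consistent with each other.
pg_dumpall > /backup/all.sql

# Restoring a plain-text dump is just replaying the SQL:
createdb mydb
psql mydb < /backup/mydb.sql
```

This also illustrates the running-server concern: while the dump transaction is open, old row versions touched by concurrent writers cannot be reclaimed, which is the extra disk space and overhead mentioned above.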
