Karsten Hilbert wrote:
> On Wed, May 03, 2006 at 01:04:08AM -0700, Jim Busser wrote:
>> - can Syan or Karsten, if the work involved is limited, attempt a
>> load test of writing the dump file, so we might know how long this
>> would require and what its size might be...
> This is entirely dependent on the database content. The dump
> file may grow huge, indeed (several GB, eventually) but
> could be fed into compression right away or streamed onto
> tape or whatever.
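For example, something along these lines (a sketch only; the database
name "gnumed" and the file names are placeholders, not settings from
this thread):

    # custom (compressed binary) format: pg_dump compresses as it writes
    pg_dump --format=custom --compress=9 --file=gnumed.dump gnumed

    # or stream a plain-text dump straight into gzip, so the
    # uncompressed SQL never has to touch the disc
    pg_dump gnumed | gzip > gnumed-backup.sql.gz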
One of our PG databases has a few million rows in the main table and
occupies about 10GB of disc space. The resulting PG dump file, in
compressed binary format, is about 600MB and takes about 20 minutes to
write to disc. However, a test restore, albeit on a 2.4 GHz P4 laptop
with 1GB of memory and a typically slow laptop disc, may take 4 or 5
hours.

>> this is pertinent to
>> office procedures and downtime
> No it's not. That's what PG's MVCC is for - and that's why we
> *chose* PG in the first place.

In other words, PG allows a backup dump to be taken safely from a live
database in continuous use: it notes the state of the database at the
moment the dump starts and includes only the versions of records as
they existed at that moment. It can do that because it keeps track of
multiple versions of each record at once (hence the name Multi-Version
Concurrency Control, MVCC).

Thus, if:

a) pg_dump starts at 22:15:00.00 hrs
b) records for patient Joe Bloggs are updated by a GNUmed user 5
   seconds later, at 22:15:05.00 hrs
c) pg_dump finishes at 22:31:56.78 hrs

then the dump will contain the records for Joe Bloggs as they were when
pg_dump started, i.e. *before* the update in b). However, anyone
accessing the database at, say, 22:16:00.00 hrs - i.e. one minute later
- will see the records for Joe Bloggs including the updates made in b).
Because a pg_dump is running, PG knows it has to keep both versions of
Joe Bloggs' records around, at least until the pg_dump has completed
(see the P.S. below for a way to watch this from psql).

I think that only Oracle, MS SQL Server and PG offer MVCC and hence
true hot backups. MVCC also allows multiple transactions to proceed in
parallel, which helps throughput in normal use, not just when backups
are taken. Other databases have to lock all or large parts of the
database during the backup dump, which can indeed interfere with use of
the database and office workflow, as you point out.

Tim C
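P.S. If you want to watch this snapshot behaviour yourself, you can
mimic what pg_dump does from two concurrent psql sessions. A sketch
only - the table, column and key value below are made up, not GNUmed's
real schema:

    -- terminal 1: open a serializable transaction; its first query
    -- fixes the snapshot, much as pg_dump's first read does
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT lastname FROM patient WHERE pk = 42;   -- returns 'Bloggs'

    -- terminal 2, meanwhile (psql autocommits this straight away):
    UPDATE patient SET lastname = 'Smith' WHERE pk = 42;

    -- terminal 1 again: the old row version is still visible
    SELECT lastname FROM patient WHERE pk = 42;   -- still 'Bloggs'
    COMMIT;

    -- a fresh query after COMMIT takes a new snapshot
    SELECT lastname FROM patient WHERE pk = 42;   -- now 'Smith'

Under the default READ COMMITTED isolation each statement gets its own
snapshot, which is why the other user at 22:16:00.00 hrs in my example
above does see the update straight away.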
