James Busser wrote:
>
> On May 3, 2006, at 5:59 AM, Karsten Hilbert wrote:
>
>> On Wed, May 03, 2006 at 01:04:08AM -0700, Jim Busser wrote:
>
>>> - can Syan or Karsten, if the work involved is limited, attempt a
>>> load test of writing the dump file, so we might know how long this
>>> would require and what its size might be...
>
>> This is entirely dependent on the database content. The dump
>> file may grow huge, indeed (several GB, eventually)
>
> this would mean it cannot be copied onto single CDs; hopefully they
> would fit a DVD, else each dump would have to be split over several
> pieces of media, which would be an incredible workflow pain (and
> pertinent to office procedures).
You should plan to have to use DVDs eventually. If you store a lot of poorly-compressible BLOBs in the database (e.g. JPEG image files), then possibly quite soon. Double-layer writable DVDs can hold over 8GB, though, can't they?

> Also pertinent to workflow is the amount of time required for the backup
> dump to be written to disk so that the media could be taken away. Sure,
> it can be written instead at 0200h. It wouldn't matter provided the
> dumps had been encrypted AND the purpose of that media was purely
> notary, but if it may have to be the source of restoration (because
> there was no slave, or because the master and slave got corrupted) then
> that media would have needed to be taken away "live". So where office
> activity might end at 5:30 pm daily, the amount of time required for the
> dump to be generated, notarized, encrypted and written to media
> (hopefully automated) becomes of great interest to the last employee who
> may need to be scheduled to wait 10 vs 20 vs 40 minutes to be able to
> leave, if the office policy is built around being able to take away the
> media. But if the dump does not depend on a "clean" work stoppage it
> could be scheduled to begin earlier, if any server performance penalty
> would be tolerable.

Yes, if the pg_dump takes 10 minutes and the encryption and burning to disc another 20 minutes, then just start the pg_dump at 5:00pm, and remember that patient transactions after 5pm will be on the next day's backup.

Can I mention another interesting option? Have a look at Amazon S3 (Simple Storage Service). Amazon is offering online storage of any file or BLOB of up to 5GB in size (and as many of them as you wish, so no total data limit), accessible via simple Web APIs (they provide a Python library to help, but it also works fine with just HTTP and curl). Cost is US$0.15 per GB per month for storage, and US$0.20 per GB for upload or download - all billed to your credit card. They promise best-effort reliability using the same distributed, highly redundant storage that Amazon uses itself. After applying for an account, you can then store data in "buckets", and control who can access those buckets (remembering that Amazon staff can potentially access everything, since they run the servers).

Thus you could set up a series of shell scripts which run pg_dump, encrypt the dump and upload it to Amazon S3, into a bucket where it remains for a month. The next day the encrypted dump is written to a different bucket on S3, and so on, until after one month the first one is overwritten (more elaborate GFS "rotation"/archiving schemes would be possible and desirable, of course). If each encrypted DB dump is about 1GB, then you are storing on average 30GB and uploading 30GB per month - a total cost of US$10.50 per month (30GB x US$0.15 for storage plus 30GB x US$0.20 for uploads). Not bad for off-site backup storage which is likely to be accessible 24x7 if you need to retrieve a backup to restore - in a different location, if needed.

The limiting factors are your Internet upload and download speeds, which are governed by your Internet connection (plus any volume limits or charges which your Internet provider might impose - but uploads are often unmetered). If you have, say, a 512/512 kbit/s symmetric DSL connection, you should be able to upload at about 150MB per hour, so even a 1GB DB dump could be uploaded overnight. I suspect your DB dumps will be much smaller than that.
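To make that concrete, here is a rough, untested sketch (Python 3) of what such a nightly script might look like. The credentials, bucket name, database name and passphrase file below are made-up placeholders, the request signing just follows the HMAC-SHA1 scheme described in the S3 REST documentation (check the docs for the exact details), and for simplicity it uses one bucket with a per-day object key rather than a separate bucket for each day - treat it as a starting point, not a finished tool:

#!/usr/bin/env python3
# Rough, untested sketch: nightly pg_dump -> gpg -> Amazon S3.
# All names and credentials below are placeholders.
import base64, hashlib, hmac, http.client, subprocess, time

ACCESS_KEY = "YOUR-AWS-ACCESS-KEY-ID"
SECRET_KEY = b"YOUR-AWS-SECRET-ACCESS-KEY"
BUCKET = "my-practice-backups"               # hypothetical bucket
DBNAME = "gnumed"
PASSPHRASE_FILE = "/etc/backup.passphrase"   # hypothetical, readable by root only

def main():
    day = time.strftime("%d")                # 01..31: each slot is overwritten a month later
    dump_file = "/tmp/%s-%s.dump" % (DBNAME, day)
    enc_file = dump_file + ".gpg"

    # 1. Dump the database (pg_dump's custom format is already compressed).
    subprocess.run(["pg_dump", "-Fc", "-f", dump_file, DBNAME], check=True)

    # 2. Encrypt it symmetrically before it leaves the machine
    #    (newer gpg 2.x versions may also need --pinentry-mode loopback).
    subprocess.run(["gpg", "--batch", "--yes", "--symmetric",
                    "--cipher-algo", "AES256",
                    "--passphrase-file", PASSPHRASE_FILE,
                    "--output", enc_file, dump_file], check=True)

    # 3. PUT the encrypted dump to S3 with an HMAC-SHA1 signed request.
    #    (A real tool would stream the file rather than read it all into memory.)
    with open(enc_file, "rb") as f:
        body = f.read()
    key = "dumps/day-%s.dump.gpg" % day
    date = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime())
    content_type = "application/octet-stream"
    string_to_sign = "PUT\n\n%s\n%s\n/%s/%s" % (content_type, date, BUCKET, key)
    signature = base64.b64encode(
        hmac.new(SECRET_KEY, string_to_sign.encode("utf-8"),
                 hashlib.sha1).digest()).decode("ascii")

    conn = http.client.HTTPSConnection("s3.amazonaws.com")
    conn.request("PUT", "/%s/%s" % (BUCKET, key), body, {
        "Date": date,
        "Content-Type": content_type,
        "Authorization": "AWS %s:%s" % (ACCESS_KEY, signature),
    })
    resp = conn.getresponse()
    print("S3 upload:", resp.status, resp.reason)

if __name__ == "__main__":
    main()

Run something like that from cron at 5:00pm (or 0200h) and the whole dump/encrypt/off-site cycle happens without the last employee having to wait around for a disc to burn.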
If anyone is interested in coding up a module in Python to handle PG dumps, encryption and uploading/downloading/overwriting of encrypted DB dump archives on Amazon S3, could they please talk to me offline (hint, hint, Syan)? We would love to include such a facility in NetEpi. There is no reason why the same module, dual-licensed under Mozilla and GPL, couldn't be included in both NetEpi and GNUmed, and/or offered as a stand-alone module which would be useful for lots of projects. Only a very small portion would be PostgreSQL-specific (just the pg_dump command), so it would be easy to let it handle MySQL, Interbase etc. as well, and to provide a command-line/shell-scripting interface as well as exposing a Python API. The chef of such would receive the people's acclaim forever, as they say on Iron Chef.

Note that competing services to Amazon S3, which are expected to match the S3 pricing, will also be starting soon, e.g. http://www.omnidrive.com, and Google are rumoured to be launching "Gdrive" soon, presumably with a scriptable API as well as a Web interface. There are lots of other online storage providers, but they have all been one, two or three orders of magnitude more expensive than Amazon S3.

Of course, you can also abuse Gmail and use it to store encrypted DB dumps - Gmail allows over 2.5GB per account - but the maximum message attachment size is only 10MB, so you would need to split DB dumps over several files. Such use is also in violation of the Gmail usage agreement and easily detectable by Google (they just need to calculate the entropy of a mail message attachment to decide whether it is an encrypted file), and they may elect to delete such files or lock accounts which contain lots of them and not much human-readable email. By contrast, Amazon S3 is designed from the outset for things like encrypted backup data storage, and Amazon **want** users to use it for such.

Tim C
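PS. To show how little of such a module would actually need to be database-specific, here is an illustrative, untested skeleton. The names DUMP_COMMANDS and dump_and_encrypt are just placeholders I have made up, and the S3 upload/rotation step is left out because it would look much like the sketch earlier in this message:

import subprocess

# The only backend-specific knowledge: how to produce a dump on stdout.
DUMP_COMMANDS = {
    "postgresql": lambda db: ["pg_dump", "-Fc", db],
    "mysql":      lambda db: ["mysqldump", "--single-transaction", db],
    # "interbase": would need gbak, which writes to a file rather than stdout
}

def dump_and_encrypt(backend, dbname, outfile, passphrase_file):
    """Python API: dump `dbname` with the given backend, gpg-encrypt it to `outfile`."""
    dumper = subprocess.Popen(DUMP_COMMANDS[backend](dbname),
                              stdout=subprocess.PIPE)
    gpg = subprocess.Popen(
        ["gpg", "--batch", "--yes", "--symmetric", "--cipher-algo", "AES256",
         "--passphrase-file", passphrase_file, "--output", outfile],
        stdin=dumper.stdout)
    dumper.stdout.close()   # so the dumper sees a broken pipe if gpg dies early
    if gpg.wait() != 0 or dumper.wait() != 0:
        raise RuntimeError("dump or encryption failed")
    # uploading/rotating the encrypted file on S3 would go here
    return outfile

if __name__ == "__main__":
    # shell-scripting interface: backup.py <backend> <dbname> <outfile> <passphrase-file>
    import sys
    dump_and_encrypt(*sys.argv[1:5])

Everything else - encryption, upload, rotation, the command-line wrapper - is identical whichever database produced the dump.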
_______________________________________________
Gnumed-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnumed-devel