On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
>> Putting aside the not-so-rosy predictions seen elsewhere in this
>> thread about the availability of a high-performance, reliable
>> checksumming file system on common platforms, I'd like to express
>> what benefit this feature will have for me:
>>
>> Corruption easily occupied more than one person-month of our time
>> last year. This year to date I've burned two weeks, although
>> admittedly this was probably the result of statistical clustering.
>> Other colleagues of mine have probably put in a week or two in
>> aggregate this year to date. The ability to quickly, accurately,
>> and maybe at some later date proactively find good backups to run
>> WAL recovery from is one of the biggest strides we can make in the
>> operation of Postgres. The especially ugly cases are those where
>> the page header is not corrupt, so full-page images can carry along
>> malformed tuples... basically, once the corruption works its way
>> into the WAL, we're in much worse shape. Checksums would hopefully
>> prevent this case, converting such pages into corrupt pages that
>> will not be modified.
>>
>> It would be better yet if I could write tools to find the last-good
>> version of pages, so I think tight integration with Postgres will
>> yield a lot of benefits that would be quite difficult and
>> non-portable to obtain by relying on file system checksumming.
>
> I see Heroku has corruption experience, and I know Jim Nasby has
> struggled with corruption in the past.
More than a little: it has entered the realm of the routine, and it
happens frequently enough that it has become worthwhile to start
looking for patterns.

Our methods for dealing with it so far rely heavily on our archives:
it's time-consuming, but the 'simple' case, where replaying WAL from
some earlier base backup yields a non-corrupt database, is easily the
most common. Interestingly, while treating corruption, WAL replay has
never failed partway through because of CRC failures[0]. We know this
fairly convincingly because we constantly sample txid and WAL
positions while checking the database, which we typically do about
every thirty seconds.

I think the unreasonable effectiveness of this old-backup-plus-WAL-replay
strategy suggests that database checksums would prove useful: in my
mind, the odds that this formula would work so well if the bug were
RAM- or CPU-based are slim.

[0] I have seen -- very rarely -- substantial periods of severe WAL
corruption (files not even remotely the correct size) propagated to
the archives, in a disaster-recovery case where the machine met its
end because its WAL disk was marked as dead.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
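P.S. For readers unfamiliar with the replay procedure described above:
restoring a pre-corruption base backup and replaying archived WAL over
it is, in the 9.x era, driven by a recovery.conf file in the data
directory. A minimal sketch -- the archive path and target time below
are illustrative assumptions, not our actual setup:

```
# recovery.conf (PostgreSQL 9.x) -- point-in-time recovery sketch.
# Assumes WAL segments were archived as flat files under
# /var/lib/pgwal_archive; adjust restore_command to the real layout.
restore_command = 'cp /var/lib/pgwal_archive/%f "%p"'

# Optionally stop replay just before the suspected corruption was
# introduced (timestamp is illustrative); omit to replay all WAL.
recovery_target_time = '2013-03-01 00:00:00 UTC'
```

The server enters recovery when it finds this file at startup and
fetches each needed segment via restore_command until the target is
reached.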