Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Bruce Momjian
On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
 On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith g...@2ndquadrant.com wrote:
  In general, though, diskchecker.pl is the more sensitive test.  If it
  fails, storage is unreliable for PostgreSQL, period.   It's good that you've
  followed up by confirming the real database corruption implied by that is
  also visible.  In general, though, that's not needed. Diskchecker says the
  drive is bad, you're done--don't put a database on it.  Doing the database
  level tests is more for finding false positives:  where diskchecker says the
  drive is OK, but perhaps there is a filesystem problem that makes it
  unreliable, one that it doesn't test for.
 
 Thanks. That's the conclusion we were coming to too, though all I've
 seen is lost transactions and not any other form of damage.
 
  What SSD are you using?  The Intel 320 and 710 series models are the only
  SATA-connected drives still on the market I know of that pass a serious
  test.  The other good models are direct PCI-E storage units, like the
  FusionIO drives.
 
 I don't have the specs to hand, but one of them is a Kingston drive.
 Our local supplier is out of 320 series drives, so we were looking for
 others; will check out the 710s. It's crazy that so few drives can
 actually be trusted.

Yes.  Welcome to our craziness!

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Scott Marlowe
On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
 On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
 On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith g...@2ndquadrant.com wrote:
  In general, though, diskchecker.pl is the more sensitive test.  If it
  fails, storage is unreliable for PostgreSQL, period.   It's good that 
  you've
  followed up by confirming the real database corruption implied by that is
  also visible.  In general, though, that's not needed. Diskchecker says the
  drive is bad, you're done--don't put a database on it.  Doing the database
  level tests is more for finding false positives:  where diskchecker says 
  the
  drive is OK, but perhaps there is a filesystem problem that makes it
  unreliable, one that it doesn't test for.

 Thanks. That's the conclusion we were coming to too, though all I've
 seen is lost transactions and not any other form of damage.

  What SSD are you using?  The Intel 320 and 710 series models are the only
  SATA-connected drives still on the market I know of that pass a serious
  test.  The other good models are direct PCI-E storage units, like the
  FusionIO drives.

 I don't have the specs to hand, but one of them is a Kingston drive.
 Our local supplier is out of 320 series drives, so we were looking for
 others; will check out the 710s. It's crazy that so few drives can
 actually be trusted.

 Yes.  Welcome to our craziness!

Is there a comprehensive list of drives that have been tested on the
wiki somewhere?  Our current choices seem to be the Intel 3xx series
which STILL suffer from the "whoops, I'm now an 8MB drive" bug and the
very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
from those or other series from anyone would be greatly appreciated.




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Bruce Momjian
On Wed, Nov  7, 2012 at 01:53:47PM -0700, Scott Marlowe wrote:
 On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
  On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
  On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith g...@2ndquadrant.com wrote:
   In general, though, diskchecker.pl is the more sensitive test.  If it
   fails, storage is unreliable for PostgreSQL, period.   It's good that 
   you've
   followed up by confirming the real database corruption implied by that is
   also visible.  In general, though, that's not needed. Diskchecker says 
   the
   drive is bad, you're done--don't put a database on it.  Doing the 
   database
   level tests is more for finding false positives:  where diskchecker says 
   the
   drive is OK, but perhaps there is a filesystem problem that makes it
   unreliable, one that it doesn't test for.
 
  Thanks. That's the conclusion we were coming to too, though all I've
  seen is lost transactions and not any other form of damage.
 
   What SSD are you using?  The Intel 320 and 710 series models are the only
   SATA-connected drives still on the market I know of that pass a serious
   test.  The other good models are direct PCI-E storage units, like the
   FusionIO drives.
 
  I don't have the specs to hand, but one of them is a Kingston drive.
  Our local supplier is out of 320 series drives, so we were looking for
  others; will check out the 710s. It's crazy that so few drives can
  actually be trusted.
 
  Yes.  Welcome to our craziness!
 
 Is there a comprehensive list of drives that have been tested on the
 wiki somewhere?  Our current choices seem to be the Intel 3xx series
 which STILL suffer from the whoops I'm now an 8MB drive bug and the
 very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
 SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
 from those or other series from anyone would be greatly appreciated.

No, I know of no official list.  Greg Smith and I have tried to document
some of this on the wiki:

http://wiki.postgresql.org/wiki/Reliable_Writes

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Scott Marlowe
On Wed, Nov 7, 2012 at 2:01 PM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Nov  7, 2012 at 01:53:47PM -0700, Scott Marlowe wrote:
 On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
  On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
  On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith g...@2ndquadrant.com wrote:
   In general, though, diskchecker.pl is the more sensitive test.  If it
   fails, storage is unreliable for PostgreSQL, period.   It's good that 
   you've
   followed up by confirming the real database corruption implied by that 
   is
   also visible.  In general, though, that's not needed. Diskchecker says 
   the
   drive is bad, you're done--don't put a database on it.  Doing the 
   database
   level tests is more for finding false positives:  where diskchecker 
   says the
   drive is OK, but perhaps there is a filesystem problem that makes it
   unreliable, one that it doesn't test for.
 
  Thanks. That's the conclusion we were coming to too, though all I've
  seen is lost transactions and not any other form of damage.
 
   What SSD are you using?  The Intel 320 and 710 series models are the 
   only
   SATA-connected drives still on the market I know of that pass a serious
   test.  The other good models are direct PCI-E storage units, like the
   FusionIO drives.
 
  I don't have the specs to hand, but one of them is a Kingston drive.
  Our local supplier is out of 320 series drives, so we were looking for
  others; will check out the 710s. It's crazy that so few drives can
  actually be trusted.
 
  Yes.  Welcome to our craziness!

 Is there a comprehensive list of drives that have been tested on the
 wiki somewhere?  Our current choices seem to be the Intel 3xx series
 which STILL suffer from the whoops I'm now an 8MB drive bug and the
 very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
 SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
 from those or other series from anyone would be greatly appreciated.

 No, I know of no official list.  Greg Smith and I have tried to document
 some of this on the wiki:

 http://wiki.postgresql.org/wiki/Reliable_Writes

Well I may get a budget at work to do some testing so I'll update that
list etc.  This has been a good thread to get me motivated to get
started.




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Bruce Momjian
On Wed, Nov  7, 2012 at 02:12:39PM -0700, Scott Marlowe wrote:
   I don't have the specs to hand, but one of them is a Kingston drive.
   Our local supplier is out of 320 series drives, so we were looking for
   others; will check out the 710s. It's crazy that so few drives can
   actually be trusted.
  
   Yes.  Welcome to our craziness!
 
  Is there a comprehensive list of drives that have been tested on the
  wiki somewhere?  Our current choices seem to be the Intel 3xx series
  which STILL suffer from the whoops I'm now an 8MB drive bug and the
  very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
  SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
  from those or other series from anyone would be greatly appreciated.
 
  No, I know of no official list.  Greg Smith and I have tried to document
  some of this on the wiki:
 
  http://wiki.postgresql.org/wiki/Reliable_Writes
 
 Well I may get a budget at work to do some testing so I'll update that
 list etc.  This has been a good thread to get me motivated to get
 started.

Yes, it seems database people are the few who care about device sync
reliability (or know to care).

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread Vick Khera
On Wed, Nov 7, 2012 at 3:53 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 Is there a comprehensive list of drives that have been tested on the
 wiki somewhere?  Our current choices seem to be the Intel 3xx series
 which STILL suffer from the whoops I'm now an 8MB drive bug and the
 very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
 SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
 from those or other series from anyone would be greatly appreciated.


My most recent big box(es) are built using all Intel 3xx series drives.
Like you said, the 7xx series was way too expensive.  The 5xx series looks
totally right on paper, until you find out they don't have a durable cache.
That just doesn't make sense in any universe... but that's the way they
are.

The 3xx drives seem to be doing really well so far.  I connected them to LSI RAID
controllers, with the Fastpath option.  I think they are pretty speedy.

On my general purpose boxes, I now spec the 3xx drives for boot (software
RAID) and use other drives such as Seagate Constellation for data with ZFS.
Sometimes I think that the ZFS volumes are faster than the SSD RAID
volumes, but it is not a fair comparison because the RAID systems are
CentOS 6 and the ZFS systems are FreeBSD 9.


Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-11-07 Thread David Boreham

On 11/7/2012 3:17 PM, Vick Khera wrote:
 My most recent big box(es) are built using all Intel 3xx series
 drives. Like you said, the 7xx series was way too expensive.


I have to raise my hand to say that for us 710 series drives are an 
unbelievable bargain and we buy nothing else now for production servers.
When you compare vs the setup you'd need to achieve the same tps using 
rotating media, and especially considering the power and cooling saved, 
they're really cheap. YMMV of course..









Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-27 Thread Chris Angelico
On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith g...@2ndquadrant.com wrote:
 In general, though, diskchecker.pl is the more sensitive test.  If it
 fails, storage is unreliable for PostgreSQL, period.   It's good that you've
 followed up by confirming the real database corruption implied by that is
 also visible.  In general, though, that's not needed. Diskchecker says the
 drive is bad, you're done--don't put a database on it.  Doing the database
 level tests is more for finding false positives:  where diskchecker says the
 drive is OK, but perhaps there is a filesystem problem that makes it
 unreliable, one that it doesn't test for.

Thanks. That's the conclusion we were coming to too, though all I've
seen is lost transactions and not any other form of damage.

 What SSD are you using?  The Intel 320 and 710 series models are the only
 SATA-connected drives still on the market I know of that pass a serious
 test.  The other good models are direct PCI-E storage units, like the
 FusionIO drives.

I don't have the specs to hand, but one of them is a Kingston drive.
Our local supplier is out of 320 series drives, so we were looking for
others; will check out the 710s. It's crazy that so few drives can
actually be trusted.

ChrisA




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-26 Thread Greg Smith

On 10/24/12 4:04 PM, Chris Angelico wrote:


Is this a useful and plausible testing methodology? It has definitely
shown up some failures. On a hard-disk, all is well as long as the
write-back cache is disabled; on the SSDs, I can't make them reliable.


On Linux systems, you can tell when Postgres is busy writing data out
during a checkpoint because the Dirty: amount in /proc/meminfo will be
dropping rapidly.  At most other times, that number goes up.  You can
increase the odds of finding database-level corruption during a
pull-the-plug test by yanking the power at that most sensitive moment.
Combine a reasonably write-heavy test like the one you've devised with
that optimization, and systems that don't write reliably will usually
corrupt within a few tries.
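
As a rough illustration of watching for that moment, here is a minimal Python sketch that reads the Dirty: line from /proc/meminfo and flags a sharp drop between two samples. The function names and the 10 MB drop threshold are assumptions made for this example, not anything from the thread; tune the threshold to the machine under test.

```python
def parse_dirty_kb(meminfo_text):
    """Return the Dirty: value in kB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            # Line looks like "Dirty:              2048 kB"
            return int(line.split()[1])
    return None


def is_dropping_fast(samples_kb, min_drop_kb=10240):
    """True if the newest sample fell by at least min_drop_kb (an assumed,
    arbitrary threshold) compared with the sample before it."""
    if len(samples_kb) < 2:
        return False
    return samples_kb[-2] - samples_kb[-1] >= min_drop_kb


def read_dirty_kb(path="/proc/meminfo"):
    """Read the current Dirty: value from a Linux system."""
    with open(path) as f:
        return parse_dirty_kb(f.read())
```

Calling read_dirty_kb() about once a second, appending each result to a list, and checking is_dropping_fast() on that list gives a cue for when writeback is in full swing.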


In general, though, diskchecker.pl is the more sensitive test.  If it 
fails, storage is unreliable for PostgreSQL, period.   It's good that 
you've followed up by confirming the real database corruption implied by 
that is also visible.  In general, though, that's not needed. 
Diskchecker says the drive is bad, you're done--don't put a database on 
it.  Doing the database level tests is more for finding false positives: 
 where diskchecker says the drive is OK, but perhaps there is a 
filesystem problem that makes it unreliable, one that it doesn't test for.


What SSD are you using?  The Intel 320 and 710 series models are the 
only SATA-connected drives still on the market I know of that pass a 
serious test.  The other good models are direct PCI-E storage units, 
like the FusionIO drives.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-24 Thread Chris Angelico
On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico ros...@gmail.com wrote:
 After reading the comments last week about SSDs, I did some testing of
 the ones we have at work - each of my test-boxes (three with SSDs, one
 with HDD) subjected to multiple stand-alone plug-pull tests, using
 pgbench to provide load. So far, there've been no instances of
 PostgreSQL data corruption, but diskchecker.pl reported huge numbers
 of errors.

 Try starting pgbench, and then halfway through the timeout for a
 checkpoint timeout issue a checkpoint and WHILE the checkpoint is
 still running THEN pull the plug.

 Then after bringing the server up (assuming pg starts up) see if
 pg_dump generates any errors.

Thanks for the tip. I've been flat-out at work these past few days and
haven't gotten around to testing in the middle of a checkpoint, but I
have done something that might also be of interest. It's inspired by a
combination of diskchecker and pgbench; a harness that puts the
database under load and retains a record of what's been done.

In brief: Create a table with N (eg 100) rows, then spin as fast as
possible, incrementing a counter against one random row and also
incrementing the Total counter. When the database goes down, wait
for it to come up again; when it does, check against the local copy of
the counters and report any discrepancies.
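
For readers who don't want to open the pastebin, the checking side of such a harness might look roughly like the Python sketch below. It is not the Pike code; the class and method names are made up for illustration, and it assumes the row increment and the Total increment happen in one transaction. The database is allowed to be ahead of the local copy (a commit may have landed just before the power went, with the acknowledgement never arriving), but it must never be behind.

```python
class CounterLedger:
    """Local record of increments the server has acknowledged as committed."""

    def __init__(self, n_rows):
        self.rows = {i: 0 for i in range(n_rows)}
        self.total = 0

    def record_commit(self, row_id):
        """Call only after the database has confirmed the commit."""
        self.rows[row_id] += 1
        self.total += 1

    def discrepancies(self, db_rows, db_total):
        """Compare with values read back from the database after restart.

        Returns (name, expected, found) tuples. A database value lower than
        the local one means an acknowledged commit was lost."""
        problems = []
        for row_id, expected in sorted(self.rows.items()):
            found = db_rows.get(row_id, 0)
            if found < expected:
                problems.append((row_id, expected, found))
        if db_total < self.total:
            problems.append(("total", self.total, db_total))
        # Assumes both increments share a transaction, so they must agree.
        row_sum = sum(db_rows.values())
        if row_sum != db_total:
            problems.append(("sum", db_total, row_sum))
        return problems
```

An empty list from discrepancies() means nothing acknowledged was lost; it does not by itself prove the storage is safe.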

The code's written in Pike, using the same database connection logic
that we use in our actual application (well, some of our code is C++
and some is PHP, so this corresponds to one part of our app), so this
is roughly representative of real usage.

It's about a page or two of code: http://pastebin.com/UNTj642Y

Currently, all the key parameters (database connection info (which has
been censored for the pastebin version), pool size, thread count, etc)
are just variables visible in the script, simpler than parsing
command-line arguments.

Is this a useful and plausible testing methodology? It has definitely
shown up some failures. On a hard-disk, all is well as long as the
write-back cache is disabled; on the SSDs, I can't make them reliable.

Is a single table enough to test for corruption with?

Chris Angelico




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-24 Thread Scott Marlowe
On Wed, Oct 24, 2012 at 8:04 AM, Chris Angelico ros...@gmail.com wrote:
 On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico ros...@gmail.com wrote:
 After reading the comments last week about SSDs, I did some testing of
 the ones we have at work - each of my test-boxes (three with SSDs, one
 with HDD) subjected to multiple stand-alone plug-pull tests, using
 pgbench to provide load. So far, there've been no instances of
 PostgreSQL data corruption, but diskchecker.pl reported huge numbers
 of errors.

 Try starting pgbench, and then halfway through the timeout for a
 checkpoint timeout issue a checkpoint and WHILE the checkpoint is
 still running THEN pull the plug.

 Then after bringing the server up (assuming pg starts up) see if
 pg_dump generates any errors.

 Thanks for the tip. I've been flat-out at work these past few days and
 haven't gotten around to testing in the middle of a checkpoint, but I
 have done something that might also be of interest. It's inspired by a
 combination of diskchecker and pgbench; a harness that puts the
 database under load and retains a record of what's been done.

 In brief: Create a table with N (eg 100) rows, then spin as fast as
 possible, incrementing a counter against one random row and also
 incrementing the Total counter. When the database goes down, wait
 for it to come up again; when it does, check against the local copy of
 the counters and report any discrepancies.

 The code's written in Pike, using the same database connection logic
 that we use in our actual application (well, some of our code is C++
 and some is PHP, so this corresponds to one part of our app), so this
 is roughly representative of real usage.

 It's about a page or two of code: http://pastebin.com/UNTj642Y

Very cool.  Nice little project.

 Currently, all the key parameters (database connection info (which has
 been censored for the pastebin version), pool size, thread count, etc)
 are just variables visible in the script, simpler than parsing
 command-line arguments.

 Is this a useful and plausible testing methodology? It has definitely
 shown up some failures. On a hard-disk, all is well as long as the
 write-back cache is disabled; on the SSDs, I can't make them reliable.

Yes it seems to be quite a good idea actually.

 Is a single table enough to test for corruption with?

If it fails, definitely; if it passes, maybe.




[GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Chris Angelico
After reading the comments last week about SSDs, I did some testing of
the ones we have at work - each of my test-boxes (three with SSDs, one
with HDD) subjected to multiple stand-alone plug-pull tests, using
pgbench to provide load. So far, there've been no instances of
PostgreSQL data corruption, but diskchecker.pl reported huge numbers
of errors.

What exactly does this mean? Is Postgres doing something that
diskchecker isn't, and is thus safe? Could data corruption occur but
I've just never pulled the power out at the precise microsecond when
it would cause problems? Or is it that we would lose entire
transactions, but never experience corruption that the postmaster
can't repair?

Interestingly, disabling write-caching with 'hdparm -W 0 /dev/sda' (as
per the livejournal blog[1]) reduced the SSD's error rates without
eliminating failures entirely, while on the HDD, there were no
problems at all with write caching off.

ChrisA




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Jeff Janes
On Mon, Oct 22, 2012 at 6:17 AM, Chris Angelico ros...@gmail.com wrote:
 After reading the comments last week about SSDs, I did some testing of
 the ones we have at work - each of my test-boxes (three with SSDs, one
 with HDD) subjected to multiple stand-alone plug-pull tests, using
 pgbench to provide load. So far, there've been no instances of
 PostgreSQL data corruption, but diskchecker.pl reported huge numbers
 of errors.

What did you do to look for corruption?  That PostgreSQL succeeds at
going through crash-recovery and then starting up is not a good
indicator that there is no corruption.

Did you do something like compute the aggregates on pgbench_history
and compare those aggregates to the balances in the other 3 tables?

Cheers,

Jeff




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Chris Angelico
On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 What did you do to look for corruption?  That PostgreSQL succeeds at
 going through crash-recovery and then starting up is not a good
 indicator that there is no corruption.

I fired up Postgres and looked at the logs for any signs of failure.

 Did you do something like compute the aggregates on pgbench_history
 and compare those aggregates to the balances in the other 3 tables?

No, didn't do that. My next check will be done over the network
(similar to diskchecker), with a script that fires off requests, waits
for them to be confirmed committed, and then records a local copy, and
will check that local copy once the server's back up again. That'll
tell me if transactions are being lost.

I'm kinda feeling my way in the dark here. Will check out the
aggregates on pgbench_history when I get to work today; thanks for the
tip!

ChrisA




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Jeff Janes
On Mon, Oct 22, 2012 at 12:31 PM, Chris Angelico ros...@gmail.com wrote:
 On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 What did you do to look for corruption?  That PostgreSQL succeeds at
 going through crash-recovery and then starting up is not a good
 indicator that there is no corruption.

 I fired up Postgres and looked at the logs for any signs of failure.

 Did you do something like compute the aggregates on pgbench_history
 and compare those aggregates to the balances in the other 3 tables?

 No, didn't do that. My next check will be done over the network
 (similar to diskchecker), with a script that fires off requests, waits
 for them to be confirmed committed, and then records a local copy, and
 will check that local copy once the server's back up again. That'll
 tell me if transactions are being lost.

If you like Perl, the count.pl from this message might be a useful
starting point:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

It was designed to check consistency after postmaster crashes, not OS
crashes, so the checker runs on the same host as postgres does.
Obviously, for a pull-the-plug test, you need to run it on a different
host, so all the DBI->connect() calls need to be changed to do that.

 I'm kinda feeling my way in the dark here. Will check out the
 aggregates on pgbench_history when I get to work today; thanks for the
 tip!

Here's an example with pgbench_accounts, the other 2 should look analogous.

select aid, abalance, count(*)
  from (select aid, abalance from pgbench_accounts
        union all
        select aid, sum(delta) from pgbench_history group by aid) as foo
 group by aid, abalance
having abalance != 0 and count(*) != 2;

This should return zero rows.  Any other result indicates corruption.
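
To see why the query behaves that way, here is a small Python sketch that runs it against toy data. It uses an in-memory sqlite3 database purely as a stand-in for PostgreSQL, and the two tables are cut down to the columns the query touches (real pgbench tables have more); the helper names are invented for this example.

```python
import sqlite3

# Same query as above: a consistent account yields two identical
# (aid, balance) rows, one from each side of the UNION ALL.
CHECK_ACCOUNTS = """
select aid, abalance, count(*)
  from (select aid, abalance from pgbench_accounts
        union all
        select aid, sum(delta) from pgbench_history group by aid) as foo
 group by aid, abalance
having abalance != 0 and count(*) != 2
"""


def make_db():
    """Build a tiny, consistent data set (simplified, hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("create table pgbench_accounts (aid integer primary key, abalance integer)")
    db.execute("create table pgbench_history (aid integer, delta integer)")
    db.executemany("insert into pgbench_accounts values (?, ?)",
                   [(1, 30), (2, -5), (3, 0)])
    db.executemany("insert into pgbench_history values (?, ?)",
                   [(1, 10), (1, 20), (2, -5)])
    return db


def inconsistent_rows(db):
    """Return the rows the check query flags; empty means consistent."""
    return db.execute(CHECK_ACCOUNTS).fetchall()
```

With the data from make_db() the check returns nothing. Setting account 1's balance to 10, as if its second update had been lost while the history row survived, makes the query report both the stored balance and the history sum for that account.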

pgbench truncates pgbench_history, but does not reset the balances to
zero on the other tables.  So if you want to run the test repeatedly,
you have to do pgbench -i between runs, or manually reset the balance
columns.

Cheers,

Jeff




Re: [GENERAL] Plug-pull testing worked, diskchecker.pl failed

2012-10-22 Thread Scott Marlowe
On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico ros...@gmail.com wrote:
 After reading the comments last week about SSDs, I did some testing of
 the ones we have at work - each of my test-boxes (three with SSDs, one
 with HDD) subjected to multiple stand-alone plug-pull tests, using
 pgbench to provide load. So far, there've been no instances of
 PostgreSQL data corruption, but diskchecker.pl reported huge numbers
 of errors.

Try starting pgbench, and then, halfway through the checkpoint_timeout
interval, issue a CHECKPOINT, and WHILE the checkpoint is still running,
pull the plug.

Then after bringing the server up (assuming pg starts up) see if
pg_dump generates any errors.
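
A loose Python sketch of that sequence follows, in case it helps with scripting repeated runs. The database name, client count, and function names are placeholders for illustration; the 300-second default matches PostgreSQL's stock checkpoint_timeout of 5 minutes, so adjust it to whatever the server is configured with. The plug pull itself is still done by hand.

```python
import subprocess
import time


def load_command(dbname, seconds, clients=8):
    """pgbench run of the given duration (-T) with some clients (-c)."""
    return ["pgbench", "-c", str(clients), "-T", str(seconds), dbname]


def checkpoint_command(dbname):
    """Force a checkpoint through psql."""
    return ["psql", "-d", dbname, "-c", "CHECKPOINT"]


def dump_check_command(dbname):
    """After reboot: read every table via pg_dump, discarding the output."""
    return ["pg_dump", "-f", "/dev/null", dbname]


def run_until_checkpoint(dbname, checkpoint_timeout_s=300):
    """Start load, wait half the checkpoint interval, then force a checkpoint."""
    load = subprocess.Popen(load_command(dbname, checkpoint_timeout_s * 2))
    time.sleep(checkpoint_timeout_s / 2)
    print("Issuing CHECKPOINT now: pull the plug while it is running")
    subprocess.run(checkpoint_command(dbname), check=False)
    load.wait()
```

Once the box is back up, running the command from dump_check_command() and looking for a non-zero exit status or error output covers the pg_dump step.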

