Re: [DRBD-user] Real live risk of data loss w/o flush

Robert Verspuy Wed, 08 Sep 2010 09:03:42 -0700

On 09/08/2010 03:49 PM, Lars Ellenberg wrote:

> On Thu, Sep 02, 2010 at 03:22:25PM +0200, Robert Verspuy wrote:
>
>> On the database server we're using PostgreSQL.
>> PostgreSQL is ACID-compliant, so the data on disk should not be corrupt.
>> It is possible that we would lose some database inserts/updates,
>> but that's a risk I'm willing to accept, given the small chance
>> that all power is lost.

> Excuse me, but WHAT?
>
> PostgreSQL is ACID compliant IF AND ONLY IF the fsync/fdatasync and
> similar calls it issues behave as expected, i.e. data is on stable
> storage when PostgreSQL thinks it is.
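The guarantee Lars describes can be illustrated with a small Python sketch (an illustration only, not PostgreSQL's code; the helper name `durable_append` is hypothetical). A record is appended, and the function returns only after `os.fdatasync()` (falling back to `os.fsync()`) succeeds. The sync call's return is the only durability signal the application gets. If a layer below drops the flush, the call still reports success and the application cannot tell. Per this thread, that happens with DRBD configured with no-disk-flushes, or with a disk whose volatile write cache ignores flushes.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append `record` to `path`; return only after the OS reports it durable.

    Hypothetical helper for illustration; not PostgreSQL's implementation.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:                      # os.write() may write only part of it
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the data through every layer below (file
        # system, block device, disk cache) to stable storage. The caller can
        # only trust this if no layer silently ignores the flush request.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
        # Note: for a newly created file, the parent directory also needs an
        # fsync for the file's existence to be durable; omitted for brevity.
    finally:
        os.close(fd)
```

For example, `durable_append("wal.log", b"commit T1\n")` returns only once the kernel has accepted the flush; whether the bytes really are on the platter depends on the layers underneath.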

Hmm. Yes, you are right. I think I was a bit too quick to assume everything would be fine. I thought that no-disk-flushes would make DRBD stop adding its own flushes after every I/O, but still accept and pass through the flushes that come from the layer above the DRBD device.

But, as I understand it now, DRBD will not issue any flushes when no-disk-flushes is set: not its own flushes, and not the flush requests it gets from the layer above either.

> If data only reaches stable storage at some point after PostgreSQL
> thinks it already was there, and most likely even in some random order,
> then no, ACID compliance is not met.
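To make the "later, and in random order" point concrete, here is a toy Python model of a disk with a volatile write-back cache. It is entirely hypothetical and greatly simplified: one value per block, no real I/O, and class and function names invented for illustration. Writes are acknowledged as soon as they sit in the cache. A flush moves the cache to stable storage unless flushes are being ignored, which stands in for no-disk-flushes on disks with volatile caches. On power loss, an arbitrary subset of the cached blocks survives. With flushes ignored, a WAL commit record that was "fsync'ed" can vanish, while a data page written afterwards makes it to disk.

```python
import random


class VolatileCacheDisk:
    """Toy disk with a volatile write-back cache (hypothetical model)."""

    def __init__(self, honor_flush: bool):
        self.honor_flush = honor_flush   # False models flushes being dropped
        self.cache = {}                  # acknowledged, but not yet stable
        self.stable = {}                 # what survives a power loss

    def write(self, block, data):
        # The write "completes" for the caller as soon as it is cached.
        self.cache[block] = data

    def flush(self):
        # A honoured flush forces the cache to stable storage;
        # otherwise the request is silently ignored.
        if self.honor_flush:
            self.stable.update(self.cache)
            self.cache.clear()

    def power_loss(self, rng):
        # Some arbitrary subset of cached blocks happened to be written
        # back before the power went out; the rest is lost.
        for block, data in self.cache.items():
            if rng.random() < 0.5:
                self.stable[block] = data
        self.cache.clear()
        return dict(self.stable)


def commit_then_crash(honor_flush: bool, seed: int):
    disk = VolatileCacheDisk(honor_flush)
    disk.write("wal", "commit T1")   # WAL record for the commit
    disk.flush()                     # what fsync()/fdatasync() would request
    disk.write("page", "T1")         # data page, written back later
    return disk.power_loss(random.Random(seed))
```

With `honor_flush=True` the commit record survives every crash. With `honor_flush=False`, some seeds lose it, and some leave the data page on disk without its WAL record, which breaks the write-ahead rule recovery relies on.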

Ok, together with your other mail, I think I understand it now.

So, I think there are two risks when using volatile caches with no-disk-barrier, no-disk-flushes and protocol C.

First -> a single node fails: what is actually on disk can differ between the two nodes. After recovery, let the crashed node become the secondary and run a verify as soon as possible. If the now-primary node crashes before the verify is done, you'll have to restore the database from a backup.

Second -> both nodes crash or lose power. In that case, it's possible that both nodes have corrupt data.

Solution: restore a backup of the database.

So in any case (just like when running PostgreSQL on one server), your data loss is always limited to what changed since your last regular backup of the database.

The reason I'm testing with no-disk-barrier and no-disk-flushes is the high latency (25 ms instead of the expected 1 or 2 ms) when writing small blocks of data. (See also my e-mail from last week, asking for directions on where to start looking for the cause of the latency.)
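One way to narrow down where that latency comes from is to time small synchronous writes directly. The sketch below is not a DRBD tool, and the function name, block size, and example path `/mnt/drbd0/latency.test` are assumptions. Each iteration overwrites one small block and calls `fdatasync()` (or `fsync()` where that is unavailable), which roughly resembles a database commit. To find the layer that adds the delay, compare runs:

- against a file on the DRBD-backed file system,
- against the backing disk alone,
- with and without no-disk-barrier/no-disk-flushes.

```python
import os
import statistics
import sys
import time


def small_write_latencies(path: str, count: int = 100, size: int = 512) -> list[float]:
    """Return per-commit latency in milliseconds for `count` small synced writes."""
    payload = b"x" * size
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    latencies = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.pwrite(fd, payload, 0)    # overwrite the same small block
            sync(fd)                     # wait until the data is reported stable
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical path): python latency.py /mnt/drbd0/latency.test
    ms = small_write_latencies(sys.argv[1])
    print(f"median {statistics.median(ms):.2f} ms, max {max(ms):.2f} ms")
```

Overwriting one block keeps file-system allocation out of the measurement after the first iteration, so the numbers mostly reflect the flush path through DRBD, the network (with protocol C) and the disks.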

> So no, if you run PostgreSQL on disks with volatile caches,
> and you unplug the power hard, you can expect data loss
> and possibly data corruption.
>
> Which is completely independent of DRBD.

True.

So when comparing

PostgreSQL on one server, with its own disk flushes and volatile caches,

against

PostgreSQL on two nodes with DRBD, with no-disk-barrier, no-disk-flushes and volatile caches,

then, looking at data loss / corruption, it's safer to run PostgreSQL on one server, because of the disk flushes.

That is, unless we find the cause of (and maybe a solution for) the huge latency, in which case I can remove the no-disk-barrier and no-disk-flushes parameters.

With kind regards,
Robert Verspuy

--
*Exa-Omicron*
Patroonsweg 10
3892 DB Zeewolde
Tel.: 088-OMICRON (66 427 66)
http://www.exa-omicron.nl

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
