On 09/08/2010 03:49 PM, Lars Ellenberg wrote:
On Thu, Sep 02, 2010 at 03:22:25PM +0200, Robert Verspuy wrote:
On the database server we're using PostgreSQL.
PostgreSQL is ACID-compliant, so the data on disk should not be corrupt.
It is possible that we lose some database inserts/updates,
but that's a risk I'm willing to accept, given the small chance
that all power is lost.
Excuse me, but WHAT?
PostgreSQL is ACID compliant, IF AND ONLY IF the fsync/fdatasync and
similar calls it issues behave as expected, i.e. data is on stable
storage when PostgreSQL thinks it is.
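
A minimal sketch of the contract being described, assuming a POSIX-like
system: the application treats a write as durable only once fsync() has
returned successfully. The helper name durable_write and the file name are
made up for illustration; this is not how PostgreSQL itself is implemented.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync() reports success.

    Hypothetical helper. The durability promise holds only if every layer
    below (filesystem, DRBD, controller, disk cache) honours the flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the data to stable storage before returning.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "commit-record")
        durable_write(target, b"COMMIT 42\n")
        with open(target, "rb") as fh:
            print(fh.read())
```

If a lower layer acknowledges the flush without actually performing it, the
function still returns normally, which is exactly the situation in which the
database's durability assumption silently breaks.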
Hmm. Yes, you are right. I think I was a bit too quick in assuming
everything would be fine.
I thought that no-disk-flushes would make DRBD not add its own
flushes after every IO,
but still accept and pass through the flushes that come from the layer
above the DRBD device.
But, as I understand it now, DRBD will not do any flushes when
no-disk-flushes is set: neither its own flushes, nor the flush requests
it gets from the layer above.
If data only reaches stable storage at some point after PostgreSQL
thinks it already was there, and most likely even in some random order,
then no, ACID compliance is not met.
Ok, together with your other mail, I think I understand it now.
So, I think there are two risks when using volatile caches with
no-disk-barrier and no-disk-flushes and protocol C.
First -> single node failure: there can be a difference between the nodes
in what is actually on disk.
After recovery, let the crashed node be the secondary and run a verify
as soon as possible.
If the now-primary node crashes before the verify is done, you must
restore the database from a backup.
Second -> both nodes crash or lose power. In that case, it's
possible that both nodes have corrupt data.
Solution: restore a backup of the database.
So in any case (just like when running postgresql on one server), you can
fall back to your last regular backup of the database, and the data loss is
limited to whatever was written since then.
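
As a rough illustration of what the verify step is for: conceptually, it
compares digests of corresponding blocks on both nodes and reports the ones
that differ. The sketch below does this for two ordinary files standing in
for the two backing devices. The function name, file names, block size and
the choice of SHA-1 are assumptions made for the example; this is not DRBD's
actual implementation.

```python
import hashlib
import os
import tempfile


def out_of_sync_blocks(path_a, path_b, block_size=4096):
    """Return the indices of blocks whose digests differ between two files."""
    mismatches = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            # Compare digests rather than raw data, as a network verify would.
            if hashlib.sha1(block_a).digest() != hashlib.sha1(block_b).digest():
                mismatches.append(index)
            index += 1
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        primary = os.path.join(tmp, "primary.img")
        secondary = os.path.join(tmp, "secondary.img")
        data = bytes(4 * 4096)  # four zero-filled blocks
        with open(primary, "wb") as fh:
            fh.write(data)
        damaged = bytearray(data)
        damaged[2 * 4096] = 0xFF  # simulate a lost write in block 2
        with open(secondary, "wb") as fh:
            fh.write(damaged)
        print(out_of_sync_blocks(primary, secondary))  # [2]
```

Until such a comparison has completed, there is no way to know whether the
recovered node's disk matches the surviving node, which is why a second
failure in that window forces a restore from backup.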
The reason I am testing with no-disk-barrier and no-disk-flushes is
the high latency (25 ms instead of the expected 1 or 2 ms)
when writing small blocks of data.
(See also my e-mail from last week, asking for directions on where to
start looking to find what's causing the latency.)
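
One way to put numbers on this kind of latency is to time small writes that
are each followed by an fsync(). The sketch below is only an assumption about
how such a probe could look; the function name, the 4 KiB block size and the
sample count are arbitrary, and it is not the test that produced the 25 ms
figure.

```python
import os
import statistics
import tempfile
import time


def sync_write_latencies(path, block_size=4096, count=50):
    """Return per-write latencies in milliseconds for write()+fsync() pairs."""
    block = b"\0" * block_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # wait until the storage stack reports the data as stable
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Point the path at a file on the DRBD-backed filesystem to measure it;
    # a temporary directory is used here only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        samples = sync_write_latencies(os.path.join(tmp, "probe.dat"))
    print(f"median {statistics.median(samples):.2f} ms, "
          f"max {max(samples):.2f} ms")
```

Running the same probe on the backing device directly, and then on top of
DRBD with and without the flush options, would show which layer adds the
delay.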
So no, if you run PostgreSQL on disks with volatile caches,
and you unplug the power hard, you can expect data loss
and possibly data corruption.
Which is completely independent of DRBD.
True.
So when comparing:
postgresql on one server, with its own disk flushes and volatile caches
against
postgresql on two nodes with drbd, with no-disk-barrier, no-disk-flushes
and volatile caches,
then (looking at data loss / corruption) it's safer to run
postgresql on one server, because of the disk flushes.
That is, unless we find the cause of, and maybe a solution for, the huge
latency; then I can remove the no-disk-barrier and no-disk-flushes parameters.
With kind regards,
Robert Verspuy
--
*Exa-Omicron*
Patroonsweg 10
3892 DB Zeewolde
Tel.: 088-OMICRON (66 427 66)
http://www.exa-omicron.nl