Re: [DRBD-user] The Problem of File System Corruption w/DRBD

Eddie Chapman Thu, 03 Jun 2021 11:19:02 -0700

On 03/06/2021 13:50, Eric Robinson wrote:

-----Original Message-----
From: Digimer <[email protected]>
Sent: Wednesday, June 2, 2021 7:23 PM
To: Eric Robinson <[email protected]>; [email protected]
Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD


On 2021-06-02 5:17 p.m., Eric Robinson wrote:

Since DRBD lives below the filesystem, if the filesystem gets
corrupted, then DRBD faithfully replicates the corruption to the other
node. Thus the filesystem is the SPOF in an otherwise shared-nothing
architecture.

What is the recommended way (if there is one) to avoid the filesystem
SPOF problem when clusters are based on DRBD?

-Eric


To start, HA, like RAID, is not a replacement for backups. That is the answer
to a situation like this... HA (and other availability systems like RAID)
protect against component failure. If a node fails, the peer recovers automatically
and your services stay online. That's what DRBD and other HA solutions strive
to provide; uptime.

If you want to protect against corruption (accidental or intentional, a-la
cryptolockers), you need a robust backup system to _complement_ your HA
solution.


Yes, thanks, I've said for many years that HA is not a replacement for disaster 
recovery. Still, it is better to avoid downtime than to recover from it, and 
one of the main ways to achieve that is through redundancy, preferably a 
shared-nothing approach. If I have a cool 5-node cluster and the whole thing 
goes down because the filesystem gets corrupted, I can restore from backup, but 
management is going to wonder why a 5-node cluster could not provide 
availability. So the question remains: how to eliminate the filesystem as the 
SPOF?

Some of the things being discussed here have nothing to do with drbd. drbd provides a raw block level device. It knows nothing about, nor cares, what layers you place above it, whether they be filesystems or some other block layer such as LVM or bcache.

It does a very specific job: ensure the blocks you write to a drbd device get replicated and stored in real time on one or more other distributed hosts. If you write a 512-byte block of random garbage to a drbd device it will (and should) write the exact same garbage to the other distributed hosts too, so that if you read that same 512-byte block back from any one of those individual hosts, you'll get the exact same garbage back.
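
To make the point above concrete, here is a minimal sketch in Python of a toy replicated block device. It is purely illustrative and is not how drbd works internally: the class ToyReplicatedDevice, its write_block/read_block methods and the demo() helper are all hypothetical names, and the "hosts" are just in-memory buffers rather than real disks on a network. It only shows the contract described above: whatever bytes go in, the same bytes come back from every host.

```python
import os

BLOCK_SIZE = 512  # bytes, matching the block size used in the text above


class ToyReplicatedDevice:
    """Toy stand-in for a replicated block device (hypothetical, not drbd)."""

    def __init__(self, num_blocks, num_peers):
        # One buffer per host: index 0 is the local host, the rest are peers.
        self.replicas = [
            bytearray(num_blocks * BLOCK_SIZE) for _ in range(num_peers + 1)
        ]

    def write_block(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        start = block_no * BLOCK_SIZE
        # The same bytes go to every host; the content is never interpreted.
        for replica in self.replicas:
            replica[start:start + BLOCK_SIZE] = data

    def read_block(self, block_no, host):
        start = block_no * BLOCK_SIZE
        return bytes(self.replicas[host][start:start + BLOCK_SIZE])


def demo():
    dev = ToyReplicatedDevice(num_blocks=8, num_peers=2)
    garbage = os.urandom(BLOCK_SIZE)  # "random garbage", as in the text
    dev.write_block(3, garbage)
    # Every host hands back the exact same garbage for that block.
    return all(dev.read_block(3, host) == garbage for host in range(3))


if __name__ == "__main__":
    print(demo())
```

Note that nothing in write_block looks at what the data means. A layer like this has no way to tell a valid filesystem block from a corrupt one, which is why detecting or preventing filesystem corruption has to happen in the layers above it.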

The OP stated "if the filesystem gets corrupted, then DRBD faithfully replicates the corruption to the other node." Good! That's exactly what we want it to do. What we definitely do NOT want is for drbd to manipulate the block data given to it in any way whatsoever; we want it to faithfully replicate this.

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user
