On 03/06/2021 13:50, Eric Robinson wrote:
-----Original Message-----
From: Digimer <[email protected]>
Sent: Wednesday, June 2, 2021 7:23 PM
To: Eric Robinson <[email protected]>; [email protected]
Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD

On 2021-06-02 5:17 p.m., Eric Robinson wrote:
Since DRBD lives below the filesystem, if the filesystem gets
corrupted, then DRBD faithfully replicates the corruption to the other
node. Thus the filesystem is the SPOF in an otherwise shared-nothing
architecture.
What is the recommended way (if there is one) to avoid the filesystem
SPOF problem when clusters are based on DRBD?

-Eric

To start, HA, like RAID, is not a replacement for backups. That is the answer
to a situation like this... HA (and other availability systems like RAID) protect
against component failure. If a node fails, the peer recovers automatically
and your services stay online. That's what DRBD and other HA solutions strive
to provide: uptime.

If you want to protect against corruption (accidental or intentional, à la
cryptolockers), you need a robust backup system to _complement_ your HA
solution.


Yes, thanks, I've said for many years that HA is not a replacement for disaster 
recovery. Still, it is better to avoid downtime than to recover from it, and 
one of the main ways to achieve that is through redundancy, preferably a 
shared-nothing approach. If I have a cool 5-node cluster and the whole thing 
goes down because the filesystem gets corrupted, I can restore from backup, but 
management is going to wonder why a 5-node cluster could not provide 
availability. So the question remains: how to eliminate the filesystem as the 
SPOF?


Some of the things being discussed here have nothing to do with drbd. drbd provides a raw block-level device. It neither knows nor cares what layers you place above it, whether they are filesystems or some other block layer such as LVM or bcache.

It does a very specific job: it ensures that the blocks you write to a drbd device get replicated and stored in real time on one or more other distributed hosts. If you write a 512-byte block of random garbage to a drbd device, it will (and should) write the exact same garbage to the other distributed hosts too, so that if you read that same 512-byte block back from any one of those individual hosts, you'll get the exact same garbage back.

The OP stated "if the filesystem gets corrupted, then DRBD faithfully replicates the corruption to the other node." Good! That's exactly what we want it to do. What we definitely do NOT want is for drbd to manipulate the block data given to it in any way whatsoever; we want it to replicate that data faithfully.
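
To make the point concrete, here is a toy model in Python of the behaviour described above. It is only a sketch, not how drbd is implemented: the class ToyReplicatedDevice and its methods are names I made up for illustration, and the "hosts" are just in-memory buffers. It writes one 512-byte block of random garbage and checks that every host returns exactly the same bytes.

```python
import os

BLOCK_SIZE = 512  # bytes, matching the 512-byte block discussed above


class ToyReplicatedDevice:
    """Hypothetical toy model: each write is copied verbatim to every replica."""

    def __init__(self, num_blocks, num_hosts):
        size = num_blocks * BLOCK_SIZE
        # One independent backing store per host (shared-nothing).
        self.replicas = [bytearray(size) for _ in range(num_hosts)]

    def write_block(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        start = block_no * BLOCK_SIZE
        for replica in self.replicas:
            # The payload is never inspected or interpreted, only copied.
            replica[start:start + BLOCK_SIZE] = data

    def read_block(self, host, block_no):
        start = block_no * BLOCK_SIZE
        return bytes(self.replicas[host][start:start + BLOCK_SIZE])


dev = ToyReplicatedDevice(num_blocks=8, num_hosts=3)
garbage = os.urandom(BLOCK_SIZE)  # stands in for a corrupted filesystem block
dev.write_block(2, garbage)

copies = [dev.read_block(host, 2) for host in range(3)]
all_identical = all(copy == garbage for copy in copies)
print(all_identical)
```

The replication layer has no idea whether those 512 bytes are a valid inode or nonsense, and it should not try to guess. Anything that protects against bad data has to live in a layer above the block device.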
