I guess I need to reiterate that I’ve been using DRBD in production clusters 
since 2006 and have been extremely happy with it. The purpose of my 
question is not to cast doubt or blame on DRBD for doing its job well. It's a 
good thing that DRBD faithfully replicates whatever is passed to it. However, 
since that is true, it does tend to enable the problem of filesystem corruption 
taking down a whole cluster. I'm just asking people for any suggestions they 
may have for alleviating that problem. If it’s not fixable, then it’s not 
fixable.



Part of the reason I’m asking is that we’re about to build a whole new data 
center, and after 15 years of using DRBD we are beginning to look at other HA 
options, mainly because the filesystem is a weak point. I should mention 
that such corruption has *never* happened to us, but the thought of it is scary.



-Eric




From: [email protected] <[email protected]> 
On Behalf Of Yanni M.
Sent: Thursday, June 3, 2021 2:21 PM
Cc: [email protected]
Subject: Re: [DRBD-user] The Problem of File System Corruption w/DRBD

As others have already mentioned, the job of DRBD is to faithfully and 
accurately replicate the data from the layers above it. So if there is 
corruption in the filesystem above the DRBD layer, DRBD will happily replicate 
it for you, the same way RAID1 would on a pair of HDDs. If you want to reduce 
the recovery time in such a situation, you could use the snapshot capability of 
the layers below DRBD (if ThinLVM or ZFS is used) to roll back to a previous 
checkpoint, or implement HA at the layers above DRBD if the application you are 
using supports it. It really depends on the use case. That being said, 
filesystem corruption shouldn't be a common thing, and if it occurs you should 
investigate why it happened in the first place.



On Wed, 2 Jun 2021 at 22:50, Eric Robinson <[email protected]> wrote:
Since DRBD lives below the filesystem, if the filesystem gets corrupted, then 
DRBD faithfully replicates the corruption to the other node. Thus the 
filesystem is the SPOF in an otherwise shared-nothing architecture. What is the 
recommended way (if there is one) to avoid the filesystem SPOF problem when 
clusters are based on DRBD?

-Eric




Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user