On 2021-06-03 3:35 p.m., Eric Robinson wrote:
>> Even this approach just moves the SPOF up from the FS to the SQL engine.
>>
>> The problem here is that you're still confusing redundancy with data
>> integrity. To avoid data corruption, you need a layer that understands
>> your data at a sufficient level to know what corruption looks like. Data
>> integrity is yet another topic, and still separate from HA.
>>
>> DRBD, and other HA tools, don't analyze the data, nor should they
>> (imagine the security and privacy concerns that would open up). If the HA
>> layer is given data to replicate, its job is to faithfully and accurately
>> replicate the data.
>
> It seems like the two are sometimes intertwined. Is GFS2, for example,
> about integrity or redundancy? But I'm not really asking how to prevent
> filesystem corruption. I'm asking (perhaps stupidly) the best/easiest way
> to make a filesystem redundant.

GFS2 coordinates access between nodes, to ensure that no two nodes step on
each other's blocks and that all of them know when to update their view of
the FS. It still sits above the redundancy layer; at the end of the day, it
is still just a file system.

If, for example, you were writing data to an FS on top of DRBD and one
node's local storage started failing, the kernel would (should) inform the
DRBD driver that there has been an IO error. In such a case, the DRBD
device should detach from the local store and go diskless. All further
reads/writes on that node would (transparently) go to/from another node.
In this way, I think, you get as close as you can to the goal you're
describing.

In that case, though, you survived a hardware failure, which is _exactly_
what HA is all about. You would have no data loss and your managers would
be happy.

However, note how this example was below the data structure... It involved
the detection of a hardware fault and the mitigation of that fault. DRBD
(like a RAID array) has no concept of data structures. So if something at
the logic layer wrote bad data (ie: a user's deletion or saving of bad
data), DRBD (again, like a RAID array) only cares to ensure that the data
is on both/all nodes, byte-for-byte accurate.

This is where the role of HA ends, and the roles of anti-virus, security
and data integrity / backups kick in.

>> I think the real solution is not technical, it's expectations management.
>> Your managers need to understand what each part of their infrastructure
>> does and does not do. This way, if the concerns around data corruption
>> are sufficient, they can invest in tools to protect the data integrity
>> at the logical layer.
>>
>> HA protects against component failure. That's its job, and it does it
>> well, when well implemented.
>
> The filesystem is not a hardware component, but it is a cluster resource.
> The other cluster resources are redundant, with that sole exception. I'm
> just looking for a way around that problem.
> If there isn't one, then there isn't.

Consider the example of a virtual machine running on top of DRBD /
pacemaker (a setup I am very familiar with). If the host hardware fails,
the VM can be preventatively migrated or recovered on the peer node. In
this way, the data is preserved (up to the point of failure / reboot), and
services are restored promptly. This is possible because, byte for byte,
the data was written to both host nodes. Voila! Full protection against
hardware faults.

Consider now that your VM gets hit with a cryptolocker virus. That attack
is, faithfully, replicated to both nodes (exactly as it would replicate to
both hard drives in a RAID 1 array). In this case, you're out of luck.

Why? Because HA doesn't protect data integrity; it can't. Its role is to
protect against hardware faults. This is true of the filesystem inside a
VM, or of a file system directly on top of a DRBD resource.

The key take-away here is the role of different technologies in your
overall corporate resilience planning. HA is one (very powerful) tool in a
toolbox to protect your services and data. Backups, DR and anti-malware
each play their own role in the big-picture planning.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
_______________________________________________
drbd-user mailing list
https://lists.linbit.com/mailman/listinfo/drbd-user
