On Mon, Mar 31, 2014 at 11:03 AM, Richard Pieri <[email protected]> wrote:
> Bill Ricker wrote:
>
>> I've seen a big-name commercial block-replication solution duplicate
>> trashed data to the cold spare ... wasn't pretty !
>>
>
> Another great example of how replication is not backup.

Exactly. Extra copies of blocks in the local SAN or remote SAN don't help if the app, the block device driver, or the multipath software mangles the bits somehow before all the copying happens. It was actual backups, restored to a non-replicated test system, that got those users back online.

(FWIW, that was not at my last shop, but at a related firm running the same application. *Our* copy of the app used transaction replication, not block replication, and only for second-site disaster recovery. HA for ours was a heartbeat-triggered restart on a second local node, which took over the vDisks via the multipath SAN. The SAN controller served as the third party to avoid split brain: the second node could successfully request vDisk reassignment only if the controller recognized that the primary was disconnected. We had an extra redundancy option in the SAN too, which might have been more trouble than it was worth.)

(Split brain is why I've avoided remote auto-restart. If you need distributed HA, you need to architect for hot-hot distributed load balancing, which is not easily retrofitted to monolithic legacy apps!)

My two cents: I saw more failures from multipath software's interaction with other software, exposing inadequately tested edge cases in the whole stack, than I saw failures averted by multipath.

-- 
Bill
@n1vux [email protected]

_______________________________________________
Discuss mailing list
[email protected]
http://lists.blu.org/mailman/listinfo/discuss
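The split-brain guard described above (the SAN controller as third-party arbiter) can be sketched roughly as follows. This is a hypothetical model, not the actual product's logic: `SanController`, `request_reassignment`, and `on_heartbeat_lost` are illustrative names, and it assumes the controller simply tracks which nodes it can still see on the storage fabric.

```python
from dataclasses import dataclass, field


@dataclass
class SanController:
    """Hypothetical model of a SAN controller acting as third-party arbiter.

    It records which node owns the shared vDisks and which nodes it can
    still see on the storage fabric.
    """
    owner: str
    connected: set[str] = field(default_factory=set)

    def disconnect(self, node: str) -> None:
        # The controller has lost its path to this node.
        self.connected.discard(node)

    def request_reassignment(self, requester: str) -> bool:
        """Grant the vDisks to `requester` only if the current owner is gone."""
        if requester not in self.connected:
            return False  # a node the controller cannot see gets nothing
        if requester == self.owner:
            return True  # already the owner
        if self.owner in self.connected:
            # Primary is still attached: refuse, so two nodes never
            # write to the same vDisks (split brain).
            return False
        self.owner = requester
        return True


def on_heartbeat_lost(node: str, san: SanController) -> str:
    """Hypothetical action of the standby node when primary heartbeats stop."""
    if san.request_reassignment(node):
        return "restart application"  # controller confirmed the failover
    return "stay passive"  # primary may still be alive; do nothing
```

The point of the design is that a lost heartbeat alone (say, a dead cluster interconnect) never triggers a restart while the controller can still see the primary. Only the controller's independent view of the storage fabric decides ownership.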
