Dear Joe, Dante,

Apologies in advance for not replying inline to your comments.
I am getting the impression here that DRBD is being considered a "remote" mirroring solution, as though the secondary OSS housing the backup OST were sitting far, far away, making it unreliable or inefficient. Side note: DRBD+ does have a provision for mirroring data to a third node, which replicates asynchronously (and is quite customizable). One can configure independent network routes for DRBD replication, which is synchronous by the way, and with heartbeat in the picture and an NPS accounted for, the overall deployment can absolutely be a very reliable, highly available and robust architecture coupling the various technologies being discussed.

Our company uses a small Lustre cluster in the above configuration, and two of our clients (both financial houses) have similar clustered solutions, admittedly small (approximately 3 TB each, serving no more than 20 clients), catering to core applications. DRBD / local storage / HA and Lustre do require a bit of know-how to put together; however, if cost is an issue (and sometimes even when it isn't), the combination is absolutely worth looking into. We've been running happily for months now -- with many, many fail-overs :) A rough sketch of the DRBD, heartbeat and Lustre pieces of such a setup is at the bottom of this mail, below Joe's reply.

mustafa.

On Dec 6, 2007 5:19 PM, Fegan, Joe <[EMAIL PROTECTED]> wrote:

> D. Dante Lorenso wrote:
>
> > Is it possible to configure Lustre to write Objects to more than 1 node simultaneously such that I am guaranteed that if one node goes down that all files are still accessible?
>
> As Brian Murrell said earlier, if the data for a certain OST or MDS is visible to only one node then you will lose access to that data when that node is down. Continuous replication of the data is one approach, but commercial Lustre implementations today typically use shared storage hardware instead.
>
> HP's Lustre-based product (SFS), for example, places all Lustre data on shared disks and uses clustering software to nominate one node as the primary for each Lustre service and another node as the backup. We configure the server nodes in pairs for redundancy: node A is the primary server for OST1 and secondary for OST2, node B is primary for OST2 and secondary for OST1. This means that as long as either A or B is up, clients will have access to both OST1 and OST2. This sounds like the sort of configuration you are looking for. To make it work you absolutely need both A and B to be able to see the data for both OST1 and OST2, though only one of them will be serving each OST at a given time, of course (if both nodes try to serve the same OST at the same time, the underlying ext3 filesystem will get corrupted so fast it'll make your head spin).
>
> > It is a delicate mounting/unmounting game to ensure that partitions are monitored, mounted, and fail-over in just the right order.
>
> Absolutely right, this is the hard bit.
>
> I have no personal experience of DRBD, but from their website I see that it's remote disk mirroring software that works by sending notifications of all changes to a local disk to a remote node. The remote node makes the same changes to one of its local disks, making that disk a sort of remote mirror of the one on the original node. Like long-distance RAID1. You could also think of it as a shared storage emulator in software, and with that in mind you can see where it would fit into the architecture I outlined above.
>
> Having said that, I'm not aware of anyone using DRBD in a Lustre environment, so I can't comment on how well it works.
> Maybe others on this list have experience with it and can comment better. I'd be a bit concerned about the timeliness of updates to the remote mirror, and whether the latency would cause problems after a failover (though DRBD does support ext3, and these are ext3 filesystems under the hood, albeit heavily modified). I'd also wonder about performance, with change notifications for every write being sent over ethernet to the other node, though I'm sure you've thought about that aspect already.
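P.S. As promised, here is a rough sketch of what the DRBD piece of such a setup can look like. The node names, devices and addresses below are made up for illustration (not our actual configuration): a two-node resource backing one OST, replicating synchronously (protocol C) over a dedicated link.

    resource ost1 {
      protocol C;                      # synchronous replication
      on oss-a {
        device    /dev/drbd0;
        disk      /dev/sdb1;           # local backing disk for the OST
        address   192.168.10.1:7788;   # dedicated replication interface
        meta-disk internal;
      }
      on oss-b {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.10.2:7788;
        meta-disk internal;
      }
    }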
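Heartbeat then takes care of the mounting/unmounting game Dante mentioned: on failover it makes the surviving node primary for the DRBD device and mounts the OST there. With heartbeat's v1-style haresources file that can be expressed roughly as (again, names are purely illustrative):

    oss-a drbddisk::ost1 Filesystem::/dev/drbd0::/mnt/ost1::lustre

That is, oss-a normally owns the resource group; if it fails, heartbeat promotes ost1 on oss-b and mounts /dev/drbd0 there as a lustre filesystem. The NPS (or another STONITH mechanism) is what keeps the two nodes from ever serving the same OST at once, which, as Joe says, would corrupt it very quickly.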
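On the Lustre side, the primary/backup pairing Joe describes maps onto the failover node recorded when the target is formatted. With 1.6-style tools, and purely as an illustration (filesystem name and NIDs made up), an OST living on the DRBD device could be created with something like:

    mkfs.lustre --ost --fsname=datafs --mgsnode=192.168.1.10@tcp0 --failnode=192.168.1.12@tcp0 /dev/drbd0

so that clients know to retry against the backup OSS if the primary stops responding.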