Dear Joe, Dante,

Apologies in advance about not replying inline to your comments.

I get the impression that DRBD is being considered here as a "remote"
mirroring solution, as though the secondary OSS housing the backup OST
were sitting far away and therefore unreliable or inefficient. Side
note: DRBD+ does have a provision for mirroring data to a third node,
which replicates asynchronously (quite customizable, really).

One can configure an independent network route for DRBD replication
(which, by the way, is synchronous), and with Heartbeat in the picture
and an NPS (network power switch) accounted for, the overall deployment
can absolutely be a very reliable, highly available and robust
architecture coupling the various technologies being discussed.

Our company runs a small Lustre cluster in the above configuration, and
two of our clients (both financial houses) have similar clustered
solutions, admittedly small (roughly 3 TB, serving no more than 20
clients), catering to core applications.

DRBD, local storage, HA and Lustre take a bit of know-how to put
together; however, if cost is an issue (or even sometimes when it's
not), the combination is absolutely worth a look. We've been running
happily for months now -- with many, many fail-overs :)
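
To give a feel for the failover wiring, here is a rough sketch of what
the Heartbeat (v1-style) haresources entries for such a pair could look
like -- the hostnames, DRBD resource names, devices and mount points are
hypothetical; the drbddisk and Filesystem resource scripts ship with
DRBD and Heartbeat respectively:

  # /etc/ha.d/haresources -- each OSS is the preferred primary for one
  # OST; on failover the peer promotes the DRBD resource and mounts it
  oss1 drbddisk::ost1 Filesystem::/dev/drbd0::/mnt/ost1::lustre
  oss2 drbddisk::ost2 Filesystem::/dev/drbd1::/mnt/ost2::lustre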

mustafa.

On Dec 6, 2007 5:19 PM, Fegan, Joe <[EMAIL PROTECTED]> wrote:
> D. Dante Lorenso wrote:
>
> > Is it possible to configure Lustre to write Objects to more than 1 node
> > simultaneously such that I am guaranteed that if one node goes down that
> > all files are still accessible?
>
> As Brian Murrell said earlier, if the data for a certain OST or MDS is 
> visible to only one node then you will lose access to that data when that 
> node is down. Continuous replication of the data is one approach, but 
> commercial Lustre implementations today typically use shared storage hardware 
> instead.
>
> HP's Lustre-based product (SFS), for example, places all Lustre data on shared 
> disks and uses clustering software to nominate one node as the primary for 
> each Lustre service and another node as the backup. We configure the server 
> nodes in pairs for redundancy; node A is the primary server for OST1 and 
> secondary for OST2, node B is primary for OST2 and secondary for OST1. This 
> means that as long as either A or B is up clients will have access to both 
> OST1 and OST2. This sounds like the sort of configuration you are looking 
> for. To make it work you absolutely need both A and B to be able to see the 
> data for both OST1 and OST2, though only one of them will be serving each OST 
> at a given time of course (if both nodes try to serve the same OST at the 
> same time the underlying ext3 filesystem will get corrupted so fast it'll 
> make your head spin).
>
> > It is a delicate mounting/unmounting game to ensure that partitions are
> > monitored, mounted, and fail-over in just the right order.
>
> Absolutely right, this is the hard bit.
>
> I have no personal experience of DRBD but from their website I see that it's 
> remote disk mirroring software that works by sending notifications of all 
> changes to a local disk to a remote node. The remote node makes the same 
> changes to one of its local disks, making that disk a sort of remote mirror 
> of the one on the original node. Like long distance RAID1. You could also 
> think of it as a shared storage emulator in software and with that in mind 
> you can see where it would fit into the architecture I outlined above.
>
> Having said that, I'm not aware of anyone using DRBD in a Lustre environment, 
> so can't comment on how well it works. Maybe others on this list have 
> experience with it and can comment better. I'd be a bit concerned about the 
> timeliness of updates to the remote mirror, whether the latency would cause 
> problems after a failover (though DRBD does support ext3 and these are ext3 
> filesystems under the hood, albeit heavily modified). I'd also wonder about 
> performance with change notifications for every write being sent over 
> ethernet to the other node, though I'm sure you've thought about that aspect 
> already.
