I had looked at this configuration as well and decided to use volume management on the clients to mirror the data: Windows LDM mirrored across two SRPT servers, and Linux md RAID 1 mirrored the same way.
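For the Linux side, a minimal sketch of that client-side mirror looks like the following. All device names here are assumptions for illustration: /dev/sda2 stands in for the local disk and /dev/sdb2 for the SRP-attached remote LUN (which appears as an ordinary SCSI disk once the SRP initiator logs in).

```shell
# Sketch only - device names are hypothetical, adjust to your setup.
# Create the RAID1 set with a write-intent bitmap, so a mirror leg that
# drops out (e.g. the SRP link bounces) needs only a partial resync
# rather than a full rebuild when it returns.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal \
      /dev/sda2 --write-mostly /dev/sdb2

# --write-mostly marks the SRP leg so md prefers the local disk for
# reads; drop the flag if you want reads balanced across both legs.

# Layer LVM on top so the mirrored space can be carved into volumes
# that are in turn exported as SRP targets or NFS shares.
pvcreate /dev/md0
vgcreate vg_ha /dev/md0
lvcreate -L 100G -n vm_store vg_ha
```

The `--write-mostly` flag is the knob that decides the read-policy question discussed below: with it, the remote disk is written but not normally read; without it, md will spread reads over both halves of the mirror.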
This provides transparent failover, and the SRP client/host will rebuild the slices that went offline.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Pocock
> Sent: Tuesday, March 11, 2008 4:26 PM
> To: [email protected]
> Subject: [ofa-general] SRP target/LVM HA configuration
>
> I'm contemplating a HA configuration based on SRP and LVM (or maybe
> EVMS).
>
> There are many good resources based on NFS and drbd (see
> http://www.linux-ha.org/HaNFS), but it would be more flexible to work
> at the block level (e.g. SRP) rather than the file level (NFS).
> Obviously, SRP/RDMA offers a major performance benefit compared with
> drbd (which uses IP).
>
> Basically, I envisage the primary server having access to the secondary
> (passive) server's disk using SRP, and putting both the local (primary)
> disk and the SRP (secondary) disk into RAID1. The RAID1 set would
> contain a volume group and multiple volumes - which would, in turn, be
> SRP targets (for VMware to use) or possibly NFS shares.
>
> This leads me to a few issues:
>
> - Read operations - would it be better for the primary to read from
> both disks, or just its own disk? Using drbd, the secondary disk is not
> read unless the primary is down. However, given the performance of SRP,
> I suspect that reading from both the local and SRP disks would give a
> boost to performance.
>
> - Does it make sense to use md or LVM to combine a local disk and an
> SRP disk into RAID1 (or potentially RAID5)? Are there technical
> challenges there, given that one target is slightly faster than the
> other?
>
> - Fail-over - when the secondary detects that the primary is down, can
> it dynamically take the place of the failed SRP target? Will the
> end-user initiators (e.g. VMware, see diagram below) be confused when
> the changeover occurs?
> Is there the possibility of data inconsistency if some write operations
> had been acknowledged by the primary but not propagated to the
> secondary's disk at the moment when the failure occurred?
>
> - Recovery - when the old primary comes back online as a secondary, it
> will need to resync its disk - is a partial resync possible, or is a
> full rebuild mandatory?
>
>
> Diagram:
>
> Disk--Primary Server-------------------SRP Initiator (e.g. VMware ESX)
>        |                    +------NFS client
>        |                    .
>       SRP                   .
> (RAID1 of primary's         .
>  disk and secondary's       .
>  disk)                      .   (fail-over path to storage
>        |                    .    when primary is down)
> Disk--Secondary Server. . . .
>
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
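On the recovery question above: with md, a partial resync is possible provided the array carries a write-intent bitmap. A sketch, assuming the hypothetical /dev/md0 array and SRP-attached /dev/sdb2 leg from my setup:

```shell
# Sketch only - device names are hypothetical.
# Once the failed leg (here the SRP-attached disk) is reachable again,
# re-add it to the array:
mdadm /dev/md0 --re-add /dev/sdb2

# With a write-intent bitmap on the array, --re-add copies back only
# the regions dirtied while the device was absent (a partial resync).
# Without a bitmap, md falls back to a full rebuild of the leg.

# Watch resync progress:
cat /proc/mdstat
```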
