I would second Stanley - client-side fail-over/load-balancing would be
straightforward. For Linux hosts you have several tools: software RAID
(md), LVM, or dm-multipath.
-vu
I had looked at this configuration as well and decided to use volume
management at the clients to mirror the data: Windows LDM mirrored
across 2 SRPT servers, and Linux md RAID 1 mirrored likewise.
This provides transparent failover, and the SRP client/host will rebuild
the slices that went offline.
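The client-side md/LVM mirroring described above can be sketched roughly
as follows. Device names are placeholders (assume /dev/sda2 is the local
disk and /dev/sdc is the SRP-attached LUN), and the volume names are
made up for illustration:

```shell
# Mirror the local disk against the SRP-attached remote disk
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdc

# Put a volume group on the mirror, then carve out logical volumes
# to export as SRP targets or NFS shares
pvcreate /dev/md0
vgcreate vg_ha /dev/md0
lvcreate -L 100G -n lv_vmware vg_ha
```

If the SRP member disappears, md marks it faulty and the array keeps
running degraded on the local disk; on reconnect the member can be
re-added and rebuilt.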
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Daniel Pocock
Sent: Tuesday, March 11, 2008 4:26 PM
To: [email protected]
Subject: [ofa-general] SRP target/LVM HA configuration
I'm contemplating an HA configuration based on SRP and LVM (or maybe
EVMS). There are many good resources based on NFS and drbd (see
http://www.linux-ha.org/HaNFS), but it would be more flexible to work at
the block level (e.g. SRP) rather than the file level (NFS). Obviously,
SRP/RDMA offers a major performance benefit compared with drbd (which
uses IP).
Basically, I envisage the primary server having access to the secondary
(passive) server's disk using SRP, and putting both the local (primary)
disk and the SRP (secondary) disk into RAID1. The RAID1 set would
contain a volume group and multiple volumes - which would, in turn, be
SRP targets (for VMware to use) or possibly NFS shares.
This leads me to a few issues:
- Read operations - would it be better for the primary to read from both
disks, or just its own disk? Using drbd, the secondary disk is not read
unless the primary is down. However, given the performance of SRP, I
suspect that reading from both the local and SRP disks would give a
boost to performance.
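For what it's worth, md RAID1 balances reads across members by default,
which matches the "read from both" approach. If reads should instead
stay on the local disk unless it fails, the SRP member can be flagged
write-mostly at creation time - a sketch with placeholder device names
(/dev/sda2 local, /dev/sdc the SRP-attached LUN):

```shell
# --write-mostly applies to the devices listed after it: reads are
# served from /dev/sda2 unless it is unavailable
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/sda2 --write-mostly /dev/sdc
```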
- Does it make sense to use md or LVM to combine a local disk and an SRP
disk into RAID1 (or potentially RAID5)? Are there technical challenges
there, given that one target is slightly faster than the other?
- Fail-over - when the secondary detects that the primary is down, can
it dynamically take the place of the failed SRP target? Will the
end-user initiators (e.g. VMware, see diagram below) be confused when
the changeover occurs? Is there the possibility of data inconsistency if
some write operations had been acknowledged by the primary but not yet
propagated to the secondary's disk at the moment the failure occurred?
- Recovery - when the old primary comes back online as a secondary, it
will need to resync its disk - is a partial resync possible, or is a
full rebuild mandatory?
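On the partial-resync question: md supports a write-intent bitmap, which
records which chunks were dirtied while a member was absent, so a
returning member only resyncs those regions rather than the whole disk.
A sketch, with placeholder array and device names:

```shell
# Enable an internal write-intent bitmap on an existing array
mdadm --grow /dev/md0 --bitmap=internal

# When the failed member comes back, re-add it; with the bitmap
# present, only the regions written during the outage are resynced
mdadm /dev/md0 --re-add /dev/sdc
```

The bitmap costs a small amount of write performance, but it turns a
multi-hour full rebuild into a short catch-up after a transient outage.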
Diagram:
Disk--Primary Server-------------------SRP Initiator (e.g. VMware ESX)
| +------NFS client
| .
SRP .
(RAID1 of primary's .
disk and secondary's .
disk) . (fail-over path to storage
| . when primary is down)
Disk--Secondary Server. . . . . .
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general