So, if I am understanding you right, what you really have is two copies of the filesystem - one on each node. The copy on each node is actually a mirror made up of one local LUN and one remote LUN. Something like this ASCII drawing:

node0          node1
  |  \        /  |
  |   \      /   |
  |      X       |
  |   /      \   |
  |  /        \  |
mirror0      mirror1
  |  \        /  |
  |   \      /   |
  |      X       |
  |   /      \   |
  |  /        \  |
 lun0         lun1

You are effectively doing the same thing, albeit in a more fault-prone manner, as what I suggested - just at the LVM level instead of at the hardware level (dual, direct-attached disk or SAN-attached disk).

I think you are going to run into problems here, particularly with ZFS, since it is not multi-initiator aware and cannot be present on more than one node at a time. I suspect that to do this you are going to need to follow Steve Mckinty's advice and look at Sun Cluster Geographic Edition. I believe Tim Read wrote up a Blueprint about just this sort of scenario.
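If you do end up going the AVS route, the remote mirror piece is driven by sndradm. Very roughly, enabling a synchronous set over IP looks something like the below - this is only a sketch, the device paths and bitmap slices are made up, and the same enable has to be run on both hosts:

    # On node1 (primary): create/enable the replication set.  Here s0 is
    # the data volume and s1 the bitmap volume - both placeholders; every
    # replicated volume needs its own bitmap.
    sndradm -n -e node1 /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
                  node2 /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 ip sync

    # Run the same enable on node2, then kick off the initial full sync
    # from node1 and wait for it to finish.
    sndradm -n -m
    sndradm -n -w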
fpsm

On Thu, Mar 11, 2010 at 11:03 AM, Anton Altaparmakov <aia21 at cam.ac.uk> wrote:
> Hi,
>
> Thank you for the quick reply but I am afraid I didn't express myself well
> enough...
>
> On 11 Mar 2010, at 14:43, Fredrich Maney wrote:
>> In order for a filesystem (any filesystem on any OS) to fail over
>> between nodes, that filesystem needs to be on shared storage that is
>> external to all nodes. This is because if the node that hosts the
>> storage fails, i.e. has a system board failure, there is no way for
>> the other node to see it.
>
> No it doesn't... That is not what we do on Linux. The storage is
> replicated on each node.
>
>> You are already doing this in your working example on Linux - the
>> iSCSI LUNs are presented to both nodes in the cluster from whatever
>> device is hosting the iSCSI LUNs.
>
> No. Each node IS the storage in our setup. Here is exactly what we have
> with Linux:
>
> An LVM-provided block device on node1 and an LVM-provided block device
> on node2.
>
> When node1 is master we have:
>
> - node2 exports its block device via iscsi_target
> - node1 imports that block device from node2 via open-iscsi
> - node1 runs Linux software RAID (MD) in synchronous mirror mode between
>   its local block device and the block device imported from node2 over
>   iSCSI
> - node1 mounts the software RAID MD device using XFS
> - node1 runs the NFS server exporting the XFS file system
> - node1 has the service IP address
>
> When node1 fails (or we ask heartbeat to move the service to the other
> node), we:
>
> - stop using the IP address on node1
> - shut down the NFS server on node1
> - unmount the XFS file system on node1
> - stop the RAID device on node1
> - stop importing the iSCSI device on node1
> - node2 stops exporting its block device via iscsi_target
>
> And then we bring everything up again as above but with the roles
> reversed, i.e.:
>
> - node1 exports its block device via iscsi_target
> - node2 imports that block device from node1 via open-iscsi
> - node2 runs Linux software RAID (MD) in synchronous mirror mode between
>   its local block device and the block device imported from node1 over
>   iSCSI
> - node2 mounts the software RAID MD device using XFS
> - node2 runs the NFS server exporting the XFS file system
> - node2 has the service IP address
>
> And all this happens within a matter of seconds so that the NFS
> connections do not even notice the interruption at all. You just get a
> brief pause on the NFS clients and then they carry on as before without
> even knowing that they are now talking to a completely different server.
>
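Just to check we are looking at the same thing: on the Linux side that sequence boils down to roughly the below (only a sketch - the IQN, device names, mount point and addresses are placeholders, not your actual config):

    # --- bring the service up on the master (here node1) ---
    # import node2's LUN via the open-iscsi initiator
    iscsiadm -m node -T iqn.2010-03.example:node2-lun -p 192.168.1.2 --login

    # assemble the synchronous RAID1 of the local LV and the imported LUN
    # (/dev/sdX stands for whatever device the iSCSI session shows up as)
    mdadm --assemble /dev/md0 /dev/vg0/data /dev/sdX

    # mount, export over NFS and take over the service IP
    mount -t xfs /dev/md0 /export/data
    exportfs -a          # exports listed in /etc/exports, NFS server running
    ip addr add 192.168.1.10/24 dev eth0

    # --- teardown on the old master, in the opposite order ---
    ip addr del 192.168.1.10/24 dev eth0
    exportfs -ua
    umount /export/data
    mdadm --stop /dev/md0
    iscsiadm -m node -T iqn.2010-03.example:node2-lun -p 192.168.1.2 --logout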
>> You just need to do the same thing on the Solaris side. However,
>> remember that ZFS is not multi-initiator aware, so you cannot mount
>> the zpools on both nodes at once without disk corruption. You will
>> probably want to wrap the service, IP and storage in a zone and fail
>> that over all together instead of separately at the global zone level.
>>
>> Google is your friend. I'd suggest searching for "Solaris Cluster iSCSI
>> zone".
>
> I would, but that is not what we want to do at all...
>
> Trust me, I have just spent close to two weeks trying to get this to
> work. I have read all the Sun documentation that seemed relevant and
> everything Google found that seemed relevant, but I am hoping I have
> missed something obvious because I cannot see how to do it...
>
> Best regards,
>
>        Anton
>
>> fpsm
>>
>> On Thu, Mar 11, 2010 at 9:18 AM, Anton Altaparmakov <aia21 at cam.ac.uk>
>> wrote:
>>> Hi,
>>>
>>> I have been trying to set up Solaris Storage AVS with Sun Cluster in
>>> the hope of having a ZFS file system replicated synchronously (via
>>> TCP/IP only) between two machines, so that it is mounted read-write on
>>> one machine and, if that machine fails, is mounted read-write on the
>>> other machine.
>>>
>>> I have been reading all sorts of documentation and man pages and
>>> experimenting, but everything I have tried immediately asks for the
>>> configuration of shared storage, which we do not have, as the two
>>> machines are only connected by TCP/IP.
>>>
>>> We have such a system running at the moment using Linux, with iSCSI
>>> plus software RAID for the replication, XFS as the file system and
>>> heartbeat v2 for the failover, and that works well. We then have an
>>> NFS server which exports the XFS file system, and the NFS server is
>>> migrated between the two nodes in the heartbeat cluster together with
>>> the service IP address and the XFS file system. I have now spent ages
>>> trying to figure out what to do with Sun Cluster and AVS to achieve
>>> the same and I am completely failing to do it.  )-:
>>>
>>> Would someone, pretty please with sugar on top, point me at the
>>> documentation I am failing to find, or alternatively give me some
>>> pointers as to which commands I should be using?
>>>
>>> Thank you very much in advance!
>>>
>>> Best regards,
>>>
>>>        Anton
> --
> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
> Linux NTFS maintainer, http://www.linux-ntfs.org/
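P.S. For completeness: once AVS is replicating the volumes underneath the pool, the failover itself on the Solaris side comes down to something like this - again only an untested sketch with a made-up pool name, and assuming the replication set is already in place and in sync:

    # on the old primary, if it is still alive
    zpool export tank

    # on the surviving node: stop replication into the secondary volumes
    # (drop the set into logging mode), then import the pool from them.
    # -f is needed because the pool was last in use on the other host.
    sndradm -n -l
    zpool import -f tank

    # to fail back later, resynchronise in the reverse direction first
    # (e.g. sndradm -n -u -r) before exporting/importing again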