Hi,

Thank you for the quick reply, but I am afraid I didn't express myself well 
enough...

On 11 Mar 2010, at 14:43, Fredrich Maney wrote:
> In order for a filesystem (any filesystem on any OS) to failover
> between nodes, that filesystem needs to be on shared storage that is
> external to all nodes. This is because if the node that hosts the
> storage fails, i.e. has a system board failure, there is no way for
> the other node to see it.

No, it doesn't...  That is not what we do on Linux.  The storage is replicated 
on each node.

> You are already doing this in your working example on Linux - the
> iSCSI LUNs are presented to both nodes in the cluster from whatever
> device is hosting the iSCSI LUNs.

No.  Each node IS the storage in our setup.  Here is exactly what we have with 
Linux:

An LVM-provided block device on node1 and an LVM-provided block device on node2.
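
For concreteness, each of those is just a logical volume carved out of a local 
volume group, along these lines (the volume group name "vg0", the LV name 
"export" and the size are made-up examples):

    # on each node: create the local block device that will be mirrored
    lvcreate -L 100G -n export vg0
    # -> /dev/vg0/export is the local block device referred to below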

When node1 is master we have (a rough command sketch follows the list):

- node2 exports its block device via iscsi_target
- node1 imports that block device from node2 via open-iscsi
- node1 runs Linux software RAID (MD) in synchronous mirror (RAID1) mode 
between its local block device and the block device imported from node2 via 
iSCSI
- node1 mounts the software RAID MD device as an XFS file system
- node1 runs the NFS server exporting the XFS file system
- node1 holds the service IP address
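
In terms of actual commands, the bring-up is roughly as follows.  These are 
run by heartbeat resource agents rather than by hand, and the target name, 
device paths, mount point and addresses are all made-up examples:

    # node2: export the local LV as an iSCSI target (iSCSI Enterprise Target)
    ietadm --op new --tid=1 --params Name=iqn.2010-03.example:export
    ietadm --op new --tid=1 --lun=0 --params Path=/dev/vg0/export

    # node1: discover and log in to node2's target with open-iscsi
    iscsiadm -m discovery -t sendtargets -p node2
    iscsiadm -m node -T iqn.2010-03.example:export -p node2 --login

    # node1: assemble the RAID1 mirror from the local LV and the imported
    # device (here assumed to have appeared as /dev/sdb)
    mdadm --assemble /dev/md0 /dev/vg0/export /dev/sdb

    # node1: mount, start and export NFS, take over the service IP
    mount -t xfs /dev/md0 /export
    /etc/init.d/nfs-kernel-server start    # Debian-style init script
    exportfs -o rw '*:/export'
    ip addr add 192.168.1.100/24 dev eth0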

When node1 fails (or we ask heartbeat to move the service to the other node), 
we (again, roughly sketched in commands after the list):

- stop using the IP address on node1
- shut down NFS server on node1
- unmount XFS file system on node1
- stop the RAID device on node1
- stop importing the iscsi device on node1
- node2 stops exporting the block device using iscsi_target
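
The teardown in the same made-up command terms, in the order listed above:

    # node1: release the service, top to bottom
    ip addr del 192.168.1.100/24 dev eth0
    /etc/init.d/nfs-kernel-server stop
    umount /export
    mdadm --stop /dev/md0
    iscsiadm -m node -T iqn.2010-03.example:export -p node2 --logout

    # node2: stop exporting its block device
    ietadm --op delete --tid=1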

And then we do the same as above with the roles of the two nodes swapped, i.e.

- node1 exports its block device via iscsi_target
- node2 imports that block device from node1 via open-iscsi
- node2 runs Linux software RAID (MD) in synchronous mirror (RAID1) mode 
between its local block device and the block device imported from node1 via 
iSCSI
- node2 mounts the software RAID MD device as an XFS file system
- node2 runs the NFS server exporting the XFS file system
- node2 holds the service IP address
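
All of this is driven by heartbeat through resource agent scripts.  
Conceptually the whole resource group is one line of a v1-style haresources 
file, something like the following (heartbeat v2 can also express this in the 
CRM; "our-iscsi-mirror" stands in for our own agent handling the iSCSI 
export/import and MD steps and is a made-up name):

    node1 our-iscsi-mirror Filesystem::/dev/md0::/export::xfs \
        nfs-kernel-server IPaddr::192.168.1.100/24/eth0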

And all this happens within a matter of seconds, so the NFS connections do not 
notice the interruption at all.  The NFS clients just see a brief pause and 
then carry on as before, without ever knowing that they are now talking to a 
completely different server.

> You just need to do the same thing on the Solaris side. However,
> remember that ZFS is not multi-initiator aware, so you can not mount
> the zpools on both nodes at once without disk corruption. You will
> probably want to wrap the service, ip and storage in a zone and fail
> that over all together instead of separately at the global zone level.
> 
> Google is your friend. I'd suggest searching for "Solaris Cluster iSCSI zone".

I would, but that is not what we want to do at all...

Trust me, I have just spent close to two weeks trying to get this to work.  I 
have read all the Sun documentation that seemed relevant, and everything 
Google found that seemed relevant, but I am hoping I have missed something 
obvious, because I can't see how to do it...

Best regards,

        Anton

> fpsm
> 
> On Thu, Mar 11, 2010 at 9:18 AM, Anton Altaparmakov <aia21 at cam.ac.uk> 
> wrote:
>> Hi,
>> 
>> I have been trying to setup Solaris Storage AVS with Sun Cluster in the hope 
>> of having a ZFS file system replicated synchronously (via TCP/IP only) 
>> between two machines so that it is mounted on one machine read-write and if 
>> that machine fails it is mounted read-write on the other machine.
>> 
>> I have been reading all sorts of documentation and man pages and 
>> experimenting but everything I have tried immediately asks for configuration 
>> of shared storage which we don't have as the two machines are only connected 
>> by TCP/IP.
>> 
>> We have such a system running at the moment using Linux: iSCSI plus 
>> software RAID for the replication, XFS as the file system, and heartbeat v2 
>> for the failover, and that works well.  We then have an NFS server which 
>> exports the XFS file system; the NFS server is migrated between the two 
>> nodes in the heartbeat cluster together with the service IP address and the 
>> XFS file system.  But I have now spent ages trying to figure out what to do 
>> with Sun Cluster and AVS to achieve the same and I am completely failing to 
>> do it.  )-:
>> 
>> Would someone, pretty please with sugar on top, point me at the 
>> documentation I am failing to find, or alternatively give me some pointers 
>> as to which commands I should be using?
>> 
>> Thank you very much in advance!
>> 
>> Best regards,
>> 
>>        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
