On 12/17/2013 03:52 PM, Sten Wolf wrote:
> I have 2 more questions:
>
> 1. Is dual-mgs supported with zfs? My issue seems to be mgs and mdt on
> the same node, when mgs is configured for 2 nodes.
> 2. Which is recommended? ldiskfs w/ 2x mdt, or zfs w/ a single mdt?
>
> I assumed the LLNL Sequoia implementation used zfs w/ HA (dual mgs/dual
> mdt active/passive) but I might be wrong on that account.
Yes, the MDS/MGS is a single node using ZFS, but we do not employ failover for the MDS/MGS. We have always found that Lustre software failures on the MDS/MGS are many, many times more common than hardware failures. When the MDS/MGS crashes from a Lustre bug, we want to take the extra time to complete a full kernel crash dump so we have a chance of debugging the problem. Since we need to spend that extra time on the crash dump, there is little advantage to moving the service to a failover partner; we just allow the current node to reboot.

Also important to understand: we do not yet have multi-mount protection (MMP) in ZFS, so you need to take great care with your HA solution. You need an extremely reliable STONITH. If your power control is unreliable, you can easily wind up with multiple nodes using the same storage pool at the same time. That would be very bad.

That said, we do employ failover for our OSS nodes. Our power control for the OSS nodes was not as reliable as needed, so we added extra checks in our HA scripts to verify that STONITH really worked and to retry the power-off command as necessary.

We will probably reexamine our stance on MDS failover once DNE2 is complete and stable. When there are multiple active MDS nodes, why not? Then again, unless the software becomes a great deal more stable, we'll still be dependent on those slow crash dumps that we would not want to interrupt with a STONITH. It is also not particularly uncommon for a software bug to result in a continuous crash-reboot loop; having HA in that case would just spread the problem to the failover partner node.

For now we are sticking with simple.

Chris
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
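The verify-and-retry fencing logic described above can be sketched roughly as follows. This is a minimal illustration, not LLNL's actual HA script: the function names (`power_off`, `power_status`, `fence_with_verify`) are placeholders, and `power_status` is a stub that simulates a slow BMC; in a real cluster both would wrap your fencing tool (e.g. an IPMI power command and status query).

```shell
#!/bin/sh
# Hypothetical sketch: issue a power-off, then confirm the node is
# really off before allowing failover, retrying a bounded number
# of times. All command names here are placeholders.

MAX_RETRIES=3

# Stub standing in for a real power-status query. It reports "on"
# for the first two checks and "off" afterwards, simulating a BMC
# that is slow to act on the power-off request.
STATUS_FILE=$(mktemp)
echo 0 > "$STATUS_FILE"
power_status() {
    n=$(cat "$STATUS_FILE")
    echo $((n + 1)) > "$STATUS_FILE"
    if [ "$n" -ge 2 ]; then echo off; else echo on; fi
}

power_off() {
    # A real script would issue the fencing command here.
    :
}

fence_with_verify() {
    i=0
    while [ $i -lt $MAX_RETRIES ]; do
        power_off
        # A real script would pause here to give the BMC time to act.
        if [ "$(power_status)" = "off" ]; then
            echo "node confirmed off after $((i + 1)) attempt(s)"
            return 0
        fi
        i=$((i + 1))
    done
    echo "FENCING FAILED - do not fail over" >&2
    return 1
}

fence_with_verify
```

The key point is the last branch: if the power-off can never be confirmed, the script refuses to report success, so the HA layer never starts the service on the partner while the original node might still have the pool imported.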
