On Sat, May 26, 2012 at 5:56 AM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2012-05-25T21:44:25, Florian Haas <flor...@hastexo.com> wrote:
>
>> > If so, the master thread will not self-fence even if the majority of
>> > devices is currently unavailable.
>> >
>> > That's it, nothing more. Does that help?
>>
>> It does. One naive question: what's the rationale of tying in with
>> Pacemaker's view of things? Couldn't you just consume the quorum and
>> membership information from Corosync alone?
>
> Yes and no.
>
> On SLE HA 11 (which, alas, is still the prime motivator for this),
> corosync actually gets that state from Pacemaker. And, ultimately, it is
> Pacemaker's belief (from the CIB) that pengine bases its fencing
> decisions on, so that's where we need to look.
>
> Further, quorum isn't enough. Even if we have quorum, the local node
> could still be dirty (as in: stop failures, unclean, ...) in ways that
> imply it should self-fence, pronto.
>
> Since this overrides the decision to self-fence if the devices are gone,
> and thus a real poison pill may no longer be delivered, we must take
> steps to minimize that risk.
>
> But yes, what it does now is to sign in both with corosync/ais and
> the CIB, querying quorum state from both.
>
> Fun anecdote: I originally thought being notification-driven might be
> good enough - until the testers started SIGSTOPping corosync/cib and
> complaining that the pacemaker watcher didn't pick up on that ;-)
>
> I know this is bound to have some holes. It can't perform a
> comprehensive health check of pacemaker's stack; yet, this only matters
> for as long as the loss of devices persists. During that degraded phase,
> the system is a bit more fragile. I'm a bit wary of this, because I'm
> *sure* these will all get reported one after another and further
> contribute to the code obfuscation, but such is reality ...
>
>> > (I have opinions particularly on the last failure mode. This seems to
>> > arise specifically when customers have built setups with two HBAs, two
>> > SANs, two storages, but then cross-linked the SANs, connected the HBAs
>> > to each, and the storages too. That seems to frequently lead to
>> > hiccups where the *entire* fabric is affected. I'm thinking this
>> > cross-linking is a case of sham redundancy; it *looks* as if it makes
>> > things more redundant, but in reality reduces it since faults are no
>> > longer independent. Alas, they've not wanted to change that.)
>>
>> Henceforth, I'm going to dangle this thread in front of everyone who
>> believes their SAN can never fail. Thanks. :)
>
> Heh. Please dangle it in front of them and explain the benefits of
> separation/isolation to them. ;-)
>
> If they followed our recommendation - 2 independent SANs, and a third
> iSCSI device over the network (okok, effectively that makes 3 SANs) -
> they'd never experience this.
>
> (Since that's how my lab is actually set up, I had some trouble
> following the problems they reported initially. Oh, and *don't* get me
> started on async IO handling in Linux.)
>
>> Are there any SUSEisms in SBD or would you expect it to be packageable
>> on any platform?
>
> Should be packageable on every platform, though I admit that I've not
> tried building the pacemaker module against anything but the
> corosync+pacemaker+openais stuff we ship on SLE HA 11 so far.
>
> I assume that this may need further work; at least the places I stole
> code from had special treatment. And the source code to crm_node
> (ccm_epoche.c) ... I *think* this may indicate opportunities for
> improving the client libraries in pacemaker to hide all that stuff
> better.
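For illustration, here is a minimal sketch of the corosync half of that dual
sign-in, written as a one-shot poll rather than a notification callback (per
the SIGSTOP anecdote above). This is not the actual sbd code; it assumes the
corosync 2.x quorum API and linking with -lquorum (corosync 1.x, as shipped
on SLE HA 11, has a slightly different quorum_initialize() signature):

/* Hedged sketch, not the sbd watcher itself: ask corosync whether the
 * local partition currently has quorum.  Build with: gcc ... -lquorum */
#include <stdio.h>
#include <corosync/corotypes.h>
#include <corosync/quorum.h>

int main(void)
{
    quorum_handle_t handle;
    uint32_t quorum_type = 0;
    int quorate = 0;
    cs_error_t rc;

    /* NULL callbacks: this is a one-shot query; a real watcher would
     * repeat it on a timer instead of relying solely on notifications. */
    rc = quorum_initialize(&handle, NULL, &quorum_type);
    if (rc != CS_OK) {
        fprintf(stderr, "quorum_initialize failed: %d\n", rc);
        return 1;
    }

    rc = quorum_getquorate(handle, &quorate);
    if (rc == CS_OK)
        printf("corosync: quorate=%d\n", quorate);
    else
        fprintf(stderr, "quorum_getquorate failed: %d\n", rc);

    quorum_finalize(handle);
    return rc == CS_OK ? 0 : 1;
}

Getting the equivalent answer out of the CIB (the have-quorum attribute, plus
the local node_state to catch the "unclean" case) takes noticeably more glue
code, which is exactly the kind of boilerplate a better client library could
hide.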
Yep, suggestions are welcome. In theory it shouldn't be required, but in
practice there are so many membership/quorum combinations that sadly the
compatibility code has become worthy of a real API.
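For comparison, a correspondingly hedged sketch of what a consumer can do on
the pacemaker side today without touching the library internals: shell out to
crm_node -q, which prints 1 if the local partition has quorum. A real watcher
would talk to the CIB directly and also inspect the local node_state, but
this gives an idea of the wrapper code such an API could replace:

/* Hypothetical helper, not sbd's implementation: read Pacemaker's view
 * of quorum by running crm_node -q and parsing its one-character answer. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

static int pacemaker_has_quorum(void)
{
    char buf[16] = "";
    FILE *p = popen("crm_node -q", "r");

    if (p == NULL)
        return -1;                        /* could not even ask */
    if (fgets(buf, sizeof(buf), p) == NULL)
        buf[0] = '\0';
    pclose(p);

    return strncmp(buf, "1", 1) == 0;     /* "1" => quorate */
}

int main(void)
{
    int q = pacemaker_has_quorum();

    if (q < 0)
        fprintf(stderr, "unable to query crm_node\n");
    else
        printf("pacemaker: quorate=%d\n", q);
    return 0;
}

As with the corosync check above, a watcher would run this periodically rather
than once, so that a stopped or wedged daemon is noticed even when no
notification arrives.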