Re: [DRBD-user] split-brain on Ubuntu 14.04 LTS after reboot of master node

Ivan Sun, 15 Nov 2015 11:08:12 -0800


On 11/15/2015 07:53 PM, Digimer wrote:

On 15/11/15 11:36 AM, Ivan wrote:



On 11/15/2015 05:04 PM, Digimer wrote:

On 15/11/15 05:03 AM, Waldemar Brodkorb wrote:

Hi,
Digimer wrote,

property $id="cib-bootstrap-options" \
          dc-version="1.1.10-42f2063" \
          cluster-infrastructure="corosync" \
          stonith-enabled="false" \


And here's the core of the problem.

Configure and test stonith in pacemaker. Then, configure drbd to use
'fencing resource-and-stonith;' and configure 'crm-{un,}fence-peer.sh'
as the {un,}fence handlers.
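
[Editor's sketch, not part of the original message.] The advice above roughly corresponds to a DRBD 8.4 resource section like the one below. The resource name r0 is hypothetical, and the handler paths assume the scripts are installed under /usr/lib/drbd/, which can differ between distributions. Stonith must already be configured and tested in pacemaker for this to be useful.

```
# Hypothetical resource name; adapt to your own resource definition.
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    # Adds a constraint in the CIB so the outdated peer cannot be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has resynchronized.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```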


So stonith is a hard requirement even when a simple reboot is done?

The docs mention the following:
"The ocf:linbit:drbd OCF resource agent provides Master/Slave
capability, allowing Pacemaker to start and monitor the DRBD
resource on multiple nodes and promoting and demoting as needed. You
must, however, understand that the drbd RA disconnects and detaches
all DRBD resources it manages on Pacemaker shutdown, and also upon
enabling standby mode for a node."

http://drbd.linbit.com/users-guide-8.4/s-pacemaker-crm-drbd-backed-service.html


So why does demoting not work when a reboot is done?
When I do a simple crm node standby; sleep 30; crm node online
everything is fine.

best regards
   Waldemar


It's a hard requirement, period. Without it, debugging problems is a
waste of time because the cluster enters an undefined state. Fix
stonith, see if the issue remains, and if so, let us know.


You're right that fencing should be set up for production clusters (in
the sense that you take a huge data consistency risk not setting it up)
but last time I did a test environment without stonith I could reboot a
node without getting a pacemaker split-brain. Either things have changed
from back then, or the OP is hitting another problem; maybe the reboot
doesn't properly shut down pacemaker, or the network (link, firewall,
...) is torn down before pacemaker is stopped, ...

cheers
ivan


It is entirely possible the issue is not related to fencing being
disabled. My point is that, without fencing, debugging becomes
sufficiently more complicated that it's not worth it. With fencing,
problems become much easier to find, in my experience.

Also, you need fencing in all cases anyway, so why not use it from the
start? A "test cluster" that doesn't match what will become production
has limited use, wouldn't you agree?

I fully agree. But a proper cluster setup is complicated enough that
most people want to set up things step by step, and understand each step
before moving on to the next one. In that case, stonith is a very
important feature, but that's it, a feature, not a hard requirement,
hence the existence of the stonith-enabled parameter. For instance the
guys at clusterlabs disable stonith in their getting started doc [1],
although there's a big warning explaining why.


Quoting the fine documentation:

"stonith-enabled=false tells the cluster to simply pretend that failed
nodes are safely powered off".

So if the OP is sure that its node has rebooted and doesn't access the
shared storage (if any), then there must be a bug or a
configuration/setup problem that fencing will just paper over when it'll
kill the node. If enabling fencing solves the problem, that would be a
bug too IMO. That said, you have way more experience than me, and maybe
fencing will help in finding the cause of the problem.

[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/ch05.html

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
