I have been focusing my efforts this past week on learning about stonith
and fencing. First I needed to set up and configure my IPMI devices
that stonith will use. That works, and stonith is now fat and happy.
Thanks to those who told me about the undocumented -d option for
stonith that unveiled the problem: ipmitool was not part of the
base CentOS distro, so I had to build it from source.
So now I am walking through my ha.cf with "crm off" (yes, I want to get
this working in version 1 first and then convert my haresources to CIB
format afterwards).
I have been reading about situations where the primary switchover
fails because the DRBD parameters are not in tune with those of
heartbeat.
So that led me back to my drbd.conf file, which I tweaked while
following an earlier tutorial. It has all sorts of options in it that
I feel may conflict with what heartbeat is trying to do.
This may be better suited for the DRBD list, but since it involves both
I decided to use this list. Please read the handlers and comment on
my assumptions. Yes, this is an exercise in learning.
1) There are a number of handlers that can be used in DRBD; here are a few:
# what should be done in case the node is primary, degraded
# (= no connection) and has inconsistent data.
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
This will forcibly halt the local node. Since heartbeat is in control,
I would assume that stonith should be the only thing powering nodes
down. Do I simply comment out this handler?
2) # The node is currently primary, but lost the after-split-brain
# auto recovery procedure. As a consequence it should go away.
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
This again will forcibly halt the local node. I know the halt command
above should not be there, but I am not sure what to put in its place,
if anything.
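For what to put in its place, here is the kind of notify-only replacement I am considering (just a sketch, not from the docs; the "root" address is a placeholder, and I am not sure it is the right policy):

# sketch: notify instead of forcing a halt; mail address is a placeholder
pri-lost-after-sb "echo 'pri-lost-after-sb on $DRBD_RESOURCE' | mail -s 'DRBD Alert' root";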
3) # Commands to run in case we need to downgrade the peer's disk
# state to "Outdated". Should be implemented by the superior
# communication possibilities of our cluster manager. Update: Now
# there is a solution that relies on heartbeat's communication
# layers. You should really use this.
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
Okay, so I should use heartbeat's communication layers according to the
comment. Then does that mean I simply comment out this handler?
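From what I have read (and someone please correct me if I am wrong), the idea is to keep this handler and run dopd, the outdate-peer daemon, under heartbeat, with something like this added to ha.cf:

# run the DRBD outdate-peer daemon (dopd) under heartbeat
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

So as I understand it the handler stays in; drbd-peer-outdater just talks to the peer's dopd over heartbeat's communication links instead of needing its own channel.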
4) # The node is currently primary, but should become sync target
# after the negotiating phase. Alert someone about this incident.
pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' [EMAIL PROTECTED]";
This just tells me that this node was primary and is now secondary.
Before it becomes primary again, it needs to be the target of a sync.
Here I simply send an email notifying myself that this happened.
What I do not understand is how the sync materializes. I assume
that this is always a manual task?
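From what I have read, the resync itself should be automatic once the connection is re-established: the node negotiates into the SyncTarget state and pulls the changed blocks from the peer. Something like this is what I would expect to do if the node dropped to StandAlone (resource name "r0" is just an example):

# reconnect the resource, then watch the resync progress
drbdadm connect r0
cat /proc/drbd

Again, please correct me if the manual part is more involved than this.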
5) # Notify someone in case DRBD split-brained.
split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' [EMAIL PROTECTED]";
So here I am sending myself an email that gives me instructions on
how to come out of split brain.
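For the record, the manual recovery I have pieced together from the docs looks like this (resource name "r0" is just an example; please sanity-check it):

# on the node whose changes we are throwing away (the split-brain victim)
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0
# on the surviving node, if it dropped to StandAlone
drbdadm connect r0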
That is enough for now. I feel like I have an understanding, then I
read something else and feel like I am having a split brain =(
regards,
Douglas Lochart
--
What profits a man if he gains the whole world yet loses his soul?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems