Re: [Linux-HA] hb_report: trouble on "simple" 2-node active/passive cluster with heartbeat 2.1.3 and CRM

Wolfram Schlich Thu, 14 Feb 2008 12:21:47 -0800

* Wolfram Schlich <[EMAIL PROTECTED]> [2008-02-14 19:27]:
> * Wolfram Schlich <[EMAIL PROTECTED]> [2008-02-14 16:22]:
> > Looked fine. Then I ran "killall -9 heartbeat ccm cib lrmd stonithd attrd
> > crmd tengine pengine cibmon dopd pingd" on sirius -- the node which did
> > not currently run the resources but which was the DC.
> > After a while, all resources on pollux were _restarted_ and
> > strange DRBD kernel messages appeared -- see attached var-log-messages from
> > pollux and sirius (I placed them in /tmp/hb_report/host/ myself).
> 
> I looked at the logs again and found out that something strange was
> happening to the DRBD master/slave instances. After I killed the
> processes on sirius, for an unknown reason the DRBD resource monitor
> on pollux returned failure (although everything was running fine) and the DRBD
> resource which was previously running on sirius was migrated to
> pollux, therefore everything was stopped and started again... very
> strange!
> 
> Please see my commented logfile which contains heartbeat and drbd
> log messages:
> http://dev.gentoo.org/~wschlich/tmp/syslog.txt


daemon.warning; tengine: [24931]: WARN: update_failcount: Updating \
        failcount for drbd-r0:1 on 26cfbecf-dc25-42e1-84de-325ca9e457b5 after \
        failed monitor: rc=8

Well, rc=8 is OCF_RUNNING_MASTER. So, for some reason, Heartbeat did not
expect the resource monitor to return OCF_RUNNING_MASTER...
What could be the reason Heartbeat expected something else?
drbd-r0:1 was indeed the promoted master instance of the drbd
resource, so I cannot see what would be wrong here.
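
For anyone who wants to check other failcount warnings the same way, here
is a rough sketch that pulls the resource, node, operation and rc out of a
tengine line like the one above and maps the rc to its OCF name. The
numeric codes are the standard OCF resource agent exit codes; the regular
expression and the decode_failcount helper are only my own illustration
(not part of Heartbeat), and they assume the wrapped log line has been
joined back into a single line first.

```python
import re

# Standard OCF resource agent exit codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Hypothetical pattern, written against the single warning quoted in this
# mail; other Heartbeat versions may word the message differently.
FAILCOUNT_RE = re.compile(
    r"update_failcount: Updating\s+failcount for (?P<rsc>\S+)\s+on (?P<node>\S+)"
    r"\s+after\s+failed (?P<op>\w+): rc=(?P<rc>\d+)"
)


def decode_failcount(line):
    """Return (resource, node, operation, rc, rc_name), or None if no match."""
    m = FAILCOUNT_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return (m.group("rsc"), m.group("node"), m.group("op"),
            rc, OCF_RC.get(rc, "unknown"))


if __name__ == "__main__":
    line = ("daemon.warning; tengine: [24931]: WARN: update_failcount: Updating "
            "failcount for drbd-r0:1 on 26cfbecf-dc25-42e1-84de-325ca9e457b5 after "
            "failed monitor: rc=8")
    print(decode_failcount(line))
```

This only decodes what the monitor returned; it does not tell us which
rc the transition engine was expecting for that instance, which is the
actual open question here.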
-- 
Regards,
Wolfram Schlich <[EMAIL PROTECTED]>
Gentoo Linux * http://dev.gentoo.org/~wschlich/