* Wolfram Schlich <[EMAIL PROTECTED]> [2008-02-14 19:27]:
> * Wolfram Schlich <[EMAIL PROTECTED]> [2008-02-14 16:22]:
> > Looked fine. Then I ran "killall -9 heartbeat ccm cib lrmd stonithd attrd
> > crmd tengine pengine cibmon dopd pingd" on sirius -- the node which did
> > not currently run the resources but which was the DC.
> > After a while, all resources on pollux were _restarted_ and
> > strange DRBD kernel messages appeared -- see attached var-log-messages from
> > pollux and sirius (I placed them in /tmp/hb_report/host/ myself).
>
> I looked at the logs again and found out that something strange was
> happening to the DRBD master/slave instances. After I killed the
> processes on sirius, for an unknown reason the DRBD resource monitor
> on pollux returned failure (everything was running fine) and the DRBD
> resource which was previously running on sirius was migrated to
> pollux, therefore everything was stopped and started again... very
> strange!
>
> Please see my commented logfile which contains heartbeat and drbd
> log messages:
> http://dev.gentoo.org/~wschlich/tmp/syslog.txt

daemon.warning; tengine: [24931]: WARN: update_failcount: Updating \
failcount for drbd-r0:1 on 26cfbecf-dc25-42e1-84de-325ca9e457b5 after \
failed monitor: rc=8

Well, rc=8 is OCF_RUNNING_MASTER. So somehow Heartbeat did not expect
the resource monitor to return OCF_RUNNING_MASTER...
What could be the reason Heartbeat expected something else?
drbd-r0:1 was indeed the promoted master instance of the drbd
resource... so I cannot imagine what would be wrong here.
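
For reference, here is a minimal sketch that splits such a failcount line into its parts and maps the numeric rc to its symbolic OCF name (8 is spelled OCF_RUNNING_MASTER in the OCF shell function library). The regular expression and the function name decode_failcount_line are my own assumptions, written against the single log line quoted above; this is not how Heartbeat itself evaluates monitor results.

```python
import re

# OCF resource agent exit codes, named as in the OCF shell function library.
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumption: pattern derived only from the one log line in this mail.
FAILCOUNT_RE = re.compile(
    r"update_failcount: Updating failcount for (?P<resource>\S+) "
    r"on (?P<node>\S+) after failed (?P<op>\w+): rc=(?P<rc>\d+)"
)


def decode_failcount_line(line):
    """Return resource, node, operation and decoded rc, or None if no match."""
    match = FAILCOUNT_RE.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return {
        "resource": match.group("resource"),
        "node": match.group("node"),
        "op": match.group("op"),
        "rc": rc,
        "rc_name": OCF_RETURN_CODES.get(rc, "unknown"),
    }


if __name__ == "__main__":
    # The log line from above, with the line continuations joined.
    line = (
        "daemon.warning; tengine: [24931]: WARN: update_failcount: Updating "
        "failcount for drbd-r0:1 on 26cfbecf-dc25-42e1-84de-325ca9e457b5 after "
        "failed monitor: rc=8"
    )
    print(decode_failcount_line(line))
```

The decoded name only says what the resource agent reported; whether that counts as a failure depends on which return code the cluster expected for that particular monitor operation.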
--
Regards,
Wolfram Schlich <[EMAIL PROTECTED]>
Gentoo Linux * http://dev.gentoo.org/~wschlich/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems