Hi,

On Thu, Feb 21, 2008 at 02:05:32PM +0100, Andreas Kurz wrote:
> On Thu, Feb 21, 2008 at 12:22 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > On Thu, Feb 21, 2008 at 12:19:55AM +0100, Johan Hoeke wrote:
> > > LS,
> > >
> > > Running a 2-node cluster, heartbeat-2.1.3-3 centos rpms, RH AS 4.6
> > >
> > > While testing a "maintenance scenario" for the cluster I set all
> > > resources to is_managed=false,
> > >
> > > Feb 20 21:09:41 sierpinski pengine: [15725]: notice: native_print:
> > > R_BB10PRD_DB (heartbeat::ocf:oracle): Started
> > > sierpinski.uvt.nl (unmanaged)
> > >
> > > and proceeded to shut down oracle by hand, oracle being one of the
> > > resources.
> > >
> > > Feb 20 21:12:03 sierpinski oracle[23120]: [23145]: INFO: Oracle instance
> > > BB10PRD is down
> > >
> > > Within minutes, the node was stonithed. The log shows that this was
> > > right after the monitor operation for the oracle resource came back with
> > > return code 7:
> > >
> > > Feb 20 21:12:03 sierpinski crmd: [4584]: info: process_lrm_event: LRM
> > > operation R_BB10PRD_DB_monitor_120000 (call=31, rc=7) complete
> > >
> > > Feb 20 21:12:03 mandelbrot stonithd: [4580]: info:
> > > stonith_operate_locally::2375: sending fencing op (RESET) for
> > > sierpinski.uvt.nl to device external (rsc_id=R_ilo_sierpinski:0,
> > > pid=5414)
> > > Feb 20 21:12:03 mandelbrot stonithd: [4580]: info: Node
> > > mandelbrot.uvt.nl try to help node sierpinski.uvt.nl to fence node
> > > sierpinski.uvt.nl.
> > >
> > > Conclusion: the monitor operation was still running even though the
> > > resource was unmanaged, and it forced a fencing action.
> >
> > Oops. So there's an on_fail=fence for this monitor operation. Is
> > that necessary?
> >
> > > I then made a script which in addition to changing the resources to
> > > is_managed = false also set the monitor operations to disabled=true.
> > > This worked, now I am able to shut down oracle by hand without a fencing
> > > action starting up.
> > >
> > > Questions:
> > >
> > > Is this expected behavior? Should monitor operations keep running even
> > > though the resources are set to is_managed=false?
> >
> > Yes. There was some discussion about it and the majority of
> > votes went this way, i.e. that monitoring should continue even
> > for the unmanaged resources.
>
> I also agree that it is a good idea to continue monitoring for
> unmanaged resources, but I would see this behaviour as a bug if the
> "on_fail" action is executed when it is "fence". What do you think, Dejan?

I'd agree that it is a bug. IIRC, the reasoning behind continued
monitoring was that CRM should always know the state of affairs.
However, if one sets a resource to unmanaged mode, it is
reasonable to expect that the monitor may fail, and carrying out
an action, any kind of action, is at least unexpected. That's why
I don't think that monitoring should be done.

Thanks,

Dejan

> Regards,
> Andreas
>
> > > Is explicitly setting
> > > the monitor operations to disabled=true the "right way" to prevent
> > > unwanted fencing actions during cluster maintenance?
> >
> > I'd say yes. But note that I was also in favour of having
> > monitoring disabled by default.
> >
> > Thanks,
> >
> > Dejan
> >
> > > tia,
> > > Johan
> > >
> > > (happy to post hb_reports if requested)
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
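
For readers following the thread, the reported behaviour can be sketched as a toy Python model. This is not Heartbeat code: `Resource`, `MonitorOp`, and `action_after_monitor` are hypothetical names made up for illustration. Only `is_managed`, `on_fail`, `disabled`, and the monitor return code 7 (`OCF_NOT_RUNNING` in the OCF resource agent API) come from the discussion. The sketch assumes, as reported for heartbeat 2.1.3, that `is_managed` is not consulted when a monitor fails.

```python
from dataclasses import dataclass
from typing import Optional

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7  # the rc=7 seen in the crmd log line


@dataclass
class MonitorOp:
    on_fail: str            # e.g. "fence", as inferred in the thread
    disabled: bool = False  # the workaround: disabled=true


@dataclass
class Resource:
    name: str
    is_managed: bool
    monitor: MonitorOp


def action_after_monitor(rsc: Resource, rc: int) -> Optional[str]:
    """Return the recovery action this toy model takes, or None."""
    if rsc.monitor.disabled:
        return None  # a disabled monitor never runs, so it cannot fail
    if rc == OCF_SUCCESS:
        return None
    # Assumption matching the report: is_managed is ignored here,
    # so on_fail applies even to an unmanaged resource.
    return rsc.monitor.on_fail


db = Resource("R_BB10PRD_DB", is_managed=False,
              monitor=MonitorOp(on_fail="fence"))
print(action_after_monitor(db, OCF_NOT_RUNNING))  # fence

db.monitor.disabled = True
print(action_after_monitor(db, OCF_NOT_RUNNING))  # None
```

In this model, the change Dejan and Andreas describe as the expected behaviour would amount to also returning `None` whenever `rsc.is_managed` is false.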
