On 10/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Don't know where to start with this one, but I have a resource that is driving
> me mad and I'm struggling to see the wood for the trees.
>
> The script is in our OCF directory and has all the
> start/stop/status/monitor/meta-data etc. as required. It's just a modified
> Heartbeat apache one.
>
> Anyway, the problem: a 2-node cluster.
>
> Everything starts up and works perfectly.
> I can place a node into standby and the resource moves, starts, and can fail back,
> but for some reason it sometimes fails to start. The GUI reports Failed and
> the status in ha.log says failed, but the start-up script never gets called.
> I have checked the fail counts, and this can happen while both are still 0 on
> each node.
>
> Once it's running, I just do a 'dm stop' (a script) and this kills httpd.
> Sometimes the cluster notices this and restarts httpd, and the fail count
> goes to 1, etc.
> Then if I do a stop again, it may or may not work; it then fails, and the
> count is still 1.
>
> Then if I, on the same node, do a 'crm_resource -U -r dm_grp -t group -H nodeb',
> the resource is restarted and I can see it call the start-up script.
>
> Just asking for any pointers on where to look so that I can attempt to stop
> this from failing in this manner.
It depends on what rc value the script returns when it fails and which action it was that failed. The logs and the complete CIB (cibadmin -Ql), captured while it is in this state, would help diagnose the problem. Please include both as attachments (not inline).

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
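Since the diagnosis hinges on which action failed and with what rc, it can help to decode the return code reported in the logs against the standard OCF exit codes (0 to 7). The sketch below is a hypothetical helper, not part of Heartbeat: the names `OCF_CODES` and `describe` are made up for illustration. It encodes one OCF rule that often matters for modified agents: for `monitor`, rc 7 (OCF_NOT_RUNNING) means "cleanly stopped", while any other non-zero value is treated as a failure.

```python
import sys

# Standard OCF Resource Agent API exit codes (hypothetical lookup table).
OCF_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}


def describe(action: str, rc: int) -> str:
    """Explain what an OCF agent's exit code means for the given action."""
    name = OCF_CODES.get(rc, f"UNKNOWN({rc})")
    if action == "monitor":
        # For monitor, 7 is not an error: it reports a cleanly stopped resource.
        if rc == 0:
            return f"{name}: resource is running"
        if rc == 7:
            return f"{name}: resource is cleanly stopped"
        return f"{name}: monitor reports the resource as failed"
    # For start/stop and other actions, anything but 0 is a failure.
    if rc == 0:
        return f"{name}: {action} succeeded"
    return f"{name}: {action} failed"


if __name__ == "__main__":
    # Usage: python ocf_rc.py <action> <rc>
    if len(sys.argv) == 3:
        print(describe(sys.argv[1], int(sys.argv[2])))
```

For example, a stopped resource whose modified `monitor` returns 1 instead of 7 would be reported as failed rather than stopped, which is worth ruling out when the start action never seems to be called.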
