On 10/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I don't know where to start with this one, but I have a resource that is
> driving me mad, and I am struggling to see the wood for the trees.
>
> The script is in our ocf directory and has all the
> start/stop/status/monitor/meta-data etc. as required. It's just a modified
> version of the heartbeat apache one.
>
> Anyway, the problem. It's a 2-node cluster.
>
> Everything starts up and works perfectly.
> I can place a node into standby, and the resource moves, starts, and can
> fail back. But for some reason it sometimes fails to start. The GUI reports
> Failed and the status in ha.log says failed, but the start-up script never
> gets called. I have checked the failcounts, and this can happen when both
> are still 0 for each node.
>
> Once it is running, I just do a 'dm stop' (script), and this kills the httpd.
> Sometimes the cluster notices this and restarts the httpd, and the fail
> count goes to 1, etc.
> Then if I do a stop again, the restart may or may not work. It then fails,
> and the count is still 1.
>
> If I then run 'crm_resource -U -r dm_grp -t group -H nodeb' on the same node,
> the resource is restarted and I can see it call the start-up script.
>
> I'm just asking for any pointers on where to look, so that I can attempt to
> stop this from failing in this manner.

It depends on which rc value the script returns when it fails and
which action it was that failed.
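
For reference, the rc values in question are the OCF exit codes: 0 means
success, 1 is a generic error, and 7 means the resource is cleanly stopped.
Below is a rough sketch of how a monitor action might map onto them. The
function name dm_monitor and the pid file path are made up for illustration
and are not taken from your script.

```shell
#!/bin/sh
# OCF exit codes relevant to a monitor action.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical pid file location; an assumption, adjust to your setup.
PIDFILE="${PIDFILE:-/var/run/dm-httpd.pid}"

dm_monitor() {
    # No pid file at all: report the resource as cleanly stopped.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    # A pid file with no content is unexpected: report a generic error.
    pid=$(cat "$PIDFILE" 2>/dev/null)
    [ -n "$pid" ] || return $OCF_ERR_GENERIC

    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pid file: the process is gone, so report "not running".
    return $OCF_NOT_RUNNING
}
```

If a monitor returns a generic error where "not running" was meant, the
cluster may treat the resource as failed rather than stopped, so it is worth
checking what each action in your script returns in every branch.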

Logs and the complete CIB (cibadmin -Ql) from when it's in this state would
help diagnose the problem. Please include both as attachments (not
inline).
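
Something along these lines would gather both into one directory that can be
attached. This is only a sketch: the collect_diag name is made up, and the
default log path /var/log/ha.log is an assumption, so point it at wherever
your ha.log actually lives.

```shell
#!/bin/sh
# Usage: collect_diag OUTDIR [LOGFILE]
# Dumps the live CIB and copies the heartbeat log into OUTDIR.
collect_diag() {
    outdir=$1
    logfile=${2:-/var/log/ha.log}   # assumed default log path; adjust as needed

    mkdir -p "$outdir" || return 1

    # Query the complete CIB from the local node.
    cibadmin -Ql > "$outdir/cib.xml" || return 1

    # Copy the log only if it exists and is readable.
    if [ -r "$logfile" ]; then
        cp "$logfile" "$outdir/ha.log" || return 1
    fi
    return 0
}
```

Run it on the node while the resource is in the failed state, for example
collect_diag /tmp/dm-diag, so the CIB reflects the problem.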
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
